Perl is one of the languages where regular expressions are deeply integrated. Its main substitution operator is s///, which searches a string for a pattern and replaces the matching fragments. The construct has several subtleties, especially around greedy versus lazy quantifiers, matching across multiple lines, and the modifiers that control how the substitution behaves.
The s/// operator, whose syntax comes from the Unix tools ed and sed, has been part of Perl since its early versions. Perl did not invent regular expressions, but it extended their syntax considerably, and Perl-style syntax was later adopted by many other languages (for example, through the PCRE library). Modifiers such as m, s, and x, and many of the finer points of pattern construction, were developed in Perl.
In practice, developers often misuse greedy quantifiers or confuse the modifiers (especially s and m). This produces unexpected results when replacing text in multiline strings or large datasets. Typical symptoms are a pattern that matches a different span than expected, or a substitution that changes only one occurrence when all of them were intended.
It is important to choose patterns carefully and to understand how the modifiers work. A greedy quantifier (for example, .*) matches as much text as possible while still letting the rest of the pattern match, whereas a lazy quantifier (for example, .*?) matches as little as possible.
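To see the difference directly, the following sketch prints what each quantifier captures from the same string. The sample string is invented for illustration; the match runs in list context so the first capture group lands in a variable.

```perl
use strict;
use warnings;

# Hypothetical sample input with two quoted words.
my $s = 'say "a" and "b"';

# Greedy: .* first runs to the end of the string, then backtracks
# until it finds the LAST quote that lets the pattern succeed.
my ($greedy) = $s =~ /"(.*)"/;

# Lazy: .*? grows one character at a time and stops at the FIRST
# closing quote.
my ($lazy) = $s =~ /"(.*?)"/;

print "greedy: $greedy\n";  # greedy: a" and "b
print "lazy:   $lazy\n";    # lazy:   a
```

Both patterns find a match; they differ only in how far the quantified part is allowed to stretch.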
Working with modifiers:
g: performs the replacement for every match, not only the first one
s: lets the dot (.) also match newline characters, so a pattern can span several lines
m: makes the anchors ^ and $ match at the start and end of each line, not only at the start and end of the whole string
Example: remove each <tag> ... </tag> element individually, using a lazy quantifier:
my $text = 'a <tag>1</tag> <tag>2</tag> b';
$text =~ s/<tag>.*?<\/tag>//g;
print $text; # a   b (the spaces around the removed elements remain)
To process multiline strings:
my $data = "Line 1\nLine 2\n<tag>\nDATA\n</tag>\nEnd\n";
$data =~ s/<tag>.*?<\/tag>//gs;
print $data; # prints "Line 1\nLine 2\n\nEnd\n"
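A minimal sketch comparing the same substitution with and without the s modifier, on a made-up input where the element spans several lines. The idiom (my $copy = $orig) =~ s/.../.../ copies the string first, so each variant works on its own copy.

```perl
use strict;
use warnings;

# Hypothetical input: the <tag> element spans three lines.
my $orig = "Line 1\nLine 2\n<tag>\nDATA\n</tag>\nEnd\n";

# Without /s the dot does not match "\n", so the element is left alone.
(my $without_s = $orig) =~ s/<tag>.*?<\/tag>//g;

# With /s the dot also matches "\n", so the whole element is removed.
(my $with_s = $orig) =~ s/<tag>.*?<\/tag>//gs;

my $status = $without_s eq $orig ? 'unchanged' : 'changed';
print "without /s: $status\n";  # without /s: unchanged
print "with /s:\n$with_s";      # Line 1, Line 2, an empty line, End
```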
Key features:
s modifier: lets the dot (.) match newline characters
g modifier: determines whether only the first match or every match is replaced
What happens if you omit the ? after a greedy .* when processing multiple tags?
The greedy .* matches as much as it can, including the intermediate tags, so a single match runs from the first <tag> to the last </tag> and everything in between is removed:
my $txt = 'A <tag>1</tag> <tag>2</tag> B';
$txt =~ s/<tag>.*<\/tag>//g;
print $txt; # A  B
Here, the entire chunk from the first <tag> to the last </tag>, including the space between the two elements, is removed in a single match.
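One way to confirm this is to look at the return value of s///, which is the number of substitutions it made. A sketch on the same sample string, run once with the greedy pattern and once with the lazy one:

```perl
use strict;
use warnings;

my $txt = 'A <tag>1</tag> <tag>2</tag> B';

# s/// returns how many substitutions it performed, so the count shows
# how many separate matches each pattern found.
my $n_greedy = (my $greedy = $txt) =~ s/<tag>.*<\/tag>//g;
my $n_lazy   = (my $lazy   = $txt) =~ s/<tag>.*?<\/tag>//g;

print "greedy: $n_greedy match(es), result '$greedy'\n";  # 1, 'A  B'
print "lazy:   $n_lazy match(es), result '$lazy'\n";      # 2, 'A   B'
```

The greedy version reports a single match and also swallows the space between the elements; the lazy version reports two matches and keeps that space.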
What is the difference between the m modifier and the s modifier in Perl regex?
s lets the dot (.) match a newline character; m makes the anchors ^ and $ match at the start and end of each line within a multiline string, not only at the start and end of the whole string. The two modifiers are independent and can be combined, but they are often confused.
my $s = "abc\ndef";
print $s =~ /^def/;  # prints nothing: without m, ^ matches only at the start of the string
print $s =~ /^def/m; # 1 (true)
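A small sketch that tests each modifier separately and then together, on a made-up three-line string. The last pattern needs both: m so that ^ can match at the start of the second line, and s so that the dot can cross the newline.

```perl
use strict;
use warnings;

my $s = "abc\ndef\nghi";

# m: ^ and $ may match at every line boundary, not only at the string's ends.
my $caret_plain = $s =~ /^def/  ? 1 : 0;  # 0: "def" is not at the start of the string
my $caret_m     = $s =~ /^def/m ? 1 : 0;  # 1: "def" starts the second line

# s: the dot may match "\n"; it does not change ^ or $.
my $dot_plain = $s =~ /def.ghi/  ? 1 : 0;  # 0: the dot refuses "\n"
my $dot_s     = $s =~ /def.ghi/s ? 1 : 0;  # 1

# The two modifiers are independent; this pattern needs both of them.
my $both = $s =~ /^def.ghi$/ms ? 1 : 0;    # 1

print "$caret_plain $caret_m $dot_plain $dot_s $both\n";  # 0 1 0 1 1
```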
How do you replace every occurrence of a pattern with a single s/// call?
Without the g modifier, only the first occurrence will be replaced. You need to add g for a global replacement:
my $s = "foo bar foo";
$s =~ s/foo/baz/g; # replaces both occurrences: $s is now "baz bar baz"
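A side-by-side sketch of the same substitution without and with g, each applied to its own copy of the input, plus the replacement count returned by the global version:

```perl
use strict;
use warnings;

my $input = 'foo bar foo';

# Without g, s/// stops after the first successful replacement.
(my $once = $input) =~ s/foo/baz/;

# With g, it keeps scanning and replaces every non-overlapping match;
# the return value is the number of replacements made.
my $count = (my $all = $input) =~ s/foo/baz/g;

print "$once\n";          # baz bar foo
print "$all ($count)\n";  # baz bar baz (2)
```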
A developer writes a replacement for HTML tags like this:
$text =~ s/<tag>.*<\/tag>//g;
As a result, a single match runs from the first <tag> to the last </tag>, so all the elements and the text between them are removed together instead of each element separately.
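Pros: the pattern is short and works when the text contains exactly one <tag> element on a single line.
Cons: with several elements, one match spans from the first <tag> to the last </tag>, so the text between the elements is deleted as well.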
The fix is to use a lazy quantifier and the appropriate modifiers (g to handle every element, s to handle elements that span lines):
$text =~ s/<tag>.*?<\/tag>//gs;
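The lazy pattern still has a limit worth checking before using it on real markup: it pairs each <tag> with the nearest </tag>, so it cannot handle nested elements. A sketch with a made-up nested input:

```perl
use strict;
use warnings;

# Hypothetical input with one <tag> element nested inside another.
my $nested = 'x <tag>a <tag>b</tag> c</tag> y';

# The lazy .*? stops at the FIRST </tag>, which closes the inner element,
# so the outer element is only partly removed.
(my $result = $nested) =~ s/<tag>.*?<\/tag>//gs;

print "$result\n";  # x  c</tag> y
```

The leftover " c</tag>" shows that a regular expression of this shape does not track nesting; for real HTML, a proper parser is the more robust tool.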
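Pros: each element is removed separately, and the s modifier lets an element span several lines.
Cons: it is still a regular expression, not an HTML parser: nested <tag> elements and tags with attributes (for example <tag class="x">) are not handled.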