
How does regex substitution with s/// work in Perl: what is the difference between greedy and lazy patterns, how do you handle multiline strings correctly, and how do you avoid unexpected effects?


Answer.

Perl is one of the languages where regular expressions are deeply integrated. The main substitution operator is s///, which allows you to search and replace string fragments based on a pattern. There are many subtle points in this construction, especially when working with greedy/lazy patterns, multiline processing, and substitution options.

Background

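The s/// operator has been present in Perl since its early versions; the substitution syntax itself comes from older Unix tools such as sed. Perl did not invent regular expressions, but it extended their syntax considerably (lazy quantifiers are one example), and the resulting dialect was later adopted by many other languages and libraries. Many of the modifiers used today (m, s, x, and others) took their current form in Perl.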

Problem

In practice, many developers misuse greedy quantifiers or confuse the modifiers (especially s and m), which leads to unexpected results when replacing text in multiline strings or large datasets. Typical symptoms are a match that spans far more text than intended, or a replacement that affects only the first occurrence when all of them were meant to change.

Solution

It is important to choose patterns deliberately and to understand what each modifier does. A greedy quantifier (e.g., .*) matches as much text as possible while still letting the whole pattern succeed; a lazy quantifier (e.g., .*?) matches as little as possible.
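To make the greedy/lazy distinction concrete, here is a minimal sketch that applies both forms to the same string. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input with two quoted values.
my $line = 'key="alpha" other="beta"';

# Greedy: .* runs to the end of the string, then backtracks to the LAST quote.
my ($greedy) = $line =~ /"(.*)"/;

# Lazy: .*? grows one character at a time and stops at the FIRST closing quote.
my ($lazy) = $line =~ /"(.*?)"/;

print "greedy: $greedy\n";   # greedy: alpha" other="beta
print "lazy:   $lazy\n";     # lazy:   alpha
```

Both patterns start matching at the same opening quote; they differ only in where the match ends.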

Working with modifiers:

  • g: replaces every match instead of only the first one
  • s: lets the dot (.) match newline characters as well, so .* can span several lines
  • m: makes the anchors ^ and $ match at the start and end of every line, not only at the start and end of the whole string
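The effect of each modifier can be sketched on one small two-line string. The sample text and variable names are illustrative assumptions; the `(my $copy = $text) =~ s/.../.../` form copies the string first so the original stays unchanged.

```perl
use strict;
use warnings;

# Hypothetical two-line input (no trailing newline).
my $text = "foo one\nfoo two";

# No modifiers: only the first "foo" is replaced.
(my $first_only = $text) =~ s/foo/bar/;

# g: every "foo" is replaced.
(my $all = $text) =~ s/foo/bar/g;

# m (with g): ^ matches at the start of every line, so each line gets a prefix.
(my $prefixed = $text) =~ s/^/> /mg;

# s: the dot also matches "\n", so .* runs from "one" to the end of the string.
(my $spanned = $text) =~ s/one.*//s;

print "$first_only\n---\n$all\n---\n$prefixed\n---\n$spanned\n";
```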

Example: remove each <tag> ... </tag> block separately (lazy):

my $text = 'a <tag>1</tag> <tag>2</tag> b';
$text =~ s/<tag>.*?<\/tag>//g;
print $text; # "a   b": both blocks are removed, the three spaces around them remain

To process multiline strings:

my $data = "Line 1\nLine 2\n<tag>\nDATA\n</tag>\nEnd";
$data =~ s/<tag>.*?<\/tag>//gs;
print $data; # prints "Line 1", "Line 2", an empty line, then "End"
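The role of the s modifier is easiest to see by running the same substitution with and without it. This is a small sketch on a made-up string whose tag block spans several lines.

```perl
use strict;
use warnings;

# Hypothetical input: the tag block contains newlines.
my $data = "Line 1\n<tag>\nDATA\n</tag>\nEnd";

# Without s the dot cannot cross "\n", so the pattern never matches.
(my $without_s = $data) =~ s/<tag>.*?<\/tag>//g;

# With s the dot matches "\n" too, so the whole block is removed.
(my $with_s = $data) =~ s/<tag>.*?<\/tag>//gs;

print $without_s eq $data ? "unchanged without s\n" : "changed without s\n";
print $with_s, "\n";
```

A substitution that silently does nothing on multiline input is often a sign that the s modifier is missing.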

Key features:

  • Greedy patterns capture the maximum range, lazy ones capture the minimum
  • The s modifier allows the dot (.) to capture newline characters
  • Without the g modifier only the first match is replaced; with g every match is replaced

Trick Questions.

What happens if you omit the ? after a greedy .* when processing multiple tags?

The greedy quantifier will capture the maximum possible range, including intermediate tags, resulting in unexpected removal of everything between the first <tag> and the last </tag>:

my $txt = 'A <tag>1</tag> <tag>2</tag> B';
$txt =~ s/<tag>.*<\/tag>//g;
print $txt; # "A  B": one single match from the first <tag> to the last </tag>

Here, the entire chunk between the first <tag> and the last </tag> is replaced.
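With an empty replacement the greedy and lazy results can look almost the same, so the sketch below uses a visible marker and some text between the blocks. The sample string and the [X] marker are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical input with text ("keep") between two tag blocks.
my $src = 'A <tag>1</tag> keep <tag>2</tag> B';

# Greedy: one match from the first <tag> to the last </tag>; "keep" is lost.
(my $greedy = $src) =~ s/<tag>.*<\/tag>/[X]/g;

# Lazy: each block is matched on its own; "keep" survives.
(my $lazy = $src) =~ s/<tag>.*?<\/tag>/[X]/g;

print "$greedy\n";   # A [X] B
print "$lazy\n";     # A [X] keep [X] B
```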

What is the difference between the m modifier and the s modifier in Perl regex?

s — the dot (.) captures the newline character; m — changes the anchors ^ and $ to work within lines in multiline text. Their purposes are different, but they are often confused.

my $s = "abc\ndef";
# /^def/ does not match without m, because ^ then matches only at the start of the string
print $s =~ /^def/m; # 1 (true)
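A short sketch can put the two modifiers side by side on the same two-line string. The variable names are illustrative; each match result is converted to 1 or 0 so it prints clearly.

```perl
use strict;
use warnings;

my $s = "abc\ndef";

# m affects the anchors: without it, ^ matches only at the very start.
my $anchor_plain  = ($s =~ /^def/)  ? 1 : 0;   # 0
my $anchor_with_m = ($s =~ /^def/m) ? 1 : 0;   # 1

# s affects the dot: without it, . does not match the newline.
my $dot_plain  = ($s =~ /abc.def/)  ? 1 : 0;   # 0
my $dot_with_s = ($s =~ /abc.def/s) ? 1 : 0;   # 1

print "anchor: $anchor_plain $anchor_with_m, dot: $dot_plain $dot_with_s\n";
```

The two modifiers are independent and can be combined (for example /ms) when a pattern needs both behaviors.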

How do you replace all occurrences of a pattern with a single s/// call?

Without the g modifier, only the first occurrence will be replaced. You need to add g for a global replacement:

my $s = "foo bar foo";
$s =~ s/foo/baz/g; # replaces both occurrences of foo
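One way to check how many replacements actually happened is to use the return value of s///, which is the number of substitutions made. A minimal sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $s = "foo bar foo";

# Without g: only the first "foo" changes.
(my $once = $s) =~ s/foo/baz/;

# With g: every "foo" changes, and s/// returns how many were replaced.
my $global = $s;
my $count  = ($global =~ s/foo/baz/g);

print "$once\n";                 # baz bar foo
print "$global ($count)\n";      # baz bar baz (2)
```

Comparing the count with the expected number of matches is a cheap guard against a forgotten g.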

Common Mistakes and Anti-Patterns

  • Using greedy patterns where lazy ones are needed, leading to capturing excess data
  • Missing the g modifier, causing only the first match to be replaced
  • Ignoring the s and m modifiers when working with multiline data

Real-Life Example

Negative Case

A developer writes a replacement for HTML tags like this:

$text =~ s/<tag>.*<\/tag>//g;

As a result, everything from the first <tag> to the last </tag> is removed as one match, including any text that sits between separate blocks, instead of each block being removed individually.

Pros:

  • Concise and understandable code
  • Works correctly and quickly when the text contains only one block

Cons:

  • Incorrect results for multiple identical fragments
  • Text between the blocks is deleted, which damages the rest of the document

Positive Case

Using a lazy pattern and correct modifiers:

$text =~ s/<tag>.*?<\/tag>//gs;
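The same substitution can be sketched on a document with several blocks, one of which spans multiple lines. The sample text is made up; the s{...}{...} form is simply s/// with different delimiters, so the slash in </tag> needs no backslash.

```perl
use strict;
use warnings;

# Hypothetical document: one multiline block and one single-line block.
my $text = "head\n<tag>\none\n</tag>\nmiddle\n<tag>two</tag>\ntail";

# Lazy quantifier plus g and s; the return value is the number of blocks removed.
my $removed = ($text =~ s{<tag>.*?</tag>}{}gs);

print "removed $removed block(s)\n";   # removed 2 block(s)
print $text, "\n";
```

The text outside the blocks ("head", "middle", "tail") is left in place; only the two blocks themselves disappear.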

Pros:

  • Each block is replaced correctly
  • No excess captures

Cons:

  • Requires knowing the syntax of lazy quantifiers and the meaning of the modifiers