
What nuances exist when using list and scalar assignments for arrays and hashes in Perl? How can common mistakes be avoided when manipulating data structures?


Answer

In Perl, the result of an assignment depends on context: the same array or hash yields a different value when it is assigned to a scalar than when it is assigned to a list.

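  • Assigning an array to a scalar evaluates the array in scalar context and stores its number of elements.
  • Assigning an array to a list copies its elements in order; elements with no matching variable on the left are discarded.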
    my @arr = (10, 20, 30);
    my $count = @arr;              # $count == 3
    my ($first, $second) = @arr;   # $first == 10, $second == 20
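A common slip is forgetting the parentheses on the left side, which silently turns a list assignment into a scalar one. The sketch below contrasts the two forms and also shows what a list assignment itself returns in scalar context. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @arr = (10, 20, 30);

# Scalar assignment: the array is evaluated in scalar context, giving its length.
my $count = @arr;

# List assignment: the parentheses put the left side in list context,
# so $first receives the first element instead of the length.
my ($first) = @arr;

# A list assignment evaluated in scalar context yields the number of
# elements on its right-hand side, not the number of variables filled.
my $rhs_count = ( my ($x, $y) = @arr );

print "count=$count first=$first rhs_count=$rhs_count\n";
```

When the intent is to count, writing `scalar(@arr)` makes it explicit and survives later edits that might change the surrounding context.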

For hashes:

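  • Assigning a hash to a list produces a flat list of alternating keys and values, in no guaranteed order.
  • Assigning a hash to a scalar does not reliably give the number of pairs. Since Perl 5.26 it returns the key count, but older versions return a string describing used and allocated buckets.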
    my %h = (a=>1, b=>2, c=>3);
    my $size = %h;         # 3 since Perl 5.26; earlier versions return a string such as "3/8"
    my $pairs = keys %h;   # 3 on every version
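A version-independent way to count pairs is to evaluate `keys %h` in scalar context. The following minimal sketch shows that, along with the flattening that happens when a hash is assigned to an array. The variable names are illustrative.

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

# keys in scalar context returns the number of keys on every Perl 5 release.
my $pairs = keys %h;

# List assignment flattens the hash into alternating keys and values.
# The pair order is not guaranteed, so sort the keys when order matters.
my @flat    = %h;
my @ordered = map { ($_, $h{$_}) } sort keys %h;

print "pairs=$pairs flat_len=", scalar(@flat), " ordered=@ordered\n";
```

Using `%h` directly in a boolean test such as `if (%h) { ... }` is still fine for checking emptiness; only the numeric value differs between versions.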

Be careful not to store a reference to an array when you mean to copy its contents: a reference is an alias, so changes made through it affect the original array.

Trick Question

How does taking a reference to an array differ from copying the contents of the array?

Answer:

    my @a = (1,2,3);
    my $ref = \@a;   # $ref is a reference to the array; changes through $ref are visible in @a
    my @b = @a;      # @b is a new array; changes to @b do not affect @a

    # Compare:
    push @$ref, 4;   # @a is now (1,2,3,4)
    push @b, 5;      # @a remains (1,2,3,4); @b is (1,2,3,5)
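When only a reference is at hand, a copy can still be made by dereferencing into a new anonymous array. The sketch below also shows one way to check whether two references point at the same array. The names `$dup`, `$same_as_a`, and `$dup_is_a` are illustrative.

```perl
use strict;
use warnings;

my @a   = (1, 2, 3);
my $ref = \@a;          # alias: points at @a itself
my $dup = [ @$ref ];    # new anonymous array holding a copy of the elements

# References compare equal numerically only when they point at the same array.
my $same_as_a = ($ref == \@a) ? 1 : 0;
my $dup_is_a  = ($dup == \@a) ? 1 : 0;

push @$dup, 99;         # modifies only the copy

print "same_as_a=$same_as_a dup_is_a=$dup_is_a a=(@a) dup=(@$dup)\n";
```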

Examples of Real Mistakes Caused by Overlooking These Nuances


Story 1

In one project, an array was passed to a subroutine by reference, and the author did not realize the subroutine was working on the caller's array rather than a copy. The subroutine modified the array in place, so the data structure in the calling code was silently corrupted. The caller expected a copy and got an alias.
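One way to guard against this kind of bug is to copy the array inside the subroutine before changing it. The sketch below contrasts the two behaviors. Both subroutine names are hypothetical and are not taken from the project described.

```perl
use strict;
use warnings;

# Hypothetical helper: mutates the caller's array through the reference.
sub double_in_place {
    my ($items) = @_;
    $_ *= 2 for @$items;
    return;
}

# Hypothetical helper: copies first, so the caller's array is left alone.
sub doubled_copy {
    my ($items) = @_;
    my @copy = @$items;
    $_ *= 2 for @copy;
    return \@copy;
}

my @data      = (1, 2, 3);
my $result    = doubled_copy(\@data);   # works on a private copy
my $untouched = "@data";                # snapshot: still "1 2 3"

double_in_place(\@data);                # now the caller's array changes

print "after copy: ($untouched), after in-place: (@data), result: (@$result)\n";
```

Naming a mutating subroutine so that its side effect is obvious, as with the `_in_place` suffix assumed here, helps callers know when to pass a copy.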


Story 2

An engineer expected that assigning %h to a scalar would return the number of key/value pairs. In older versions of Perl (before 5.26) the result describes the hash's used and allocated buckets rather than its key count. As a result, the value was sometimes not 3 but something else, which broke the statistics.


Story 3

In a large ETL system, arrays were "copied" by copying references to them. Different parts of the system then overwrote each other's data, because they all held references to the same array rather than independent copies. Diagnosing the error took several days.
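The same problem can survive a one-level copy when the array contains nested references. The sketch below assumes a hypothetical row with a nested array and uses `dclone` from the core Storable module to get a fully independent copy. It illustrates the idea and does not reconstruct the system described.

```perl
use strict;
use warnings;
use Storable qw(dclone);   # core module; dclone makes a deep copy

my @template = ('id', [ 'tag' ]);    # hypothetical row with a nested array

my $shared  = \@template;            # same array, not a copy
my $shallow = [ @template ];         # new outer array, nested array still shared
my $deep    = dclone(\@template);    # fully independent structure

$shared->[0] = 'row_id';             # also changes $template[0]
push @{ $shallow->[1] }, 'leaked';   # visible in @template via the shared inner array
push @{ $deep->[1] },    'private';  # visible only in the deep copy

print "template: $template[0] / @{ $template[1] }\n";
print "deep:     $deep->[0] / @{ $deep->[1] }\n";
```

A shallow copy is enough for flat arrays of plain values; a deep copy is only needed when elements are themselves references.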