
How do internal data structures work in Perl (arrays of arrays, hashes of hashes, and mixed types), and what pitfalls can arise when creating and using them?


Answer.

Arrays of arrays, hashes of hashes, and other complex data structures in Perl are built from references: an array or hash element can hold only a scalar, so each nested level is stored as a reference to another array or hash. This makes hierarchical and branching structures easy to build, but accessing, copying, and modifying them requires care, because what is stored and copied is the reference, not the content it points to.

Background

Before Perl 5, arrays and hashes were flat: their elements could only be plain scalars, with no nesting. Perl 5 introduced references, which made arbitrary combinations possible: arrays of arrays, hashes of hashes, trees, graphs, and so on.

Problem

Working with complex structures requires remembering that read, write, and copy operations on nested levels work with references. Errors often come from confusing an element with a reference to an element. A typical bug: the same reference is held by several parts of the program, so changing the data in one place silently changes it everywhere that reference is used.
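The sharing problem described here can be reproduced in a few lines. This is a minimal sketch; the names %perms, $defaults, alice, and bob are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# One anonymous array, referenced from two places in the same hash.
# (%perms, $defaults, alice and bob are illustrative names.)
my $defaults = ['read'];
my %perms = (
    alice => $defaults,
    bob   => $defaults,    # the same reference, not a second array
);

# Changing the data "for alice" also changes it for bob.
push @{ $perms{alice} }, 'write';

my $bob_perms = join ',', @{ $perms{bob} };
print "bob: $bob_perms\n";    # bob: read,write
```

Giving each user a separate anonymous array (for example ['read'] written out twice) would avoid the sharing.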

Solution

To create an array of arrays:

    my @matrix;
    for my $i (0..2) {
        for my $j (0..2) {
            $matrix[$i][$j] = $i * $j;
        }
    }
    print $matrix[1][2];  # 2

For a hash of hashes:

    my %data;
    $data{'user1'}{'name'} = 'Alex';
    $data{'user1'}{'age'}  = 20;

Mixed structures:

    my %complex = (
        'list' => [1, 2, 3],
        'map'  => { foo => 'bar' },
    );
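Reading from and writing to such a mixed structure might look like the following sketch. It reuses the %complex hash from above; the variable names $second, $foo, and $count are assumptions made for the example.

```perl
use strict;
use warnings;

my %complex = (
    'list' => [1, 2, 3],
    'map'  => { foo => 'bar' },
);

my $second = $complex{'list'}[1];            # element of the nested array
my $foo    = $complex{'map'}{'foo'};         # value in the nested hash
my $count  = scalar @{ $complex{'list'} };   # dereference to get the whole array

# Modify the nested array in place through its reference.
push @{ $complex{'list'} }, 4;

print "$second $foo $count\n";               # 2 bar 3
```

The @{ ... } block is needed whenever the whole nested array is required, for example in push, scalar, or a foreach loop.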

Key Features:

  • Working with nested structures always happens through references, even if it's not obvious at first glance.
  • Simple assignment is not enough for deep copying.
  • Errors are often related to the fact that the type of data/structure is not immediately visible.
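When the type of a value is not visible from the code, the built-in ref function can be used to check it at run time. The sketch below assumes a %complex hash like the one above, with a hypothetical 'title' key added to show the non-reference case.

```perl
use strict;
use warnings;

my %complex = (
    'list'  => [1, 2, 3],
    'map'   => { foo => 'bar' },
    'title' => 'plain string',    # hypothetical extra key, not a reference
);

# ref() returns 'ARRAY', 'HASH', etc. for references
# and an empty string for ordinary scalars.
my %kind;
for my $key (sort keys %complex) {
    my $type = ref $complex{$key};
    $kind{$key} = $type eq '' ? 'not a reference' : $type;
    print "$key: $kind{$key}\n";
}
```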

Tricky Questions.

What happens if you try to assign one array to another to copy the structure?

Such an assignment does not copy nested structures, but only copies the references to them (this is known as "shallow copy").

    my @a = ([1, 2], [3, 4]);
    my @b = @a;            # copies the two references, not the inner arrays
    $a[0][0] = 99;
    print "$b[0][0]\n";    # prints 99: @b holds references to the same arrays as @a
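For a structure that is exactly two levels deep, one possible fix is to copy each inner array explicitly. This is a sketch of that approach, not a general deep copy: anything nested deeper than the second level would still be shared.

```perl
use strict;
use warnings;

my @a = ([1, 2], [3, 4]);

# Build a fresh anonymous array from the contents of each inner array.
my @b = map { [ @$_ ] } @a;

$a[0][0] = 99;
print "$a[0][0] $b[0][0]\n";    # 99 1
```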

What is the difference between accessing an element as $array[$i] vs $array->[$i]?

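The first form, $array[$i], indexes the array @array directly. The second, $array->[$i], dereferences the scalar $array, which must hold a reference to an array. The two forms refer to different variables (@array and $array), which is a common source of confusion. In nested structures every level below the first is reached through a reference, so the arrow form ($foo->[0]) is the usual syntax; between adjacent subscripts the arrow may be omitted, so $m[1][2] means the same as $m[1]->[2].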
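The difference can be seen side by side in this short sketch. The variable names @array, $aref, and @matrix are chosen for illustration only.

```perl
use strict;
use warnings;

my @array = (10, 20, 30);    # a real array
my $aref  = \@array;         # a scalar holding a reference to it

my $direct  = $array[1];     # element of @array
my $via_ref = $aref->[1];    # element reached through the reference
my $classic = $$aref[1];     # older spelling of the same dereference

# Between adjacent subscripts the arrow is optional.
my @matrix = ([1, 2], [3, 4]);
my $same = ($matrix[1][0] == $matrix[1]->[0]) ? 1 : 0;

print "$direct $via_ref $classic $same\n";    # 20 20 20 1
```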

Why can't you just call dclone to copy a structure without loading anything?

Because dclone is not a built-in function. It is provided by the Storable module, which has shipped with the standard Perl distribution since Perl 5.8 but must be loaded explicitly. For deep copying of complex structures, import dclone from Storable:

    use Storable 'dclone';
    my $deep_copy = dclone(\%complex);
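A short check that the copy is independent of the original might look like this. It assumes the %complex hash defined earlier; $orig_len and $orig_foo are names introduced for the example.

```perl
use strict;
use warnings;
use Storable 'dclone';

my %complex = (
    'list' => [1, 2, 3],
    'map'  => { foo => 'bar' },
);

# dclone takes a reference and returns a reference to a new structure.
my $deep_copy = dclone(\%complex);

# Changes to the copy do not reach the original.
push @{ $deep_copy->{'list'} }, 4;
$deep_copy->{'map'}{'foo'} = 'changed';

my $orig_len = scalar @{ $complex{'list'} };
my $orig_foo = $complex{'map'}{'foo'};
print "$orig_len $orig_foo\n";    # 3 bar
```

Note that dclone returns a reference, so the copy is accessed with the arrow syntax ($deep_copy->{...}) rather than as a plain hash.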

Typical Errors and Anti-Patterns

  • Copying a nested structure by plain assignment where an independent (deep) copy is needed
  • Accessing an element without checking whether the variable holds a reference or the array/hash itself
  • Serializing or printing a nested structure without accounting for its nesting and references
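The last point can be illustrated with a sketch that compares naive string interpolation with the core Data::Dumper module, which walks nested references. The exact text produced by Dumper depends on its settings, so no output is shown for it here.

```perl
use strict;
use warnings;
use Data::Dumper;

my @a = ([1, 2], [3, 4]);

# Naive stringification shows reference addresses, not contents,
# something like "ARRAY(0x...) ARRAY(0x...)".
my $naive = "@a";

# Data::Dumper follows the references and emits the nested values.
$Data::Dumper::Terse  = 1;    # omit the "$VAR1 =" prefix
$Data::Dumper::Indent = 0;    # keep the output on one line
my $dump = Dumper(\@a);

print "$naive\n";
print "$dump\n";
```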

Real-life Example

Negative Case

A project copies an array of arrays by plain assignment (@copy = @org). After a series of changes to the copy, the team discovers that the "original" data has changed as well.

Pros:

  • Fast
  • Simple syntax

Cons:

  • High likelihood of hidden bugs
  • Implicit changes in different parts of the program

Positive Case

The team copies nested arrays and hashes with the dclone function from the Storable module, documents this in the code, and clearly distinguishes where a value is a reference and where it is not.

Pros:

  • Correct data duplication
  • Clear structure of the code
  • Fewer unpleasant surprises

Cons:

  • Need to remember to load Storable (a core module, but not loaded by default)
  • Easy to forget the need for deep copying in new places