Background:
Perl was originally created as a tool for efficient text stream processing, which is why the input/output (I/O) mechanisms are among the most refined in the core of the language. With the development of Unicode and the emergence of various I/O layers, the task of correctly selecting encodings and managing streams became crucial to avoid data loss or corruption.
Problem:
Choosing the wrong encoding when reading or writing files corrupts data (especially national characters), and mistakes in stream handling, such as not checking whether a file was opened successfully, are a frequent source of bugs and vulnerabilities.
Solution:
To open files, use open with the three-argument syntax, which is safer and more universal (avoids vulnerabilities with path and mode interpretation). For proper interaction with encodings, layers are applied (for example, "<:encoding(UTF-8)"). Always check for the success of opening/closing and explicitly set necessary modes of operation.
Example:
binmode STDOUT, ':encoding(UTF-8)';   # encode decoded text again on output
open my $fh, '<:encoding(UTF-8)', $filename
    or die "Cannot open $filename: $!";
while (my $line = <$fh>) {
    chomp $line;                      # removes the trailing newline
    print "$line\n";
}
close $fh or warn "Cannot close $filename: $!";
Key features:
Can user input be passed directly to open without checking?
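Answer: No! With the two-argument form, characters in the name are interpreted as part of the mode: a leading ">" truncates the file, and a leading or trailing "|" runs a shell command. Use the explicit three-argument syntax so that the mode is fixed, and still validate the path itself (for example, reject ".." components) before opening it.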
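The following is a minimal sketch of one way to combine both precautions. The helper name open_user_file, the base-directory argument, and the allowed-character pattern are assumptions made for illustration; a real application may need a different validation policy.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: opens a user-named file inside $base_dir for reading.
sub open_user_file {
    my ($base_dir, $user_name) = @_;

    # Accept only a plain file name: no directory separators, no spaces,
    # no shell metacharacters, and not "." or "..".
    die "Invalid file name\n"
        unless defined $user_name
            && $user_name =~ /\A[\w.-]+\z/
            && $user_name !~ /\A\.+\z/;

    my $path = File::Spec->catfile($base_dir, $user_name);

    # Three-argument form: the mode is a separate argument, so nothing in
    # $path can be interpreted as a mode or as a command to run.
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!\n";
    return $fh;
}
```

Called as open_user_file('/var/data', $name), the helper dies for names such as "../secret" or "echo pwned |" before open is ever reached.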
What will happen if the encoding layer is not specified and the file is in UTF-8?
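Answer: Without a layer, Perl reads raw bytes and treats each byte as a separate character (in effect, as Latin-1). Each multi-byte UTF-8 sequence therefore shows up as several characters, so length and regular expressions give wrong results, and the text becomes distorted (double-encoded) when it is written through a UTF-8 output layer. This affects all non-ASCII text, national alphabets in particular.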
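The difference can be observed with a small experiment. This is a sketch that assumes default PerlIO settings (no PERL_UNICODE environment variable and no "use open" pragma in effect); the helper first_line is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the UTF-8 encoding of the Russian word "Привет"
# (6 characters, 12 bytes) to a temporary file as raw bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";
close $tmp or die "Cannot close $path: $!";

# Hypothetical helper: returns the first line read using the given mode.
sub first_line {
    my ($mode, $file) = @_;
    open my $fh, $mode, $file or die "Cannot open $file: $!";
    my $line = <$fh>;
    close $fh or warn "Cannot close $file: $!";
    return $line;
}

my $raw     = first_line('<', $path);                  # no encoding layer
my $decoded = first_line('<:encoding(UTF-8)', $path);  # decoded to characters

printf "without layer: %d\n", length $raw;      # 12 with default layers (bytes)
printf "with layer:    %d\n", length $decoded;  # 6 (characters)
```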
Is it enough just to call close to ensure the file is written correctly?
Answer: No. You must also check the return value of close. Output is buffered, so a write error (for example, a full disk) may only surface when the buffer is flushed during close; in that case close returns false and sets $!. For example:
close $fh or die "Write failed: $!";
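Put together, a writing routine that checks every step might look like the sketch below. The helper name write_lines and its return value are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: writes the given lines to $path as UTF-8 and dies
# on the first failure. Returns the number of lines written.
sub write_lines {
    my ($path, @lines) = @_;

    open my $fh, '>:encoding(UTF-8)', $path
        or die "Cannot open $path for writing: $!";

    for my $line (@lines) {
        # print returns false if the write fails immediately.
        print {$fh} "$line\n" or die "Cannot write to $path: $!";
    }

    # Buffered data is flushed here, so some errors only become visible now.
    close $fh or die "Cannot close $path: $!";

    return scalar @lines;
}
```

Checking print as well as close is a design choice: print catches failures that occur immediately, while close catches those that are deferred until the buffer is flushed.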
A log handler opens a file with open FILE, "file.txt", does not check whether the open succeeded, and processes the data byte by byte. As a result, Cyrillic characters turn into gibberish and some lines are lost.
Pros:
Cons:
All file handling is done through three-argument open with an explicit encoding layer. All errors are handled and logged, so text is always read and written in the expected encoding.
Pros:
Cons: