Programming: Perl performance engineer

What are the main techniques for profiling and optimizing Perl scripts? Describe performance analysis capabilities, popular modules, basic approaches, and the relationship with the internal workings of Perl.

Pass interviews with Hintsage AI assistant

Answer.

History of the question

As programs grew more complex and performance demands increased, Perl developers needed ways to locate bottlenecks and optimize scripts. Profilers such as Devel::NYTProf and Devel::DProf (the latter formerly shipped with core Perl) were created for this purpose, alongside manual timing with the core Benchmark module.

The problem

The main difficulty is that Perl's dynamism and flexibility create extra overhead: runtime interpretation, frequent type conversions, reference-counted memory management, and autovivification of nested structures. It is rarely obvious which part of the code is the slowest, because the bottleneck often lies where the developer is not looking. The classic flawed approach is premature optimization without actual profiling.
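A minimal sketch of the autovivification overhead mentioned above: merely checking a nested key silently creates the parent entry, which can inflate memory and skew profiles in hot loops.

  use strict;
  use warnings;

  my %h;
  # exists() on a nested key dereferences $h{a}, autovivifying it as a hashref
  print exists $h{a}{b} ? "b exists\n" : "b absent\n";   # prints "b absent"
  print exists $h{a}    ? "a exists\n" : "a absent\n";   # prints "a exists"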

The solution

Run a profiler, generate reports, and work from the statistics. Devel::NYTProf provides the most detailed information and supports graphical (HTML) analysis. For small, localized measurements, the core Benchmark module, Benchmark::Timer, or Time::HiRes are used. The code is then optimized based on the results: excessive logic is rewritten, unnecessary array copying is eliminated, and XS wrappers are written for genuinely hot spots.

Example code:

  # profiling with Devel::NYTProf
  perl -d:NYTProf myscript.pl
  nytprofhtml    # generate a detailed HTML report
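For the localized measurements mentioned above, the core Benchmark module can compare alternatives directly; a minimal sketch (the two string-building variants are purely illustrative):

  use strict;
  use warnings;
  use Benchmark qw(cmpthese);

  cmpthese(-1, {   # -1 = run each variant for about 1 CPU second
      join_list => sub { my $s = join '', map { "x$_" } 1 .. 100 },
      concat    => sub { my $s = ''; $s .= "x$_" for 1 .. 100 },
  });

cmpthese prints a table of iteration rates and relative speedups, which often settles a micro-level question without a full profiling run.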

Key features:

  • The dynamism of Perl affects the results — often the bottleneck lies at the level of data structures and language magic
  • NYTProf visualizes execution excellently, including external calls
  • Optimization is iterative: "profile — fix — profile again"

Trick questions.

Will the profiler always show the exact cause of slowdowns in each section?

No. Profiling itself adds overhead and can distort the picture, especially for rarely called functions; and hotspots caused by external resources (database, network) reflect time spent waiting rather than CPU work.

Can it be said that XS binding always provides maximum performance improvement?

Not always. XS speeds up only compute-intensive fragments; if the bottleneck is I/O or a poorly chosen data structure, the gain will be minimal.

Should the slowest functions always be rewritten in C or XS after the first analysis?

No. Often it's more appropriate to change the algorithm or the way data is stored (autovivification vs preallocate, array vs hash) than to immediately resort to low-level optimization.
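A minimal sketch of the data-structure changes meant here (names are illustrative): preallocating an array instead of letting it grow element by element, and replacing repeated linear scans with a hash lookup.

  use strict;
  use warnings;

  # Preallocation: reserve the slots up front instead of growing one by one
  my @buf;
  $#buf = 9_999;                        # @buf now has 10_000 elements

  # O(1) membership test: build a hash once instead of grep'ing repeatedly
  my @allowed = qw(alice bob carol);
  my %allowed = map { $_ => 1 } @allowed;
  print "allowed\n" if $allowed{bob};   # prints "allowed"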

Typical mistakes and anti-patterns

  • Guessing at bottlenecks "by feel" instead of measuring
  • Optimizing before profiling (prematurely)
  • Ignoring the characteristics of Perl's data structures (for instance, choosing an array when a hash is needed)
  • Rewriting simple code in C without a visible reason

Real-life example

Negative case

A developer speeds up functions at random and rewrites them in XS, but sees no real performance gain, because the main bottleneck was repeated file reads.

Pros:

  • Gained experience in C and XS

Cons:

  • Lost time, harder-to-maintain code, no real speedup

Positive case

Profiling with NYTProf, identifying the genuinely slow fragments, and optimizing only those, rewriting the algorithm more efficiently where needed. The caller/callee relationships in the report showed where arrays were being copied unnecessarily.
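A sketch of the array-copy fix described above (subroutine names are hypothetical): passing a single reference instead of copying the whole list inside every call.

  use strict;
  use warnings;

  # Copying: @_ aliases the elements, but "my @vals = @_" duplicates the list
  sub total_by_copy { my @vals  = @_; my $t = 0; $t += $_ for @vals;  return $t }

  # No copy: only one scalar (the reference) crosses the call boundary
  sub total_by_ref  { my ($vals) = @_; my $t = 0; $t += $_ for @$vals; return $t }

  my @data = (1 .. 100_000);
  print "equal\n" if total_by_copy(@data) == total_by_ref(\@data);   # prints "equal"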

Pros:

  • Effective work, fewer bugs

Cons:

  • Requires time to learn profiling tools