
How to implement cumulative totals calculation in SQL without window functions, considering performance on thousands or millions of rows?


Answer

Cumulative totals and running sums are traditionally computed in SQL with window functions (e.g., SUM() OVER (ORDER BY ...)). However, older or stripped-down DBMS versions offer only subqueries and grouping. Historically, database developers relied on workarounds until the SQL:2003 standard introduced window function support.

Problem: without window functions, each row must explicitly sum all preceding values, which degenerates into O(N^2) work on sufficiently large datasets unless tricks are applied.

Solution:

Typically, correlated subqueries or temporary tables with value updating are used:

Sample code:

-- Cumulative sum using a correlated subquery
SELECT t1.id,
       t1.amount,
       (SELECT SUM(t2.amount)
        FROM transactions t2
        WHERE t2.id <= t1.id) AS running_total
FROM transactions t1
ORDER BY t1.id;

-- Via a temporary table with manual value updates
CREATE TEMPORARY TABLE temp_running (id INT, amount INT, running_total INT);
-- Iterate over the rows in external code (e.g., PL/pgSQL), adding to the running sum sequentially
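The correlated-subquery variant can be checked end to end with SQLite from Python; the table name matches the sample above, and the data is purely illustrative:

```python
import sqlite3

# In-memory table with a few sample transactions (illustrative data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(1, 100), (2, 50), (3, 200), (4, 25)])

# Correlated subquery: for each row, re-sum all rows with id <= the current id
rows = conn.execute("""
    SELECT t1.id, t1.amount,
           (SELECT SUM(t2.amount)
            FROM transactions t2
            WHERE t2.id <= t1.id) AS running_total
    FROM transactions t1
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)  # (1, 100, 100), (2, 50, 150), (3, 200, 350), (4, 25, 375)
```

The running_total column grows monotonically here only because id order matches insertion order; the "Key features" below spell out when that assumption holds.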

Key features:

  • The method works only with a unique, stable sort key (id, creation date)
  • The correlated subquery scales poorly: execution time grows quadratically with row count
  • For large data volumes it makes more sense to move the aggregation into an ETL step outside SQL or into stored procedures

Tricky questions

Does an ORDER BY guarantee sorting in a correlated subquery?

No. An ORDER BY inside the subquery does not affect its result: SUM() is order-independent, and which rows enter the total is determined solely by the WHERE filter (t2.id <= t1.id). The ordering of the final output is set only by the ORDER BY of the outer query.

Can the calculation of the cumulative sum be parallelized in this approach?

No. The order of rows is essential: each running total depends on all previous rows, which makes naive parallelization impossible in plain SQL.

Why is the correlated subquery so slow with a large number of rows?

Because the sum over all preceding rows is recomputed for each row, giving O(N^2) operations in total. On a table of 100 thousand rows this can already take minutes or even hours.
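The quadratic blow-up is easy to count directly. This sketch mirrors the correlated subquery in pure Python and tallies the additions it performs (synthetic data):

```python
def naive_running_totals(amounts):
    """Re-sum the whole prefix for every row, like the correlated subquery."""
    ops = 0
    totals = []
    for i in range(len(amounts)):
        prefix_sum = 0
        for j in range(i + 1):  # re-reads rows 0..i on every iteration
            prefix_sum += amounts[j]
            ops += 1
        totals.append(prefix_sum)
    return totals, ops

totals, ops = naive_running_totals([10] * 1000)
print(ops)  # 500500 additions for just 1000 rows: N * (N + 1) / 2
```

At 100 thousand rows the same formula gives roughly 5 billion additions, which is why the timings in the answer above are plausible.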

Typical mistakes and anti-patterns

  • Incorrect filtering by id instead of the actual date — sums "jump" on gaps in id
  • Trying to perform summation without ordering data
  • Using such an approach for huge tables when ETL or partitioned processing is needed
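The first anti-pattern can be reproduced in miniature. Here id 3 is inserted after id 2 but carries an earlier date (illustrative data standing in for a table with deleted/reused ids), so keying the prefix on id disagrees with the date order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (id INTEGER PRIMARY KEY, day TEXT, amount INTEGER)")
# id order disagrees with date order: id 3 holds the middle date
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)",
                 [(1, "2024-01-01", 100), (2, "2024-01-03", 60), (3, "2024-01-02", 40)])

# Wrong: prefix keyed on id, displayed by date
by_id = conn.execute("""
    SELECT t1.day,
           (SELECT SUM(t2.amount) FROM revenue t2 WHERE t2.id <= t1.id)
    FROM revenue t1 ORDER BY t1.day
""").fetchall()

# Right: prefix keyed on the actual date
by_day = conn.execute("""
    SELECT t1.day,
           (SELECT SUM(t2.amount) FROM revenue t2 WHERE t2.day <= t1.day)
    FROM revenue t1 ORDER BY t1.day
""").fetchall()

print(by_id)   # the "cumulative" total jumps to 200 and then drops to 160
print(by_day)  # [('2024-01-01', 100), ('2024-01-02', 140), ('2024-01-03', 200)]
```

The id-keyed totals are not even monotonic, which is exactly the "sharp drops" symptom described in the negative case below.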

Real-life example

Negative case

An analyst computed daily cumulative revenue through a correlated subquery keyed on id, while the table periodically had ids deleted (leaving gaps). The final totals showed sharp drops and depended on id order rather than on the date.

Pros:

  • Works for small datasets, does not require window functions

Cons:

  • The results are incorrect and do not match expectations
  • Difficult to maintain

Positive case

An engineer moved the cumulative-sum processing into an ETL script (Python/pandas), then loaded the final values into a separate table, synchronizing only new entries. The totals are always consistent by date, and the code runs quickly even on millions of records.
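A minimal sketch of that ETL step, using stdlib Python with SQLite standing in for the warehouse (in the described setup this would be pandas cumsum against the production database; table and column names here are assumptions):

```python
import sqlite3
from itertools import accumulate

# Stand-in for the source table in the warehouse (illustrative data)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, revenue INTEGER)")
db.executemany("INSERT INTO daily_revenue VALUES (?, ?)",
               [("2024-01-01", 100), ("2024-01-02", 40), ("2024-01-03", 60)])

# ETL step: pull rows ordered by date, accumulate in one O(N) pass, write back
rows = db.execute("SELECT day, revenue FROM daily_revenue ORDER BY day").fetchall()
days = [day for day, _ in rows]
cumulative = accumulate(revenue for _, revenue in rows)

db.execute("CREATE TABLE cumulative_revenue (day TEXT PRIMARY KEY, running_total INTEGER)")
db.executemany("INSERT INTO cumulative_revenue VALUES (?, ?)", zip(days, cumulative))

final = db.execute("SELECT * FROM cumulative_revenue ORDER BY day").fetchall()
print(final)  # [('2024-01-01', 100), ('2024-01-02', 140), ('2024-01-03', 200)]
```

Keying the accumulation on the date (not the id) and materializing the result into its own table is what makes the totals reproducible and cheap to re-query.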

Pros:

  • Reliability, ability to recalculate without downtime
  • Support for large volumes

Cons:

  • More complex infrastructure: external processing tools are required