ProgrammingSQL Analyst

How to implement effective filtering and aggregation of time series data in SQL, if limited to standard SQL-92 level?

Pass interviews with Hintsage AI assistant

Answer

The question of processing time series in relational databases arose at the intersection of analytics and classical programming with SQL. In SQL-92, there are no special window functions, so one has to get creative with subqueries to calculate dynamic metrics (moving sums, averages, etc.) and conditions based on time.

Problem — the lack of standard tools for aggregation over a moving window, finding previous/next values by time, and effective grouping by arbitrary calendar intervals (for example, calculating weekly/monthly metrics).

Solution:

Using only standard tools, one uses correlated subqueries for each row or groups by a computed criterion (for example, month, week):

Example code:

-- Example of grouping by weeks without window functions SELECT YEAR(event_date) AS year, WEEK(event_date) AS week, SUM(value) AS total FROM timeseries GROUP BY YEAR(event_date), WEEK(event_date) ORDER BY year, week; -- Correlated subquery to find the previous record SELECT t1.id, t1.event_date, t1.value, ( SELECT t2.value FROM timeseries t2 WHERE t2.event_date < t1.event_date ORDER BY t2.event_date DESC LIMIT 1 ) as prev_value FROM timeseries t1;

Key features:

  • It is necessary to explicitly group and aggregate by time functions (YEAR, MONTH, WEEK, etc.).
  • To obtain moving calculations, one has to use correlated subqueries or temporary tables.
  • Performance suffers on large data volumes — a subquery for each row quickly slows down execution.

Trick questions.

Does WEEK(event_date) always unambiguously define the calendar week for any dates?

No — different DBMS (and even parameters of the same DBMS) define the first week of the year differently (for example, ISO 8601 vs. American system), which can lead to different results when aggregating. One needs to explicitly specify the mode of operation for the function or use YEARWEEK.

SELECT YEARWEEK(event_date, 1) -- 1: ISO week starts on Monday FROM timeseries;

Does the correlated subquery automatically remove duplicates when searching for the previous value?

No, the correlated subquery does not filter duplicates by default. If there are multiple events on the same date in the table, the subquery will return the first one in sorting but ignore the others.

Can one aggregate by dates through GROUP BY without considering time?

Yes, but one needs to explicitly discard the time part, for example, using DATE(event_date) or TRUNC(event_date) in different DBMS:

SELECT DATE(event_datetime), COUNT(*) FROM events GROUP BY DATE(event_datetime)

Typical mistakes and anti-patterns

  • Using YEAR/MONTH/WEEK without considering locale and calendar standards for reporting periods
  • Creating overly complex correlated subqueries for moving windows — explosive growth in execution time on large volumes
  • Working with time functions without considering the time zone

Real-life examples

Negative case

The team calculated weekly analytics using WEEK(date) without setting parameters to ISO-8601. As a result, reports for the first week of January "got lost" — some events were attributed to December of the previous year according to the American logic. The analytics did not match up.

Pros:

  • Quickly implemented period decomposition

Cons:

  • Data is incorrect, reports do not align with business logic

Positive case

A specialist implemented YEARWEEK(date, 1) and a calendar table, significantly improving the consistency of reports between countries and business units.

Pros:

  • The business always receives the correct week and month numbers

Cons:

  • Slightly more complex to maintain — calendar tables need to be updated.