ProgrammingSQL Analyst

How to implement effective aggregation with layered filtering using window functions and groupings in SQL? What are the differences between the approaches, and what mistakes do developers often make?

Pass interviews with Hintsage AI assistant

Answer.

To effectively build aggregations with "level" filtering (for example, first filter, then calculate sums by groups, then by the entire dataset), window functions (OVER()) and nested GROUP BY are used.

  • Grouping through GROUP BY aggregates only by selected fields; to exclude the influence of "extraneous" rows, a preliminary filter (WHERE) is necessary.
  • Window functions allow calculating aggregates over specific data segments, applying filters to the resulting selection (for example, only among the rows of the same group).

Example: Let's find the maximum order amount by each manager, but only among orders with the status 'paid', and then display the manager's name with the absolute maximum among all.

WITH PaidOrders AS ( SELECT ManagerID, SUM(OrderAmount) AS TotalPaid FROM Orders WHERE Status = 'paid' GROUP BY ManagerID ), WithMax AS ( SELECT *, MAX(TotalPaid) OVER() AS MaxTotalPaid FROM PaidOrders ) SELECT ManagerID, TotalPaid FROM WithMax WHERE TotalPaid = MaxTotalPaid;

This approach (CTE + window functions) allows for multi-level filtering and aggregation.

Tricky question.

Trick: "What is the difference between applying the WHERE filter before grouping (GROUP BY) and using HAVING after? What does this often lead to in reports?"

Answer: WHERE removes rows before grouping, thus providing a strict input set. HAVING filters aggregated groups — therefore, it may temporarily "retain" extraneous rows if the filter is logically inconsistent. An incorrect filter location often leads to errors in final aggregates or incorrect report results.

-- Getting the sum only for 'paid', through WHERE SELECT ManagerID, SUM(OrderAmount) FROM Orders WHERE Status = 'paid' GROUP BY ManagerID; -- Or calculating sums for all, then trimming with HAVING SELECT ManagerID, SUM(OrderAmount) FROM Orders GROUP BY ManagerID HAVING SUM(OrderAmount) > 1000;

Examples of real mistakes due to ignorance of the topic nuances.


Story

Project: Sales reports, verification audit.

Mistake: The developer applied HAVING Status='paid' instead of WHERE, leading to aggregates incorrectly including unpaid orders, resulting in inaccurate annual KPIs for staff.



Story

Project: Banking analytics.

Mistake: An attempt to apply a window function to a complex aggregation without PARTITION BY led to aggregates being calculated over the entire table instead of the group. The department's budget was incorrectly calculated — it had to be restored manually.



Story

Project: Online store, order statistics.

Mistake: The nesting of window functions in a subquery was overlooked during optimization — causing the server to process data multiple times, the query became 20 times slower than just using double GROUP BY.