ProgrammingBackend Developer

How to implement conditional aggregation in SQL (for example, calculating sums by different statuses in a single query) and what potential pitfalls exist here?

Pass interviews with Hintsage AI assistant

Answer.

Conditional aggregation is used when you need to aggregate values (e.g., sum or count) based on different conditions within a single query. Historically, to achieve such goals, SQL developers had to write multiple subqueries or perform joins and groupings separately for each category, resulting in less readable and less efficient code.

The problem lay in the difficulty of simultaneously calculating multiple aggregates with different conditions — each individual aggregate usually requires filtering, and simply applying SUM() or COUNT() does not account for the necessary condition within the aggregate function.

Solution — using the CASE construct within an aggregate function, which allows "separating" aggregates based on different conditions on the fly, without resorting to multiple joins:

Example code:

SELECT department, SUM(CASE WHEN status = 'approved' THEN amount ELSE 0 END) AS approved_sum, SUM(CASE WHEN status = 'pending' THEN amount ELSE 0 END) AS pending_sum, COUNT(CASE WHEN status = 'rejected' THEN 1 END) AS rejected_count FROM payments GROUP BY department;

Key features:

  • Allows aggregating based on multiple conditions at once in a single query.
  • Works in all SQL dialects (exact syntax support may vary).
  • Can be used not only with SUM but also with COUNT, AVG, etc.

Tricky questions.

Can I do without the CASE construct for conditional aggregation?

An incorrect answer would be to use WHERE directly in the query along with the aggregate function. In fact, WHERE filters rows before aggregation, not within each aggregate column.

Example code (incorrect way):

SELECT COUNT(*) FROM payments WHERE status = 'approved'; SELECT COUNT(*) FROM payments WHERE status = 'pending';

These queries cannot be concatenated into a single result line based on different conditions without CASE.

What happens if you use NULL within the conditional expression?

If CASE does not return an alternative value, it will default to NULL, and the aggregate function will ignore NULL.

Example code:

SUM(CASE WHEN status = 'approved' THEN amount END) -- If the status is not 'approved', then NULL, and the row is ignored in SUM

Can I use IF instead of CASE?

In some SQL dialects (e.g., MySQL) this is possible, but such code becomes non-portable between different databases. In universal queries, always use CASE.

Common mistakes and anti-patterns

  • Using WHERE instead of nested logic inside CASE leads to the necessity to perform multiple single queries, which degrades performance.
  • Forgetting to set ELSE 0 in a CASE can yield incorrect counts/sums due to missing NULLs.
  • Copy-pasting CASE without needed checks leads to logical errors and incorrect calculations.

Real-life example

Negative case

An analyst tried to calculate sums for each status by writing several separate subqueries. The external report became complex, and adding new statuses required rewriting the code each time.

Pros:

  • Easy to implement for a single metric. Cons:
  • Difficult to scale, low performance, and code duplication.

Positive case

A developer used CASE in one general query, creating a universal report with the ability to accommodate new statuses by simply editing one query.

Pros:

  • High readability, ease of scaling, performance. Cons:
  • Requires knowledge of CASE syntax and its nuances in different DBMS.