Answer.

Aggregation and grouping in SQL grew out of reporting and analytics needs. Relational DBMSs of the 1980s already offered the basic aggregate functions (SUM, COUNT, AVG), but a plain GROUP BY slows down as data volume grows. This is a scalability problem: a query over tens of millions of rows with many groups can run for a long time, consume memory and disk I/O, and, depending on the engine and isolation level, hold locks that block other work.

The root cause is that an inefficient query forces the database server to spend most of its resources on sorting, building intermediate tables, and reading from disk. It gets harder still when grouping by several columns or when the set of aggregates is dynamic.

The solution is to build suitable indexes on the grouping columns, use partitioning and partial aggregation ("semi-aggregation"), and optimize the query structure. For business analytics, Common Table Expressions (CTEs), materialized views, and window functions are often used.

Example code:

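-- Stage 1: filter early, then collapse sales to one row per (customer, region).
WITH PreAgg AS (
    SELECT customer_id, region, SUM(amount) AS total_amount
    FROM sales
    WHERE sale_date >= '2024-01-01'
    GROUP BY customer_id, region
)
-- Stage 2: roll the per-customer totals up to the region level.
-- PreAgg already holds each customer at most once per region, so a plain
-- COUNT(customer_id) returns the same number as COUNT(DISTINCT customer_id).
SELECT region, COUNT(customer_id) AS customers, SUM(total_amount) AS region_amount
FROM PreAgg
GROUP BY region
ORDER BY region_amount DESC;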

Key features:

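An index on the grouping columns can let the engine read rows in group order and skip a separate sort step
Storing pre-aggregated (summary) data reduces the load of repeated reports
A materialized view simplifies and speeds up complex reports, at the cost of keeping it refreshed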
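
The effect of an index on GROUP BY can be checked by comparing query plans. The following is a minimal sketch using Python's built-in sqlite3 module, not a benchmark: the sales table layout and the index name idx_sales_region_amount are assumptions made for illustration, and other engines report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, modeled on the sales table from the example query.
conn.execute("CREATE TABLE sales (customer_id INTEGER, region TEXT, amount REAL)")

QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region"


def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the step description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(QUERY)

# Hypothetical covering index: grouping column first, aggregated column second.
conn.execute("CREATE INDEX idx_sales_region_amount ON sales (region, amount)")
plan_with_index = plan(QUERY)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index, SQLite has to build a temporary B-tree to bring equal region values together. With it, the rows arrive already ordered by region, so the grouping needs no extra sort.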

Trick Questions.

Does the performance of GROUP BY depend on the order of columns in SELECT?

No. The order of columns in the SELECT list does not affect speed; what matters is which columns are grouped and whether an index covers them.

Is it mandatory to specify an aggregate function for each field in SELECT when using GROUP BY?

No. A column listed in GROUP BY can be selected without aggregation. In standard SQL, any other column must be wrapped in an aggregate function; some engines relax this rule, for example when the column is functionally dependent on the grouping columns.

SELECT department, MIN(salary) FROM employees GROUP BY department;

Can one GROUP BY be nested within another for multi-level aggregation?

Yes, nested CTEs or subqueries allow for "multi-tier" aggregations with intermediate results.

WITH Step1 AS (
  SELECT customer, SUM(amount) AS cust_sum FROM orders GROUP BY customer
)
SELECT COUNT(*) FROM Step1 WHERE cust_sum > 10000;
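
To see the two levels at work, here is a small runnable sketch using Python's built-in sqlite3 module. The sample rows and the extra AVG column are invented for illustration; the CTE itself is the one shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
# Invented sample data: alice totals 12000, bob 5000, carol 11000.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 7000), ("alice", 5000), ("bob", 3000), ("carol", 11000), ("bob", 2000)],
)

# Level 1 (Step1): one total per customer.
# Level 2 (outer query): aggregate over those totals.
big_customers, avg_big_total = conn.execute("""
    WITH Step1 AS (
        SELECT customer, SUM(amount) AS cust_sum FROM orders GROUP BY customer
    )
    SELECT COUNT(*), AVG(cust_sum)
    FROM Step1
    WHERE cust_sum > 10000
""").fetchone()

print(big_customers, avg_big_total)  # 2 11500.0
```

The outer WHERE filters on the result of the inner aggregate, which is what a single-level query would have to express with HAVING.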

Typical Mistakes and Anti-patterns

GROUP BY on non-indexed columns or on a large number of columns
Careless use of aggregate functions (e.g., forgetting that COUNT(column), SUM, and AVG skip NULL values)
Aggregation without prior filtering (rows the report does not need are still scanned and grouped)
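
The NULL pitfall is easy to reproduce. The sketch below uses Python's built-in sqlite3 module; the bonus column and the sample rows are hypothetical, chosen only to show how the same data yields different answers depending on how NULLs are treated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: bonus is NULL for employees without a recorded bonus.
conn.execute("CREATE TABLE employees (department TEXT, bonus REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("ops", 100.0), ("ops", None), ("ops", 200.0), ("hr", None)],
)

rows = conn.execute("""
    SELECT department,
           COUNT(*)                AS all_rows,           -- counts every row
           COUNT(bonus)            AS rows_with_bonus,    -- skips NULLs
           AVG(bonus)              AS avg_ignoring_nulls, -- NULL rows excluded
           AVG(COALESCE(bonus, 0)) AS avg_nulls_as_zero,  -- NULL treated as 0
           SUM(bonus)              AS sum_bonus           -- NULL if all inputs are NULL
    FROM employees
    GROUP BY department
    ORDER BY department
""").fetchall()

for row in rows:
    print(row)
# ('hr', 1, 0, None, 0.0, None)
# ('ops', 3, 2, 150.0, 100.0, 300.0)
```

Whether 150.0 or 100.0 is the right average for ops depends on what a missing bonus means in the business context, so the choice should be made explicitly rather than left to the default.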

Real-life Example

Negative Case

An analyst builds a report with multiple GROUP BYs on a table with 200 million records, without indexes and without any filtering or sampling. The entire office "hangs" at 9 AM, and the query takes 40 minutes.

Pros:

No upfront design work is required (no indexes or intermediate steps)

Cons:

Catastrophic load on the server: everything slows down and all other queries stall

Positive Case

An engineer uses a CTE for preliminary filtering, proper indexes on the necessary fields, and splits the aggregation into several stages. The report is generated in 5 seconds.
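
One way to stage the aggregation is sketched below with Python's built-in sqlite3 module. It is an illustration under stated assumptions, not the engineer's actual setup: the sales_summary table name and the sample rows are invented, and a plain summary table stands in for a materialized view, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_id INTEGER, region TEXT, amount REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "north", 100.0, "2024-01-05"),
        (1, "north", 50.0, "2024-02-10"),
        (2, "north", 70.0, "2024-01-20"),
        (3, "south", 500.0, "2024-03-01"),
        (3, "south", 999.0, "2023-12-31"),  # outside the reporting window
    ],
)

# Stage 1: filter early and store one row per (region, customer).
# sales_summary is a hypothetical name for the pre-aggregated table.
conn.execute("""
    CREATE TABLE sales_summary AS
    SELECT region, customer_id, SUM(amount) AS total_amount
    FROM sales
    WHERE sale_date >= '2024-01-01'
    GROUP BY region, customer_id
""")

# Stage 2: the report reads only the much smaller summary table.
report = conn.execute("""
    SELECT region, COUNT(customer_id) AS customers, SUM(total_amount) AS region_amount
    FROM sales_summary
    GROUP BY region
    ORDER BY region_amount DESC
""").fetchall()

print(report)  # [('south', 1, 500.0), ('north', 2, 220.0)]
```

The heavy scan over the raw table runs once, when the summary is built. The trade-off is that the summary must be rebuilt or incrementally updated whenever the underlying sales data changes.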