ProgrammingSQL Analyst

How to properly implement multi-column sorting in SQL queries for business reports, and what are the nuances with NULL values, Collation, and performance?

Pass interviews with Hintsage AI assistant

Answer.

Multi-column sorting is an essential tool when generating reports that require ranking or grouping data by multiple columns (for example, first by date, then by name, and then by amount). Traditionally, sorting was strictly applied to one field, which does not solve the task of complex data analysis.

The problem often lies in incorrect sorting order (for example, NULL values appearing in unexpected places), collation errors (different language rules for sorting string values), and performance dips on large datasets.

Solution: Use multi-column ORDER BY, consider the specifics of COLLATION, and explicitly define the order of NULLS for determinism of the result, as well as check the indexing of columns for large tables.

Example code:

SELECT * FROM sales ORDER BY region COLLATE "ru_RU", date DESC NULLS LAST, total_amount DESC;

Key features:

  • Allows sorting across a full set of columns and data types.
  • Explicitly manages the position of NULL through NULLS FIRST/LAST.
  • COLLATE for string columns can be explicitly set (support depends on the DBMS).

Trick Questions.

If COLLATE is not specified, will the sorting always be the same on different servers?

No! Default COLLATE depends on the database and/or server settings, and the results of string sorting (especially with Cyrillic special characters) can vary between installations.

Where do NULLs go when sorting DESC and ASC?

The SQL standard does not define this clearly. In some DBMS, NULLs go to the top when ASC, in others - to the bottom. Explicitly indicate NULLS FIRST or NULLS LAST for transparent behavior.

ORDER BY price DESC NULLS LAST

Does multi-column ORDER BY affect index usage?

Yes, but only if the order, types, and purity of the column values match the existing composite index. If the order is mixed (ASC/DESC) or calculated fields are used, the index may not be utilized.

Typical mistakes and anti-patterns

  • Failing to specify COLLATE leads to different results on different server language settings.
  • Implicit sorting by NULL in reports results in unexpected blank rows appearing in various parts of the selection.
  • Huge ORDER BY clauses on non-indexable/calculated fields "consume" resources without benefit - first index the fields you are sorting by.

Real-life example

Negative case

The manager created a report with ORDER BY by last name without COLLATE. When transferring the report between different servers, Russian and Latin surnames swapped places, and NULLs were sometimes at the beginning and sometimes at the end.

Pros:

  • Minimal code. Cons:
  • Unpredictable behavior and maintenance complexities.

Positive case

In the report, COLLATE and NULL order were explicitly set via NULLS FIRST/LAST, and a composite index was added on the sorting fields. Behavior ceased to depend on the environment, and the query speed significantly increased.

Pros:

  • Predictability, performance, ease of maintenance. Cons:
  • Not all DBMS support non-standard COLLATE and NULLS FIRST/LAST – cross-DBMS testing is required.