ProgrammingBI/SQL analyst

How do window functions ROW_NUMBER(), RANK(), DENSE_RANK() work and how do they differ when programming reports in SQL? What pitfalls are there in their usage?

Pass interviews with Hintsage AI assistant

Answer

Window functions allow you to perform calculations across a "window" of rows without grouping them into separate rows, which is convenient for reporting and analytics.

  • ROW_NUMBER() — assigns a unique sequential number to rows within each partition of the window, sorting them by a specified criterion. It can return gaps in numbering when there are identical values in ORDER BY.
  • RANK() — assigns the same rank to rows with the same ORDER BY value, but skips the numbers of subsequent elements (there will be a gap).
  • DENSE_RANK() — also assigns the same rank to rows with the same value, but the ranks are consecutive, with no gaps.

Example:

SELECT name, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS num, RANK() OVER (ORDER BY salary DESC) AS rank, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank FROM employees;

Table:

namesalary
Vasya10000
Petya10000
Masha9000

Result:

namesalarynumrankdense_rank
Vasya10000111
Petya10000211
Masha9000332

Pitfalls:

  • Incorrectly chosen ORDER BY can lead to incorrect sorting.
  • (row_number) If a unique field is not chosen in ORDER BY — there is no guarantee of stable ordering.
  • Using without PARTITION BY when partitioning is necessary leads to incorrect numbering across the entire set of rows.

Trick question

If PARTITION BY is not specified in the window function, how will the ROW_NUMBER() row numbering be performed?

Answer: The entire data set will be considered as a single partition. That is, the numbering will be continuous across all rows, without regard to any groupings.

Example:

SELECT *, ROW_NUMBER() OVER (ORDER BY salary DESC) FROM employees; -- All employees will receive a unique number starting from 1 without splitting by departments

Story #1

In the BI report, PARTITION BY was forgotten by department. All employees of the company received consecutive continuous numbering, while the task was to identify the best within each department. As a result, an incorrect TOP-N of employees by departments was produced.


Story #2

The developer chose RANK() instead of ROW_NUMBER() to determine the "best" of the group — but with the same indicators, identical numbers were assigned, resulting in implicit duplicates of leaders in the analytics.


Story #3

When using DENSE_RANK(), it was not taken into account that it prohibits gaps in ranks, which distorted the report on the number of "unique" significant positions when analyzing sales. Business logic control revealed an error in rank distribution.