Window functions allow you to perform calculations across a "window" of rows without grouping them into separate rows, which is convenient for reporting and analytics.
Example:
SELECT name, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS num, RANK() OVER (ORDER BY salary DESC) AS rank, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank FROM employees;
Table:
| name | salary |
|---|---|
| Vasya | 10000 |
| Petya | 10000 |
| Masha | 9000 |
Result:
| name | salary | num | rank | dense_rank |
|---|---|---|---|---|
| Vasya | 10000 | 1 | 1 | 1 |
| Petya | 10000 | 2 | 1 | 1 |
| Masha | 9000 | 3 | 3 | 2 |
Pitfalls:
If PARTITION BY is not specified in the window function, how will the ROW_NUMBER() row numbering be performed?
Answer: The entire data set will be considered as a single partition. That is, the numbering will be continuous across all rows, without regard to any groupings.
Example:
SELECT *, ROW_NUMBER() OVER (ORDER BY salary DESC) FROM employees; -- All employees will receive a unique number starting from 1 without splitting by departments
Story #1
In the BI report, PARTITION BY was forgotten by department. All employees of the company received consecutive continuous numbering, while the task was to identify the best within each department. As a result, an incorrect TOP-N of employees by departments was produced.
Story #2
The developer chose RANK() instead of ROW_NUMBER() to determine the "best" of the group — but with the same indicators, identical numbers were assigned, resulting in implicit duplicates of leaders in the analytics.
Story #3
When using DENSE_RANK(), it was not taken into account that it prohibits gaps in ranks, which distorted the report on the number of "unique" significant positions when analyzing sales. Business logic control revealed an error in rank distribution.