Answer.

Background

Reports on unique users are essential for analytics and statistics. However, in real data, duplicate accounts and NULL values (e.g., unspecified email) are often present, and various criteria must be taken into account (e.g., uniqueness by name, email, IP, and sometimes their combinations).

Problem

A typical mistake is to calculate COUNT(DISTINCT user_id) without considering that the relevant columns may contain NULLs or non-obvious duplicates (e.g., one person with different emails, or multiple rows with the same user_id but different statuses). Complex queries with GROUP BY can yield incorrect results if the uniqueness logic is not well thought out.

Solution

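First define the uniqueness key explicitly (which column or combination of columns identifies a user) and decide how NULLs should be treated. Then apply DISTINCT or GROUP BY to exactly that key. It often helps to prepare the data in a CTE or a subquery that filters and groups by the appropriate set of attributes before counting.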

Example code:

-- Counting unique users by email and IP, ignoring NULL
SELECT COUNT(*) AS unique_users
FROM (
    SELECT DISTINCT email, ip_address
    FROM users
    WHERE email IS NOT NULL AND ip_address IS NOT NULL
) u;

Key features:

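SELECT DISTINCT treats NULLs as equal to each other and keeps one row per combination, whereas COUNT(DISTINCT col) skips NULL values entirely
To count by a combination of attributes, apply GROUP BY (or SELECT DISTINCT in a subquery) to that set of columns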
When dealing with duplicates, data often needs to be "cleaned" beforehand

Tricky Questions.

Does COUNT(DISTINCT ...) consider NULL rows?

No: COUNT(DISTINCT col) ignores NULL values, so rows where col is NULL do not contribute to the count at all. If such rows matter for the report, count them separately (e.g., with a WHERE col IS NULL query); if they do not, filter them out explicitly so the intent is visible in the query.
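A minimal sketch of how the three COUNT variants treat NULLs. The inline users data is hypothetical, and the CTE-with-VALUES syntax assumes PostgreSQL or SQLite; other engines may need a real table instead.

```sql
-- Hypothetical sample data: five rows, two of them without an email.
WITH users(user_id, email) AS (
    VALUES (1, 'ann@example.com'),
           (2, 'ann@example.com'),
           (3, 'bob@example.com'),
           (4, NULL),
           (5, NULL)
)
SELECT COUNT(*)              AS all_rows,         -- 5: counts every row
       COUNT(email)          AS non_null_emails,  -- 3: skips NULLs
       COUNT(DISTINCT email) AS distinct_emails   -- 2: skips NULLs, then deduplicates
FROM users;
```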

Can NULL be compared to NULL through DISTINCT?

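In an ordinary comparison, NULL = NULL evaluates to UNKNOWN rather than TRUE. DISTINCT, however (like GROUP BY and UNION), treats NULLs as not distinct from one another, so rows that have NULLs in the same columns and equal values elsewhere collapse into a single row. If such rows should not be counted at all, filter them out with IS NOT NULL.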
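The collapsing of NULLs under DISTINCT can be checked with a small sketch. The data is hypothetical, and the syntax assumes PostgreSQL or SQLite.

```sql
-- Hypothetical sample data with repeated NULL combinations.
WITH users(email, ip_address) AS (
    VALUES ('ann@example.com', '10.0.0.1'),
           ('ann@example.com', '10.0.0.1'),  -- exact duplicate
           (NULL,              '10.0.0.2'),
           (NULL,              '10.0.0.2'),  -- same combination, NULL email
           (NULL,              NULL)
)
SELECT COUNT(*) AS distinct_pairs  -- 3: the two (NULL, '10.0.0.2') rows collapse into one
FROM (
    SELECT DISTINCT email, ip_address
    FROM users
) d;
```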

Does GROUP BY always give the same result as DISTINCT?

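Not always. Without aggregate functions, GROUP BY a, b and SELECT DISTINCT a, b return the same set of rows. The results diverge once aggregation is involved: GROUP BY computes aggregates per group, whereas DISTINCT only removes duplicate rows from the final result, after any aggregates have already been calculated.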
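A short sketch of the difference, using hypothetical data and PostgreSQL/SQLite syntax. The grouped query returns the same two combinations that SELECT DISTINCT email, ip_address would, plus a per-group count that DISTINCT cannot provide.

```sql
-- Hypothetical sample data: one combination appears twice.
WITH users(email, ip_address) AS (
    VALUES ('ann@example.com', '10.0.0.1'),
           ('ann@example.com', '10.0.0.1'),
           ('bob@example.com', '10.0.0.2')
)
SELECT email, ip_address, COUNT(*) AS rows_in_group
FROM users
GROUP BY email, ip_address
ORDER BY email;
-- ann@example.com | 10.0.0.1 | 2
-- bob@example.com | 10.0.0.2 | 1
```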

Common Mistakes and Anti-Patterns

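Relying on implicit NULL handling (e.g., COUNT(DISTINCT ...) silently skipping NULLs) instead of filtering explicitly
Implicit duplication of data before aggregation (e.g., a JOIN that multiplies user rows)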
Incorrect combination of DISTINCT with aggregation at different levels of hierarchy
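The duplication anti-pattern is easiest to see with a one-to-many JOIN. The sketch below uses hypothetical users and orders data and assumes PostgreSQL or SQLite syntax.

```sql
-- Hypothetical sample data: user 1 has three orders, user 2 has one.
WITH users(user_id) AS (
    VALUES (1), (2)
),
orders(order_id, user_id) AS (
    VALUES (101, 1), (102, 1), (103, 1), (104, 2)
)
SELECT COUNT(u.user_id)          AS joined_rows,   -- 4: one row per order, not per user
       COUNT(DISTINCT u.user_id) AS unique_users   -- 2: each user counted once
FROM users u
JOIN orders o ON o.user_id = u.user_id;
```

Counting on the joined rows without DISTINCT reports orders rather than users, which is why deduplication should happen at the level the metric is defined on.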

Real-Life Example

Negative Case

A business analyst builds a report on unique clients with COUNT(DISTINCT user_id), but user_id may be NULL or one person may have several ids (e.g., temporary accounts). Rows with a NULL user_id are silently skipped, while a person with several accounts is counted several times, so the reported number differs from the real number of clients and the metrics in the report are distorted.

Pros:

Quick report implementation

Cons:

Incorrect business decisions due to inaccurate metrics

Positive Case

An analyst cleans the data in advance, filters out NULLs and obvious duplicates in subqueries, and uses set operations (UNION, INTERSECT, EXCEPT) when uniqueness depends on more complex criteria.
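One possible shape of such a query is sketched below. The signups source, the sample rows, and the rule that emails differing only in case or surrounding spaces belong to the same person are all illustrative assumptions. The syntax assumes PostgreSQL or SQLite.

```sql
-- Hypothetical sources: registered users and newsletter signups.
WITH users(email) AS (
    VALUES ('Ann@Example.com '), ('bob@example.com'), (NULL)
),
signups(email) AS (
    VALUES ('ann@example.com'), ('cat@example.com')
),
cleaned AS (
    -- Normalize and drop NULLs in each source before combining.
    SELECT LOWER(TRIM(email)) AS email_norm
    FROM users
    WHERE email IS NOT NULL
    UNION  -- UNION (without ALL) removes duplicates across both sources
    SELECT LOWER(TRIM(email))
    FROM signups
    WHERE email IS NOT NULL
)
SELECT COUNT(*) AS unique_people  -- 3: ann, bob, cat
FROM cleaned;
```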