It is often necessary to create queries where the filtering criteria are defined dynamically — the list of values is specified by the user of the interface, and missing values (NULL) should also be able to be included or excluded from the result. Such a query must work quickly on large volumes and be semantically correct from the SQL perspective.
Incorrectly combining the conditions IN, NOT IN, and checking for NULL leads to unexpected results because in SQL the expression "NULL IN (...)" always returns UNKNOWN, and "NOT IN" may lead to an empty result if at least one of the elements in the list is NULL. Performance on large data with poorly written conditions can drop sharply.
The correct combination looks like this:
SELECT * FROM sales WHERE region IN ('Moscow','Samara') OR region IS NULL;
SELECT * FROM sales WHERE (region NOT IN ('Moscow','Samara') OR region IS NULL);
SELECT * FROM sales WHERE region NOT IN ('Moscow','Samara') AND region IS NOT NULL;
Key features:
What happens if you check "WHERE field NOT IN ('a', NULL)"?
The result is always empty, as any row compared to NULL in the list returns UNKNOWN and filters everything out.
Code example:
SELECT * FROM test WHERE name NOT IN ('Ivan', NULL);
Will "IN (NULL,...)" return a value for rows with NULL?
No. The expression NULL IN (...) always returns UNKNOWN, not TRUE.
How to speed up filtering by a large list of values?
Use indexes and preferably apply JOIN with a temporary table if the list is large (hundreds/thousands of values) — this speeds up filtering compared to a long IN.
In an analytical report, "WHERE city NOT IN ('London', field_2, NULL)" was applied. The result — completely empty, no one could figure out the cause of lost rows for a long time.
Pros:
Cons:
In a CRM, a list of customer regions was selected and an additional button "Include empty values" was added. Filtering logic:
... WHERE city IN ('SPB','NNov') OR city IS NULL
Pros:
Cons: