Controlling the uniqueness of business keys (not only by ID, but also by "natural" fields) has long been a requirement in corporate applications. Often, such keys are complex (consisting of multiple columns) and allow NULL values. The standard SQL methods — UNIQUE CONSTRAINT or unique indexes — have limitations: according to the SQL standard, multiple rows that have at least one component of the UNIQUE key equal to NULL are considered unique and do not violate the constraint.
Business logic may require that a set of columns (including those allowing NULL) be collectively unique, whereas the behavior of the SQL standard (NULL ≠ NULL) breaks this scenario. For example, we have a table with a pair of columns (passport_number, passport_series), and at least one of them can be NULL, but if the values match, we must prohibit duplicates.
In most popular DBMS options, the solution is to implement uniqueness checks through a trigger or use partial indexes with a condition, or use functions that allow comparing NULL as equal values (for example, ISNULL or COALESCE). Here’s an example in PostgreSQL using a unique index based on an expression:
CREATE UNIQUE INDEX idx_passport_unique ON persons ( COALESCE(passport_number, ''), COALESCE(passport_series, '') );
At the application logic level, it is often necessary to duplicate checks to avoid missing duplicates at the insertion stage.
Key Features:
Can we be sure that a unique index guarantees that duplicates cannot be inserted if there are nullable columns among the columns?
No, we cannot. In SQL, the behavior with NULL is peculiar: rows with the same set of values, where at least one of them is NULL, are considered different by the index and allow their simultaneous storage.
Can different DBMS implement NULL uniqueness differently?
Yes, there are differences among them. For example, in Oracle, a unique index allows multiple rows with NULL, whereas in MS SQL — only one. PostgreSQL can create partial indices via expressions.
Is it possible to circumvent the problem only at the DDL level and without triggers?
In some DBMS — yes, via expressions like COALESCE. But not always, and if complex behavior is needed (where NULL should be considered a duplicate), logical checks or triggers will have to be used.
In a non-critical database, a decision was made to rely on a unique index on (email, department code), where both fields allow NULL. As a result, duplicates appeared repeatedly, breaking external integrations.
Pros:
Cons:
In PostgreSQL, a business key with fields (passport number, series) was realized through an index on COALESCE, and additionally, a check was added on the application side.
Pros:
Cons: