In relational databases, the task of ensuring uniqueness often falls on UNIQUE constraints. However, in business practice, there are situations where uniqueness is required based on a combination of fields, some of which may be NULL (for example, a unique combination of email+phone, where the phone may be unknown).
Background: In SQL, standard UNIQUE CONSTRAINTs do not guarantee uniqueness when one of the columns is NULL — the specification considers such values to be mutually exclusive.
Problem: For complex keys with NULL, standard constraints lead to the emergence of duplicates. This is especially critical when data integrity is a necessity: during imports, migrations, large updates.
Solution: Use a computed column that considers all values (including NULL) and impose uniqueness on that column or use trigger logic.
Example code:
ALTER TABLE my_table ADD computed_uniqueness AS ( ISNULL(email, '') + '#' + ISNULL(phone, '') ); CREATE UNIQUE INDEX idx_my_table_computed_uniqueness ON my_table(computed_uniqueness);
Or (for PostgreSQL, where there are expressions in the index and NULLs are treated as distinct):
CREATE UNIQUE INDEX idx_unique_email_phone ON my_table ((COALESCE(email, '##')),(COALESCE(phone, '##')));
Key features:
Can a standard UNIQUE index be used to enforce uniqueness if some columns allow NULL?
No. According to ANSI SQL, a UNIQUE index allows multiple rows where at least one of the columns in the combination is NULL since NULL is not considered equal to any other value, including NULL itself.
What is the difference between using a unique index on an expression and using a BEFORE INSERT trigger to check for uniqueness?
A unique index is simpler to maintain and faster to operate, but it cannot always implement complex business rules (e.g., exceptions or custom combinations). A trigger is more flexible but slower and more complex to maintain.
Example:
CREATE OR REPLACE FUNCTION check_custom_unique() RETURNS TRIGGER AS $$ BEGIN IF EXISTS ( SELECT 1 FROM my_table WHERE COALESCE(NEW.email, '##') = COALESCE(email, '##') AND COALESCE(NEW.phone, '##') = COALESCE(phone, '##') ) THEN RAISE EXCEPTION 'Duplicate found'; END IF; RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER trg_custom_unique BEFORE INSERT OR UPDATE ON my_table FOR EACH ROW EXECUTE FUNCTION check_custom_unique();
Is it possible to use DISTINCT in SELECT to solve the uniqueness problem at the application level?
Yes, but only for selection — it does not prevent the entry of duplicates when modifying data and is not a substitute for constraints at the table level.
A company implements uniqueness of email+phone through UNIQUE(email, phone). Duplicates appear in the database for email when phone=NULL.
Pros:
Cons:
A computed column (COALESCE(email, '##') + '#' + COALESCE(phone, '')), with a unique index on it is used.
Pros:
Cons: