SQL (ANSI)ProgrammingSQL Developer

When verifying complete compliance against a mandatory checklist where partial fulfillment is insufficient, how do you implement relational division in ANSI SQL to identify entities satisfying every requirement without relying on GROUP BY aggregations?

Pass interviews with Hintsage AI assistant

Answer to the question

History of the question

Relational division was formally defined by Edgar F. Codd in 1970 as the inverse of the Cartesian product, designed to express universal quantification (∀) in relational algebra. While ANSI SQL implements existential quantification (∃) naturally through WHERE clauses and joins, it lacks a native division operator, forcing developers to simulate this set-theoretic operation using logical negation or counting strategies. This pattern appears constantly in regulatory compliance, authorization matrices, and competency tracking systems where identifying "complete sets" is mission-critical.

The problem

Given a dividend table EmployeeTraining(employee_id, module_id) and a divisor table RequiredModules(module_id), the objective is to return every employee_id associated with all rows in the divisor. The challenge transcends simple joins, which find any match; division requires verifying total coverage. Critically, the solution must handle duplicate completion records, empty requirement sets (vacuous truth), and execute efficiently without procedural logic.

The solution

The canonical ANSI SQL approach employs double negation: select employees for whom there does not exist a required module that they have not completed. This translates to nested NOT EXISTS clauses. Alternatively, a counting method compares distinct completions against the required total, though it requires careful handling of duplicates.

-- Double Negation: Pure Relational Division SELECT DISTINCT e.employee_id FROM EmployeeTraining e WHERE NOT EXISTS ( SELECT 1 FROM RequiredModules r WHERE NOT EXISTS ( SELECT 1 FROM EmployeeTraining e2 WHERE e2.employee_id = e.employee_id AND e2.module_id = r.module_id ) ); -- Counting Method (with duplicate handling) SELECT employee_id FROM ( SELECT e.employee_id, COUNT(DISTINCT e.module_id) AS completed_count FROM EmployeeTraining e JOIN RequiredModules r ON e.module_id = r.module_id GROUP BY e.employee_id ) sub WHERE completed_count = (SELECT COUNT(*) FROM RequiredModules);

Situation from life

Situation from life

An aviation maintenance firm needed to certify mechanics for engine repair. The FAA mandated completion of five specific safety modules tracked in Mechanic_Completions, but mechanics often retake failed modules, creating duplicate rows. Running this check daily for 1,200 mechanics against 200 possible modules required a query that ignored duplicates and handled audit scenarios where the requirement list might temporarily empty.

Solution 1: GROUP BY with COUNT(DISTINCT) This approach joined the tables, grouped by mechanic, and compared distinct counts. The primary advantage was readability; junior developers understood the logic immediately. However, it suffered significant performance degradation due to the DISTINCT operation over 2 million historical records. More critically, without explicit COALESCE handling, it returned zero mechanics when the RequiredModules table was empty (audit mode), violating the mathematical principle that universal quantification over an empty set is vacuously true for all elements.

Solution 2: Double Negation with NOT EXISTS This method used two nested NOT EXISTS clauses to check for missing modules. It naturally handled duplicate completion records because it checked only for existence (semi-join behavior) rather than counting occurrences. It correctly returned all mechanics when the requirement set was empty. The drawback involved more complex execution plans; optimizers sometimes chose nested loop joins over hash joins, though proper indexing on module_id mitigated this.

Chosen Solution and Result The team selected the double negation approach because data integrity rules allowed duplicate completion entries, making the counting method risky without expensive DISTINCT operations. The query identified 847 fully certified mechanics out of 1,200 in under 150ms. During a subsequent regulatory audit where all requirements were temporarily suspended, the query correctly identified all 1,200 mechanics as compliant (vacuous truth), preventing unnecessary grounding of the workforce while maintaining logical correctness.

What candidates often miss

What candidates often miss

How does the query behave when the RequiredModules table contains zero rows, and why does this matter mathematically?

When the divisor is empty, relational division must return the entire dividend set (all employees) because vacuous truth dictates that every element satisfies "for all items in empty set." The double negation method achieves this naturally; since no required modules exist, the inner NOT EXISTS never finds a missing module, so the outer clause excludes no one. Conversely, the counting method completed_count = (SELECT COUNT(*) FROM RequiredModules) equates counts to zero, returning only mechanics with zero completions. Candidates must implement a COALESCE wrapper or CASE logic to return all rows when the divisor is empty, or use the double negation pattern which handles this edge case implicitly.

Why does the counting method with COUNT(*) instead of COUNT(DISTINCT module_id) produce false positives, and how do duplicates affect the double negation approach?

If a mechanic completes Module A twice (initial failure, then retake), COUNT(*) returns 2. With only Modules A and B required, a mechanic missing B but with two A records shows a count of 2, falsely satisfying the equality check. This creates critical compliance gaps. Candidates frequently omit DISTINCT, assuming foreign key constraints prevent duplicates. The double negation method checks only for existence (SELECT 1), making it immune to duplicate rows in the dividend table; if any association exists, the module is satisfied. Understanding this distinction is crucial for data environments without perfect uniqueness constraints.

What is the difference between exact relational division and division with remainder, and how would you modify the query to find employees who completed exactly the required modules with no extras?

The solutions above implement "division with remainder" (loose division), returning employees with at least the required modules (supersets). Exact division requires the employee possess no additional modules beyond those required. To achieve this, candidates must add a filtering condition ensuring the mechanic's total distinct module count equals the required count: HAVING COUNT(DISTINCT module_id) = (SELECT COUNT(*) FROM RequiredModules). Many candidates incorrectly assume relational division implies "exactly these and only these," leading to authorization bugs where employees with expired or inappropriate extra certifications are incorrectly approved for sensitive tasks.