Background:
The EXISTS and IN operators are used for filtering records based on subqueries. Since the inception of SQL, developers have faced the choice between them, trying to understand which method works faster and in what cases their use is preferable.
Problem:
The task is to return only those rows of the outer query that have a matching row in the table used by the subquery, and to do so efficiently on large datasets. The choice between EXISTS and IN depends on the structure of the subquery, the number and uniqueness of the returned values, and the database management system being used.
Solution:
IN is usually efficient when the subquery returns a small number of unique values.
EXISTS is preferable when only the existence of matching rows matters; it suits large subqueries that return thousands or millions of rows, because evaluation can stop at the first match.
NULL handling and differences in optimization between database management systems must also be taken into account.
Code example:
-- Using IN
SELECT name
FROM students
WHERE id IN (SELECT student_id FROM enrollments WHERE course = 'SQL');

-- Using EXISTS
SELECT name
FROM students
WHERE EXISTS (
    SELECT 1
    FROM enrollments
    WHERE enrollments.student_id = students.id
      AND enrollments.course = 'SQL'
);
Key features:
What happens if NULL appears in the IN subquery?
Many believe that IN simply ignores NULL. The result is not random, but it follows three-valued logic: a comparison with NULL yields UNKNOWN rather than FALSE, which can be surprising. For example, the query:
SELECT id FROM orders WHERE client_id IN (1, NULL, 2);
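returns only the rows where client_id equals 1 or 2; the NULL in the list never matches anything, including rows where client_id is itself NULL. If the list or subquery contains only NULL, the result is empty. The effect is more severe with NOT IN: a single NULL in the list makes the whole result empty.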
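The behavior described above can be checked with a small script. This is a minimal sketch that assumes a hypothetical orders table with a nullable client_id; the sample rows are invented for illustration, and the statements use only standard SQL that should run on SQLite, PostgreSQL, or MySQL. A NOT IN query is included because that is where a NULL in the list changes the result most visibly.

```sql
-- Hypothetical sample data (table contents are an assumption for illustration)
CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER);
INSERT INTO orders (id, client_id) VALUES (1, 1), (2, 2), (3, 3), (4, NULL);

-- IN with a NULL in the list: the NULL never matches anything,
-- so only the rows with client_id 1 or 2 come back (ids 1 and 2).
SELECT id FROM orders WHERE client_id IN (1, NULL, 2);

-- IN with only NULL: every comparison is UNKNOWN, so no rows are returned.
SELECT id FROM orders WHERE client_id IN (NULL);

-- NOT IN with a NULL in the list: the predicate is FALSE for client_id 1 and 2
-- and UNKNOWN for everything else, so no row qualifies, not even client_id = 3.
SELECT id FROM orders WHERE client_id NOT IN (1, NULL, 2);
```

When the list comes from a subquery, adding a condition such as WHERE student_id IS NOT NULL inside the subquery is a common way to avoid the NOT IN surprise.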
Are EXISTS and IN completely interchangeable constructs?
No. EXISTS can stop at the first matching row, so it is often faster on large subqueries, although many optimizers turn both forms into the same plan. Furthermore, not every DBMS supports IN with a multi-column subquery, while EXISTS always works, because the comparison is expressed through conditions in the subquery's WHERE clause. For example:
SELECT col1 FROM t1 WHERE (col1, col2) IN (SELECT col3, col4 FROM t2);
This row-value form works in PostgreSQL, MySQL, and Oracle but is rejected by SQL Server, while the corresponding EXISTS is portable across all of them.
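The corresponding EXISTS form might look like the following. It is a sketch that reuses the t1 and t2 names and columns from the example above; the table definitions and sample rows are assumptions added only to make the script runnable.

```sql
-- Hypothetical sample tables; column names follow the example above
CREATE TABLE t1 (col1 INTEGER, col2 INTEGER);
CREATE TABLE t2 (col3 INTEGER, col4 INTEGER);
INSERT INTO t1 (col1, col2) VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO t2 (col3, col4) VALUES (1, 10), (2, 99);

-- Correlated EXISTS that matches on both columns.
-- With this data only the t1 row (1, 10) has a matching pair in t2,
-- so the query returns col1 = 1.
SELECT t1.col1
FROM t1
WHERE EXISTS (
    SELECT 1
    FROM t2
    WHERE t2.col3 = t1.col1
      AND t2.col4 = t1.col2
);
```

Used as a WHERE filter, both forms return the same rows: a pair containing NULL makes the IN comparison UNKNOWN and the EXISTS comparison find no match, and either way the row is excluded.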
Can IN work faster than EXISTS when dealing with indexed fields?
Yes. If the subquery is small and there is an index on the comparison field, IN can be faster. With large result sets or no index, the opposite is usually true.
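Which form wins on a particular engine is easiest to settle by creating the index and comparing execution plans. The sketch below assumes the students and enrollments tables from the earlier code example; the column types and the index name are hypothetical. EXPLAIN as written is PostgreSQL and MySQL syntax; SQLite offers EXPLAIN QUERY PLAN for a readable plan, and other systems have their own tools.

```sql
-- Hypothetical schema matching the earlier example
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR(100));
CREATE TABLE enrollments (student_id INTEGER, course VARCHAR(100));

-- Composite index covering the filter column and the comparison column
-- (the index name is an assumption)
CREATE INDEX idx_enrollments_course_student
    ON enrollments (course, student_id);

-- Plan for the IN form
EXPLAIN
SELECT name
FROM students
WHERE id IN (SELECT student_id FROM enrollments WHERE course = 'SQL');

-- Plan for the EXISTS form
EXPLAIN
SELECT s.name
FROM students s
WHERE EXISTS (
    SELECT 1
    FROM enrollments e
    WHERE e.student_id = s.id
      AND e.course = 'SQL'
);
```

If both plans come out the same, the optimizer treats the two forms as equivalent and the choice becomes a matter of readability and NULL safety.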
An analyst built a report using IN without accounting for the fact that the subquery returned hundreds of thousands of rows, some of them NULL. The report took minutes to run, and rows were sometimes missing from its output.
The same query was rewritten using EXISTS with an additional condition, and the indexes were rebuilt.