ProgrammingSQL Developer

Explain the differences between the EXISTS and IN operators in SQL. When should each be used for filtering related data, and what nuances may affect performance and the correctness of the results?

Pass interviews with Hintsage AI assistant

Answer.

Background:

The EXISTS and IN operators are used for filtering records based on subqueries. Since the inception of SQL, developers have faced the choice between them, trying to understand which method works faster and in what cases their use is preferable.

Problem:

The main task is to retrieve only those rows that have a corresponding match in the external or internal table, which is always critical for performance with large datasets. The choice between EXISTS and IN depends on the structure of the subquery, the number and uniqueness of the returned values, as well as the database management system being used.

Solution:

  • IN is usually more efficient when the subquery returns a small number of unique values.
  • EXISTS is preferable if only the existence of matching rows is important; it is suitable for large subqueries returning thousands or millions of rows.
  • It is also important to remember the specific behavior with NULL and differences in optimization among different database management systems.

Code example:

-- Using IN SELECT name FROM students WHERE id IN (SELECT student_id FROM enrollments WHERE course = 'SQL'); -- Using EXISTS SELECT name FROM students WHERE EXISTS (SELECT 1 FROM enrollments WHERE enrollments.student_id = students.id AND enrollments.course = 'SQL');

Key features:

  • EXISTS stops searching upon the first match (the nested subquery can stop).
  • IN is usually efficient with short lists, but can be slow with long non-unique subqueries.
  • EXISTS works correctly with the presence of NULL, while IN may behave unexpectedly with NULL in the subquery.

Trick questions.

What happens if NULL appears in the IN subquery?

Many believe that IN simply ignores NULL, but in the presence of NULL the result can become unpredictable. For example, the query:

SELECT id FROM orders WHERE client_id IN (1, NULL, 2);

technically will not include rows where client_id is not equal to 1 or 2, but if the subquery list only has NULL, the result will be empty.

Are EXISTS and IN completely interchangeable constructs?

No. Using EXISTS is often faster because it doesn't need to analyze the entire subquery. Furthermore, IN does not work with multi-column subqueries, while EXISTS does, as it compares based on the condition in the WHERE clause. For example:

SELECT col1 FROM t1 WHERE (col1, col2) IN (SELECT col3, col4 FROM t2);

This version is often unsupported, while the corresponding EXISTS is.

Can IN work faster than EXISTS when dealing with indexed fields?

Yes, if the subquery is small and there is an index on the comparison field, IN can be faster. However, with large result sets or the absence of an index — the opposite is true.

Common mistakes and anti-patterns

  • Using IN with a subquery that returns a large number of rows.
  • Neglecting the consequences of having NULL in the subquery.
  • Replacing EXISTS with IN by default in all cases without analyzing the execution plan.

Real-life example

Negative case

An analyst produced a report using IN, not taking into account that the subquery returns hundreds of thousands of rows with NULL. The report began running for minutes, sometimes losing data.

Pros:

  • Logic is easy to understand Cons:
  • Performance loss
  • Risk of obtaining incorrect data

Positive case

The same query was rewritten using EXISTS with an additional condition, and indexes were recalculated.

Pros:

  • Fast response
  • Correct filtering even with large subqueries and NULL Cons:
  • More complex condition, requires testing of the execution plan