ProgrammingSQL Developer

Explain the difference between JOIN and subqueries. When should each approach be used? Provide an example of each.

Pass interviews with Hintsage AI assistant

Answer

JOIN allows combining multiple tables in a single SELECT query based on a specific condition, usually through the ON keyword. A subquery (or nested query) is a query placed inside another query. JOIN works better for combining large tables with good indexes, it is more transparent for the optimizer and typically executes faster. Subqueries are convenient when aggregating values or when there are conditions that cannot be expressed with JOIN (e.g., correlated subqueries).

JOIN Example:

SELECT employees.name, departments.name FROM employees JOIN departments ON employees.dept_id = departments.id;

Subquery Example:

SELECT name FROM employees WHERE dept_id IN ( SELECT id FROM departments WHERE location = 'Moscow' );

Trick Question

“Is it true that using subqueries is always slower than using JOIN?”

Answer: No, not always! Well-written subqueries are often transformed by the optimizer into inner JOINs. Moreover, sometimes a subquery is more efficient — for instance, when the outer query works with a small dataset while the subquery returns pre-aggregated results.

Example:

SELECT name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);

This subquery will be executed once.


Story

In a large project, nested subqueries were used for reporting instead of JOINs. As the volume of data increased, the queries took several minutes instead of seconds. After replacing subqueries with JOINs, performance improved 10 times.


Story

In one project, an analyst rewrote a query with GROUP BY into a subquery, not noticing that this resulted in duplicates at the JOIN level. This led to incorrect data in reports — sums were inflated due to a Cartesian product.


Story

Using a correlated subquery in the outer loop (for each row) led to complete performance degradation as the table grew. They decided to use JOIN with aggregation — query processing time was reduced from hours to minutes.