ProgrammingBackend Developer, Data Engineer

How to correctly implement bulk update of related tables with many conditions in SQL to avoid deadlock, data loss, and maximize performance?

Pass interviews with Hintsage AI assistant

Answer.

Bulk UPDATE is a critical procedure when modifying a large number of rows in related tables. In SQL history, the typical implementation involves UPDATE with subquery or JOIN. The problem is that any bulk update operation without controlling the order of execution blocks multiple rows, causes lock escalation, and can lead to deadlocks during multiple updates.

Solution:

  • Always split the UPDATE into small batches (e.g., by primary key or date ranges).
  • Use SET-oriented approaches via JOIN, but avoid bulk updates without constraints.
  • Properly apply filters, index fields by WHERE conditions, and consider the order of operations for related tables.

Example code (PostgreSQL):

UPDATE Orders o SET status = 'archived' FROM Customers c WHERE o.customer_id = c.id AND c.closed = TRUE AND o.status != 'archived';

Or in batches:

WITH upd AS ( SELECT o.id FROM Orders o JOIN Customers c ON o.customer_id = c.id WHERE c.closed = TRUE AND o.status != 'archived' LIMIT 10000 ) UPDATE Orders SET status = 'archived' WHERE id IN (SELECT id FROM upd);

Key features:

  • Avoid updating "the whole table at once" — always batch.
  • Use indexes on updating and filtering fields.
  • Clearly define selection criteria, avoiding bulk updates of unnecessary rows.

Trick questions.

What will happen if you run UPDATE on similar tables simultaneously without separating ranges or using opposing filters?

Deadlock is likely to occur: processes block the same rows, waiting for each other. To avoid this, batches should not overlap or should be executed strictly sequentially.

Is there a difference between UPDATE through JOIN and subquery when it comes to bulk status change?

If appropriate indexes exist, the key difference lies only in readability and sometimes in the performance of a specific DBMS. JOIN is usually faster as it allows the optimizer to create a better plan.

When is it relevant to use TRUNCATE/DELETE instead of UPDATE?

If business logic allows — for example, when you need to physically remove archived records or reset a table, rather than just changing a status flag. However, for bulk status updates — only use UPDATE.

Common mistakes and anti-patterns

  • Bulk UPDATE "without filter": blocking, rollback, deadlock.
  • Lack of indexes — full table scans.
  • Parallel execution of UPDATE without key range division.

Real-life example

Negative case

In a large online store, multiple UPDATEs to change order and customer statuses were run simultaneously, without dividing by intervals. The result: mutual blocks, forced rollbacks several times, and unrecorded data was lost.

Pros:

  • Everything in one query.

Cons:

  • Possible deadlock, loss of performance, huge data sets rolled back even with minor errors.

Positive case

Large selections were split into batches, executed strictly sequentially, and only necessary rows were processed according to the filter.

Pros:

  • Stable database operation.
  • Performance is not affected.

Cons:

  • Larger volume of code, monitoring of batch execution is required.