Programming / Backend Developer

How to implement efficient processing of large volumes of data in SQL using batch operations, and what memory and transaction management mechanisms should be considered?


Answer.

Processing large volumes of data in SQL requires a deliberate approach to avoid memory pressure, locking problems, and unstable performance. The main technique is to break the operation into batches: the data is processed in small chunks, which reduces server load and gives better control over transactions and rollbacks when errors occur.

Key aspects:

  • Use a loop with an explicit batch size (TOP in SQL Server, LIMIT via a subquery in PostgreSQL)
  • Controlled number of affected rows in a single transaction
  • After each batch — COMMIT to relieve the transaction log
  • In case of an error — rollback only the current batch, not the whole operation

Example (SQL Server):

DECLARE @BatchSize INT = 1000;

WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (@BatchSize) FROM BigLogTable
    WHERE CreatedDate < '2021-01-01';

    IF @@ROWCOUNT = 0
    BEGIN
        ROLLBACK TRANSACTION;  -- nothing left to delete; close the empty transaction
        BREAK;
    END

    COMMIT TRANSACTION;
END

Trick question.

How to delete 100 million records from a large table with minimal impact on performance?

Incorrect answer: "Make one big DELETE".

Correct answer: Delete in chunks (batches) with a controlled batch size, COMMIT after each batch, and, if necessary, reduce disk load and lock contention by pausing between batches (WAITFOR DELAY or similar).
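The pause-between-batches idea can be sketched in T-SQL; the table big_orders and column created_at are illustrative names, not from a real schema:

```sql
-- Sketch: batched delete with a pause between batches.
-- big_orders / created_at are hypothetical names.
DECLARE @BatchSize INT = 5000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) FROM big_orders
    WHERE created_at < '2020-01-01';

    IF @@ROWCOUNT = 0 BREAK;  -- nothing deleted in this pass: we are done

    WAITFOR DELAY '00:00:01'; -- give the disk and concurrent sessions a breather
END
```

Each DELETE here runs in its own implicit transaction, so the log is released batch by batch, and the one-second pause keeps I/O and lock pressure low on a busy server.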

Example (PostgreSQL):

-- Requires PostgreSQL 11+ and must be run outside an explicit transaction
-- block, otherwise COMMIT inside the DO body is not allowed.
-- Note: DELETE does not support LIMIT directly, so we limit via ctid.
DO $$
BEGIN
    LOOP
        DELETE FROM big_table
        WHERE ctid IN (
            SELECT ctid FROM big_table
            WHERE created_at < NOW() - interval '1 year'
            LIMIT 10000
        );
        EXIT WHEN NOT FOUND;  -- FOUND is false when the DELETE touched no rows
        COMMIT;
    END LOOP;
END $$;

Examples of real mistakes caused by not knowing these nuances.


Story

Project: High-load banking service. Mistake: A developer deleted outdated logs in a single query covering 80 million rows. The transaction log grew to terabytes, exhausted all available disk space, and the service went down.


Story

Project: Online store with an inventory accounting system. Mistake: During a mass import, the transaction size was not limited. When an error occurred partway through a huge batch, all previously inserted work was rolled back and had to be repeated, turning a minutes-long job into hours.
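A batched import along the lines this story describes can be sketched in T-SQL; the tables staging_items and items and their columns are hypothetical names:

```sql
-- Sketch: moving rows from a staging table in batches, committing each
-- batch so one failure does not roll back hours of work.
-- staging_items / items and their columns are hypothetical names.
DECLARE @BatchSize INT = 10000;

WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;

    -- Delete a batch from staging and insert it into the target atomically.
    DELETE TOP (@BatchSize) FROM staging_items
    OUTPUT DELETED.id, DELETED.name, DELETED.price
    INTO items (id, name, price);

    IF @@ROWCOUNT = 0
    BEGIN
        ROLLBACK TRANSACTION;  -- staging is empty; close the open transaction
        BREAK;
    END

    COMMIT TRANSACTION;
END
```

Because each batch commits independently, a failure in batch N loses at most @BatchSize rows of work; everything committed before it stays in place.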


Story

Project: Retailer, reporting DB of order details. Mistake: Batches were used, but the COMMITs between iterations were forgotten, so everything ran as one giant transaction: the transaction log grew uncontrollably, the server started to lag, and an emergency log cleanup with standard tools was required.