Processing large volumes of data in SQL requires a special approach to prevent memory pressure and locking issues and to ensure stable performance. One of the main techniques is to break operations into batches: input data is processed in small chunks, which reduces server load and gives better control over transactions and rollbacks in case of errors.
Key aspects:
- Limit the batch size (via ROWCOUNT checks, LIMIT, or TOP)
- COMMIT after each batch to relieve the transaction log

Example (SQL Server):
DECLARE @BatchSize INT = 1000;
WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;
    DELETE TOP (@BatchSize) FROM BigLogTable
    WHERE CreatedDate < '2021-01-01';
    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION;  -- close the final (empty) transaction before exiting
        BREAK;
    END
    COMMIT TRANSACTION;      -- commit each batch so the log can truncate
END
How to delete 100 million records from a large table with minimal impact on performance?
Incorrect answer: "Make one big DELETE".
Correct answer: Delete in chunks (batches) with a controlled batch size, COMMIT after each block, and, if necessary, reduce disk load and locking by pausing between batches (WAITFOR DELAY or similar).
Example (PostgreSQL):
DO $$
DECLARE
    rows_deleted INT;
BEGIN
    LOOP
        -- PostgreSQL's DELETE has no LIMIT clause; restrict the batch via a subquery on ctid
        DELETE FROM big_table
        WHERE ctid IN (
            SELECT ctid FROM big_table
            WHERE created_at < NOW() - interval '1 year'
            LIMIT 10000
        );
        GET DIAGNOSTICS rows_deleted = ROW_COUNT;
        EXIT WHEN rows_deleted = 0;
        COMMIT;  -- allowed in a DO block from PostgreSQL 11+, outside an explicit transaction
    END LOOP;
END $$;
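The same chunked-delete pattern can be driven from application code. Below is a minimal sketch using Python's sqlite3 on an in-memory database; the table and column names are illustrative, and the optional pause plays the role of WAITFOR DELAY:

```python
import sqlite3
import time

def delete_in_batches(conn, batch_size=1000, pause=0.0):
    """Delete old rows in small chunks, committing after each batch."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM big_log WHERE rowid IN "
            "(SELECT rowid FROM big_log WHERE created < ? LIMIT ?)",
            ("2021-01-01", batch_size),
        )
        conn.commit()  # end the transaction after every batch
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        if pause:
            time.sleep(pause)  # optional breather between batches
    return total

# Demo: 2500 old rows plus 10 recent ones
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_log (created TEXT)")
conn.executemany(
    "INSERT INTO big_log VALUES (?)",
    [("2020-06-01",)] * 2500 + [("2022-01-01",)] * 10,
)
conn.commit()
print(delete_in_batches(conn, batch_size=1000))  # prints 2500
```

The `rowid IN (SELECT ... LIMIT ?)` subquery mirrors the ctid trick from the PostgreSQL example: it bounds each DELETE without needing a LIMIT clause on DELETE itself.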
Story
Project: High-load banking service. Mistake: A developer deleted outdated logs with a single query spanning 80 million rows. As a result, the transaction log grew to terabytes, exhausting all available disk space, and the service "crashed".
Story
Project: Online store with an inventory accounting system. Mistake: During mass insertion, the transaction size was not limited. When an error occurred partway through a massive import, all previous work was rolled back and had to be repeated, turning minutes into hours.
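The fix for this kind of import failure is the same batching idea applied to inserts: commit per chunk so an error costs only the current batch, not the whole run. A hypothetical Python/sqlite3 sketch (table and column names are made up for illustration):

```python
import sqlite3

def import_in_batches(conn, rows, batch_size=500):
    """Insert rows in chunks; each chunk commits independently,
    so an error rolls back only the current batch."""
    imported = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            conn.executemany("INSERT INTO stock (sku, qty) VALUES (?, ?)", batch)
            conn.commit()  # earlier batches are now safe
            imported += len(batch)
        except sqlite3.Error:
            conn.rollback()  # only this batch is lost
            raise
    return imported

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT, qty INTEGER)")
rows = [(f"SKU-{i}", i % 10) for i in range(1200)]
print(import_in_batches(conn, rows))  # prints 1200 (committed in 3 batches)
```

On a failure, only the in-flight batch needs to be retried after the cause is fixed, instead of re-running the entire import.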
Story
Project: Retailer, reporting DB of order details. Mistake: Batches were used, but COMMITs were forgotten between iterations — the single open transaction made the log grow unchecked, the server began to "lag", and an emergency log cleanup with standard tools was required.