Mass deletion of tens of millions of rows is one of the typical and most dangerous operations, especially in high-load databases. Historically, many simply wrote DELETE FROM, which led to table locks and transaction log overflow. The main issue: the transaction becomes too large, servicing processes slow down, and rollback consequences can be unpredictable.
Solution — implement "batch" deletion, processing a small number of rows in a loop with short transactions to minimize locks and impact on the system:
Example code (SQL Server):
WHILE 1=1 BEGIN DELETE TOP (10000) FROM YourHugeTable WHERE CreatedAt < DATEADD(year,-2,GETDATE()); IF @@ROWCOUNT = 0 BREAK; WAITFOR DELAY '00:00:01'; -- small pause to reduce load END
Key features:
Is using TRUNCATE instead of DELETE always faster and safer?
No. TRUNCATE is much faster, but:
In mass DELETE, is it important to use indexes on the filter column?
Yes, having an appropriate index on the filter column (e.g., CreatedAt) speeds up the search for rows to delete and reduces the load on the table. Without an index, the query will touch the whole table, even if a small number of rows is deleted in each batch.
CREATE INDEX idx_createdat ON YourHugeTable(CreatedAt);
What happens if multiple threads perform mass DELETE simultaneously?
This will lead to contention for locks: lock escalations, increased wait times, and a higher chance of deadlock. Mass deletion from the same table should be handled by a single process or very carefully designed range splitting.
The DBA decided to clear a table with 60 million rows with a single query DELETE FROM Log WHERE dt < '2021-01-01'. The server almost "froze", other processes started waiting for execution, the log file grew sharply, and recovery became lengthy.
Pros:
Cons:
Deletion was done in batches of 10,000 rows, the process is controlled, and there is a pause after each batch. The server operates stably, other tasks are executed, the admin monitors progress.
Pros:
Cons: