Data archiving is one of the most important tasks for high-load OLTP systems with large tables. Historically, the first attempts to implement this were very straightforward: data was either deleted or manually copied to separate tables through scripts or applications. Later, more systematic approaches emerged that considered transactional integrity and minimal impact on the main database operations.
The problem lies not only in the physical transfer of information but also in maintaining consistency, minimizing locks, and ensuring high performance. Errors during archiving can lead to data loss, user locks, or a significant increase in load.
The solution is to use batch operations with transactional control and to form special archive tables with identical structures, or to automate through a task scheduler and procedures.
Example code:
-- Moving 5000 records older than a year to the archive table INSERT INTO archive_orders SELECT * FROM orders WHERE order_date < DATEADD(year, -1, GETDATE()) AND id IN (SELECT TOP 5000 id FROM orders WHERE order_date < DATEADD(year, -1, GETDATE()) ORDER BY id); DELETE FROM orders WHERE id IN (SELECT TOP 5000 id FROM orders WHERE order_date < DATEADD(year, -1, GETDATE()) ORDER BY id);
Key features:
What risk is there with mass DELETE of old records and how to avoid it?
Mass DELETE can lead to lock escalation and slow down the entire database. This can be avoided by performing deletions in small portions inside a loop or using LIMIT/TOP if supported by the DBMS.
WHILE 1=1 BEGIN DELETE TOP (1000) FROM orders WHERE order_date < '2023-01-01'; IF @@ROWCOUNT = 0 BREAK; END
Can TRUNCATE be used to delete archived data?
TRUNCATE removes all rows in a table and is not suitable for conditionally clearing specific rows. It does not trigger, does not support WHERE, and is used only for complete clearing, not selective archiving.
How to ensure that the transfer was correct if deletion and insertion are done in different transactions?
It is better to perform the row transfer in one transaction: first copy to the archive, then delete from the main one. Otherwise, there can be inconsistencies in case of a failure between operations.
BEGIN TRANSACTION INSERT INTO archive_orders SELECT * FROM orders WHERE ... DELETE FROM orders WHERE ... COMMIT
An engineer runs a script on a million records DELETE FROM logs WHERE event_date < '2022-01-01' during working hours.
Pros:
Cons:
Scheduled transfer of 5000 rows via a stored procedure and scheduled job at night, with logging of the success of each portion.
Pros:
Cons: