
How to efficiently archive and move old data from a high-load table to a separate storage using SQL? What are the approaches, problems, and optimal solutions?


Answer.

Data archiving is one of the most important tasks for high-load OLTP systems with large tables. Historically, the first attempts to implement this were very straightforward: data was either deleted or manually copied to separate tables through scripts or applications. Later, more systematic approaches emerged that considered transactional integrity and minimal impact on the main database operations.

The problem lies not only in the physical transfer of information but also in maintaining consistency, minimizing locks, and ensuring high performance. Errors during archiving can lead to data loss, user locks, or a significant increase in load.

The solution is to use batch operations under transactional control, moving rows into dedicated archive tables with an identical structure, and to automate the process through a job scheduler and stored procedures.

Example code:

-- Moving 5000 records older than a year to the archive table (T-SQL).
-- Wrapping both statements in one transaction keeps INSERT and DELETE consistent.
BEGIN TRANSACTION;

INSERT INTO archive_orders
SELECT * FROM orders
WHERE order_date < DATEADD(year, -1, GETDATE())
  AND id IN (SELECT TOP 5000 id
             FROM orders
             WHERE order_date < DATEADD(year, -1, GETDATE())
             ORDER BY id);

DELETE FROM orders
WHERE id IN (SELECT TOP 5000 id
             FROM orders
             WHERE order_date < DATEADD(year, -1, GETDATE())
             ORDER BY id);

COMMIT;

Key features:

  • Transferring old data in batches to reduce load.
  • Coordinating insert and delete operations using transactions.
  • Automating runs with scheduled jobs and stored procedures.
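The batch move above can be wrapped in a stored procedure that a scheduler then calls periodically. A minimal T-SQL sketch, assuming the orders/archive_orders tables from the example; the procedure name archive_old_orders is hypothetical:

```sql
-- Reusable procedure: moves one batch of old rows per call
CREATE PROCEDURE archive_old_orders
    @batch_size int = 5000
AS
BEGIN
    BEGIN TRANSACTION;

    INSERT INTO archive_orders
    SELECT * FROM orders
    WHERE id IN (SELECT TOP (@batch_size) id
                 FROM orders
                 WHERE order_date < DATEADD(year, -1, GETDATE())
                 ORDER BY id);

    DELETE FROM orders
    WHERE id IN (SELECT TOP (@batch_size) id
                 FROM orders
                 WHERE order_date < DATEADD(year, -1, GETDATE())
                 ORDER BY id);

    COMMIT;
END
```

A SQL Server Agent job (or cron with sqlcmd) can then run EXEC archive_old_orders at off-peak hours.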

Tricky Questions.

What risk is there with mass DELETE of old records and how to avoid it?

Mass DELETE can lead to lock escalation and slow down the entire database. This can be avoided by performing deletions in small portions inside a loop or using LIMIT/TOP if supported by the DBMS.

WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM orders
    WHERE order_date < '2023-01-01';

    IF @@ROWCOUNT = 0 BREAK;
END
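For a DBMS that supports LIMIT rather than TOP, the same batched deletion looks like this. A sketch in MySQL syntax, assuming the same orders table:

```sql
-- MySQL: delete at most 1000 matching rows per run;
-- rerun until the statement reports 0 affected rows
DELETE FROM orders
WHERE order_date < '2023-01-01'
LIMIT 1000;
```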

Can TRUNCATE be used to delete archived data?

TRUNCATE removes all rows from a table and is not suitable for conditionally clearing specific rows. It does not fire DELETE triggers, does not support a WHERE clause, and is appropriate only for emptying a table completely, not for selective archiving.

How to ensure that the transfer was correct if deletion and insertion are done in different transactions?

It is better to perform the row transfer in one transaction: first copy to the archive, then delete from the main one. Otherwise, there can be inconsistencies in case of a failure between operations.

BEGIN TRANSACTION;
INSERT INTO archive_orders SELECT * FROM orders WHERE ...;
DELETE FROM orders WHERE ...;
COMMIT;

Common Mistakes and Anti-Patterns

  • Deleting large volumes of data with a single statement, causing long-held locks.
  • Archiving without checking that all rows have actually been moved.
  • Using TRUNCATE instead of DELETE - wipes out all of the table's data, not just the old rows.
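To guard against the second mistake, the inserted and deleted row counts can be compared before committing. A hedged T-SQL sketch, assuming the tables from the main example; the temporary table #batch is introduced for illustration:

```sql
-- Pin the batch of ids outside the transaction so both statements use the same set
SELECT TOP (5000) id
INTO #batch
FROM orders
WHERE order_date < DATEADD(year, -1, GETDATE())
ORDER BY id;

DECLARE @inserted int, @deleted int;

BEGIN TRANSACTION;

INSERT INTO archive_orders
SELECT o.* FROM orders o JOIN #batch b ON o.id = b.id;
SET @inserted = @@ROWCOUNT;

DELETE o FROM orders o JOIN #batch b ON o.id = b.id;
SET @deleted = @@ROWCOUNT;

-- Commit only if every copied row was also removed
IF @inserted = @deleted
    COMMIT;
ELSE
    ROLLBACK;

DROP TABLE #batch;
```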

Real-Life Example

Negative Case

An engineer runs DELETE FROM logs WHERE event_date < '2022-01-01' against a million rows during working hours.

Pros:

  • The script is simple

Cons:

  • The whole table locks and users cannot work
  • The process lasts for hours
  • Rollback is impossible without a backup

Positive Case

A nightly scheduled job transfers 5000 rows at a time via a stored procedure, logging the outcome of each batch.

Pros:

  • Minimum locking
  • Action log
  • Control of error count

Cons:

  • Requires preparation of procedures and periodic monitoring.
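Per-batch logging for such a job can be sketched as follows; the archive_log table and its columns are hypothetical, while orders/archive_orders come from the main example:

```sql
-- Hypothetical log table recording each archived batch
CREATE TABLE archive_log (
    run_at     datetime DEFAULT GETDATE(),
    rows_moved int,
    status     varchar(20)
);

-- Body of the nightly procedure: move one batch, record the outcome
DECLARE @moved int = 0;

BEGIN TRY
    BEGIN TRANSACTION;

    INSERT INTO archive_orders
    SELECT * FROM orders
    WHERE id IN (SELECT TOP 5000 id
                 FROM orders
                 WHERE order_date < DATEADD(year, -1, GETDATE())
                 ORDER BY id);
    SET @moved = @@ROWCOUNT;

    DELETE FROM orders
    WHERE id IN (SELECT TOP 5000 id
                 FROM orders
                 WHERE order_date < DATEADD(year, -1, GETDATE())
                 ORDER BY id);

    COMMIT;
    INSERT INTO archive_log (rows_moved, status) VALUES (@moved, 'ok');
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    INSERT INTO archive_log (rows_moved, status) VALUES (0, 'failed');
END CATCH
```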