Programming: Backend Developer, Data Engineer, DBA

Tell us how to organize the handling and transmission of large volumes of data between SQL servers. What mechanisms exist for batch data transfer, and how can one avoid data loss/corruption? Provide an example of a software solution.


Answer

To transfer large volumes of data between SQL servers, several strategies are used: bulk insert, ETL (Extract-Transform-Load) processes, replication, backup/restore, database links, and the built-in export/import tools.

It is best to transfer data in batches. In SQL Server, examples are BULK INSERT for loading large files and SSIS (Integration Services) for complex ETL scenarios. A more portable approach is a script that pages through the source with LIMIT/OFFSET (or, better, keyset pagination on an indexed column) and tracks its position. For reliability, the following practices are commonly applied:

  • Data transfer in chunks with checksums,
  • Temporary staging tables with subsequent integrity checks (check constraints, hashes),
  • Logging of transfer stages.
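The practices above can be combined in one loop: copy a keyset-paginated chunk, checksum it on both sides, and log progress. A minimal sketch in Python, using two sqlite3 in-memory databases to stand in for the source and target servers (table and column names are illustrative):

```python
import hashlib
import sqlite3

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE data_table (id INTEGER PRIMARY KEY, col1 TEXT)")
dst.execute("CREATE TABLE data_table (id INTEGER PRIMARY KEY, col1 TEXT)")
src.executemany("INSERT INTO data_table (id, col1) VALUES (?, ?)",
                [(i, f"row-{i}") for i in range(1, 26)])

BATCH = 10
last_id = 0  # position tracking on the key instead of OFFSET (faster on large tables)
while True:
    rows = src.execute(
        "SELECT id, col1 FROM data_table WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH)).fetchall()
    if not rows:
        break
    digest = hashlib.md5(repr(rows).encode()).hexdigest()
    dst.executemany("INSERT INTO data_table (id, col1) VALUES (?, ?)", rows)
    dst.commit()
    # Re-read the same chunk from the target and compare checksums.
    copied = dst.execute(
        "SELECT id, col1 FROM data_table WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH)).fetchall()
    assert hashlib.md5(repr(copied).encode()).hexdigest() == digest
    last_id = rows[-1][0]
    print(f"batch up to id={last_id}: ok ({digest[:8]})")  # transfer-stage log
```

On real servers the same loop would use two connections (e.g. via a driver such as psycopg2 or pyodbc) and write its log to a progress table so an interrupted transfer can resume from `last_id`.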

Example

-- Transferring data in chunks between databases (PostgreSQL).
-- Note: a plain PostgreSQL query cannot reference another database directly;
-- this assumes the remote table is exposed via postgres_fdw or dblink.
INSERT INTO target_db.public.data_table (col1, col2)
SELECT col1, col2
FROM source_db.public.data_table
WHERE transferred = FALSE
ORDER BY id    -- assumes a key column, for a deterministic chunk order
LIMIT 10000;
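A driver for a chunked INSERT like the one above typically copies the batch and marks it as transferred inside a single transaction, so a crash cannot leave rows copied but unflagged. A hypothetical sketch with sqlite3 (the boolean flag becomes 0/1; table names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE source_table (id INTEGER PRIMARY KEY, col1 TEXT,
                           transferred INTEGER DEFAULT 0);
CREATE TABLE target_table (id INTEGER PRIMARY KEY, col1 TEXT);
""")
db.executemany("INSERT INTO source_table (id, col1) VALUES (?, ?)",
               [(i, f"v{i}") for i in range(7)])

BATCH = 3
while True:
    ids = [r[0] for r in db.execute(
        "SELECT id FROM source_table WHERE transferred = 0 "
        "ORDER BY id LIMIT ?", (BATCH,))]
    if not ids:
        break
    marks = ",".join("?" * len(ids))
    with db:  # one transaction: the copy and the flag update commit together
        db.execute(f"INSERT INTO target_table (id, col1) "
                   f"SELECT id, col1 FROM source_table WHERE id IN ({marks})",
                   ids)
        db.execute(f"UPDATE source_table SET transferred = 1 "
                   f"WHERE id IN ({marks})", ids)
```

Because the flag and the copy commit atomically, rerunning the loop after a failure simply resumes from the first unflagged row.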

Trick Question

What is the difference between replication and export-import, and why is replication not always suitable for migrating large historical archives?

Answer: Replication synchronizes ongoing changes and works well for "live" data. For migrating historical archives it can be too slow and inflexible: it does not support custom transformations and is not designed for a one-time bulk transfer, so ETL is usually used there instead.

Examples of real mistakes


Story

The company consolidated several regional databases into a single repository. During a mass import without batching, the system froze from memory exhaustion, which led to partial commits and semi-manual state recovery. The fix was switching to batched extraction with progress logging via staging tables.


Story

Because checksum verification was implemented incorrectly when loading large files with BULK INSERT, part of the data was corrupted, and the corruption was only discovered several weeks later. The fix was to compute checksums before and after each batch, with automatic retries on mismatch.
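The fix described in this story (checksum before and after each batch, automatic resend on mismatch) can be sketched as follows; `send_batch` and the flaky-channel simulation are hypothetical stand-ins for the real transfer step:

```python
import hashlib

def checksum(rows):
    """Order-independent digest of a batch of rows."""
    return hashlib.md5(repr(sorted(rows)).encode()).hexdigest()

def transfer_with_retry(rows, send_batch, max_retries=3):
    """Resend a batch until the target-side checksum matches the source's."""
    expected = checksum(rows)
    for attempt in range(1, max_retries + 1):
        received = send_batch(rows)      # returns the rows as stored on the target
        if checksum(received) == expected:
            return attempt               # number of tries that were needed
    raise RuntimeError("checksum mismatch after retries")

# Simulated flaky channel: drops a row on the first send, then behaves.
state = {"calls": 0}
def flaky_send(rows):
    state["calls"] += 1
    if state["calls"] == 1:
        return rows[:-1]                 # simulated data loss
    return list(rows)

print(transfer_with_retry([(1, "a"), (2, "b")], flaky_send))  # → 2
```

Comparing digests rather than re-reading every column keeps the verification cheap enough to run on every batch, which is what lets the mismatch surface immediately instead of weeks later.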


Story

Attempting to migrate over 100 million rows via a standard export/import, a developer did not account for lock management: table locks on the target server halted business operations for several hours. The lesson: run such jobs only in night maintenance windows and copy in phases, temporarily dropping and rebuilding indexes.