
How to implement atomic bulk insert with integrity guarantees in SQL programming (Bulk Insert with transactional control)?


Answer.

Background

The rise of large data stores and data streams (ETL, migrations) created the need not only to load hundreds of thousands of rows, but also to guarantee that the data is either loaded completely or not loaded at all. In SQL, this is achieved with atomic bulk operations wrapped in transactions.

Problem

During bulk inserts the risk of error is higher: a single bad row can corrupt the entire load or leave a partial insert behind. This is unacceptable for financial, logistics, and other critical systems.

Solution

The standard practice is to wrap the bulk operation in a transaction, use the DBMS's dedicated bulk-load command (BULK INSERT in SQL Server, COPY in PostgreSQL), and catch and log errors. Crucially, if any row fails, the entire block is rolled back:

Example for SQL Server:

  BEGIN TRAN;

  BULK INSERT Customers
  FROM 'C:\data\customers.csv'
  WITH (
      FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n',
      FIRSTROW = 2
  );

  IF @@ERROR <> 0
      ROLLBACK TRAN;
  ELSE
      COMMIT TRAN;
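Note that checking @@ERROR can miss failures severe enough to abort the batch before the check runs. A more robust SQL Server pattern wraps the load in TRY/CATCH; a sketch, assuming the same Customers table and file path:

```sql
BEGIN TRY
    BEGIN TRAN;

    BULK INSERT Customers
    FROM 'C:\data\customers.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n',
        FIRSTROW = 2
    );

    COMMIT TRAN;
END TRY
BEGIN CATCH
    -- Roll back the whole load and rethrow so the caller can log the failure
    IF @@TRANCOUNT > 0
        ROLLBACK TRAN;
    THROW;
END CATCH;
```

With this shape, any error inside the TRY block, including constraint violations on individual rows, undoes the entire load.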

In PostgreSQL (example with COPY):

  BEGIN;

  COPY products FROM '/tmp/products.csv' DELIMITER ',' CSV HEADER;

  COMMIT;

Key features:

  • Guarantee of "all or nothing" (atomicity)
  • High loading speed due to batch processing
  • Ability to handle errors and log the problematic rows
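For the last point, SQL Server's BULK INSERT can write rejected rows to a file via the ERRORFILE and MAXERRORS options. A sketch with illustrative paths and threshold; note this deliberately trades strict all-or-nothing behavior for diagnostics, since rows below the error threshold are skipped rather than failing the load:

```sql
BULK INSERT Customers
FROM 'C:\data\customers.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,
    MAXERRORS = 10,                              -- abort the load after 10 bad rows
    ERRORFILE = 'C:\data\customers_errors.log'   -- rejected rows are written here
);
```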

Trick questions.

Does the size of the transaction during Bulk Insert affect performance and locks?

Yes. With very large volumes you can run into long-held locks, transaction log overflow, and a slowed-down server. Best practice is to load in batches (for example, 10,000 rows per transaction).
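In SQL Server this batching can be expressed directly with the BATCHSIZE option, which splits one BULK INSERT into separately committed batches, keeping lock duration and log growth bounded. A sketch, reusing the table and file from above:

```sql
BULK INSERT Customers
FROM 'C:\data\customers.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,
    BATCHSIZE = 10000   -- each 10,000-row batch commits as its own transaction
);
```

The trade-off: on failure only the current batch rolls back, while earlier batches stay committed, so this relaxes full atomicity in exchange for smaller locks and logs.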

Is Bulk Insert always transactional by default in all DBMS?

No. In some DBMS (for example, MySQL) a bulk load is not always automatically atomic: it must be wrapped in BEGIN/COMMIT manually, otherwise a partial load can occur.
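A sketch for MySQL: wrap LOAD DATA INFILE in an explicit transaction. Atomic rollback requires an InnoDB table (MyISAM tables cannot roll back); the file path and table name are illustrative:

```sql
START TRANSACTION;

LOAD DATA INFILE '/tmp/customers.csv'
INTO TABLE customers
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;   -- skip the header row

COMMIT;
-- On error, issue ROLLBACK instead of COMMIT (typically from the application layer).
```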

Can the integrity of foreign keys be guaranteed during bulk insert?

Yes, but only if the loading order is respected: parent tables first, then child tables, or constraints are temporarily disabled. A foreign-key error rolls back the entire bulk-insert transaction.
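A PostgreSQL sketch of the loading order: the parent table is loaded first inside the same transaction, so a foreign-key violation in the child load aborts both COPY statements together (table and file names are illustrative):

```sql
BEGIN;

-- Parent rows must exist before child rows that reference them
COPY categories FROM '/tmp/categories.csv' DELIMITER ',' CSV HEADER;
COPY products   FROM '/tmp/products.csv'   DELIMITER ',' CSV HEADER;

COMMIT;  -- an FK error in either COPY aborts the transaction as a whole
```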

Common mistakes and anti-patterns

  • Attempting to load too large a file in one operation, leading to memory and log file overflow
  • Neglecting error logging — it is difficult to determine why data is inconsistent
  • Violating the loading order of related tables with foreign keys

Real-life example

Negative case

During a customer load, a file with an error in a single row led to a partial load; by the end of the day the database and the external source were out of sync.

Pros:

  • Minimal code and a simple implementation

Cons:

  • Data loss leading to failures in business logic

Positive case

The file is pre-checked for errors, the bulk insert is split into batches of 5,000 rows, and each batch runs in its own transaction. Error logs are saved for later analysis.
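The batch-per-transaction scheme described above can be sketched in T-SQL via a staging table. The table names, columns, and the LoadErrors log table are illustrative assumptions, not part of any standard schema:

```sql
-- Assumes the raw file was first loaded into CustomersStaging (e.g. via BULK INSERT).
-- Validated rows are then moved in 5,000-row batches, each in its own transaction.
DECLARE @batch INT = 5000;

WHILE 1 = 1
BEGIN
    BEGIN TRY
        BEGIN TRAN;

        -- Move one batch from staging into the target table
        DELETE TOP (@batch) FROM CustomersStaging
        OUTPUT DELETED.Id, DELETED.Name INTO Customers (Id, Name);

        IF @@ROWCOUNT = 0
        BEGIN
            COMMIT TRAN;
            BREAK;   -- staging table drained, load complete
        END;

        COMMIT TRAN;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRAN;
        -- Keep the error for later analysis, then stop; earlier batches stay committed
        INSERT INTO LoadErrors (LoggedAt, Message)
        VALUES (SYSDATETIME(), ERROR_MESSAGE());
        BREAK;
    END CATCH;
END;
```

This gives exactly the trade-off the example describes: a failed batch rolls back alone and is logged, while previously committed batches remain in place.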

Pros:

  • Easy to find and fix problematic rows
  • High loading performance with correctness guaranteed

Cons:

  • More complex batching logic to implement
  • Error-logging scripts need to be maintained