Bulk data imports rely on specialized commands and utilities: BULK INSERT (SQL Server), COPY (PostgreSQL), LOAD DATA INFILE (MySQL), external tools such as bcp (SQL Server) and psql (PostgreSQL), as well as ETL utilities.
Example for PostgreSQL:
COPY staging_table (id, name, age) FROM '/path/to/data.csv' DELIMITER ',' CSV HEADER;

-- Verify the data, then transfer to the production table after validation
INSERT INTO prod_table (id, name, age)
SELECT id, name, age
FROM staging_table
WHERE age >= 0 AND name IS NOT NULL;
Question: Why might there be a sharp drop in performance of subsequent operations after a bulk insert into a large indexed table?
Answer: Because every inserted row must also be written into each secondary index, a bulk insert leaves the indexes bloated and fragmented and the planner statistics stale, so subsequent queries get poor plans and slow index scans; the deferred maintenance (index rebuilds, VACUUM/ANALYZE) can additionally lock the table and consume resources. It is recommended to drop or disable secondary indexes during the import and recreate them afterwards, refresh statistics, or split the load into batches.
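The drop-and-recreate mitigation can be sketched for PostgreSQL as follows (index and column names are hypothetical, following the earlier example):

```sql
-- Drop secondary indexes before the bulk load (names are hypothetical)
DROP INDEX IF EXISTS idx_prod_name;
DROP INDEX IF EXISTS idx_prod_age;

-- Bulk load while only the primary key is maintained per row
COPY prod_table (id, name, age) FROM '/path/to/data.csv' DELIMITER ',' CSV HEADER;

-- Rebuild the indexes in one pass, which is far cheaper than
-- maintaining them row by row during the load
CREATE INDEX idx_prod_name ON prod_table (name);
CREATE INDEX idx_prod_age ON prod_table (age);

-- Refresh planner statistics so subsequent queries get good plans
ANALYZE prod_table;
```

Rebuilding an index in a single pass over the finished table avoids the per-row maintenance cost that accumulates during the load itself.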
Story
In a logistics project, millions of rows were loaded through BULK INSERT without a staging table. Invalid data polluted the table and its indexes, and because of FK and CHECK constraints it was impossible to simply roll back the bad rows; the data had to be cleaned manually.
Story
In a corporate service, a bulk insert took 10 times longer than expected because secondary indexes were not disabled during the upload, so every index had to be updated for each inserted row.
Story
In a fintech product, bulk inserts of large files silently failed to load all rows because error codes were not handled correctly; part of the important data was lost and discovered only several days later, after reconciliation against external sources.
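A simple guard against such silent loss, sketched in PostgreSQL terms with the table and validation rules from the earlier example, is to count rows at each stage and reconcile before promoting:

```sql
-- COPY reports the number of rows inserted; compare it with the
-- line count of the source file (minus the header) before going on.
COPY staging_table (id, name, age) FROM '/path/to/data.csv' DELIMITER ',' CSV HEADER;

-- Count rows that the validation rules will reject, so that a
-- discrepancy is visible immediately rather than days later
SELECT count(*) AS rejected
FROM staging_table
WHERE age < 0 OR name IS NULL;

-- Promote only valid rows; the command tag's row count should equal
-- staged rows minus rejected rows
INSERT INTO prod_table (id, name, age)
SELECT id, name, age
FROM staging_table
WHERE age >= 0 AND name IS NOT NULL;
```

If the staged, rejected, and promoted counts do not reconcile with the source file, the import can be halted and investigated before any data reaches production.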