Bulk updates across multiple related tables are a classic task in industrial SQL programming. As business applications grew, the need arose to update large volumes of data at once while preserving consistency. Historically, row-by-row loops were used, which led to poor performance and long-held locks. Later, advanced DML statements (such as MERGE), transactional constructs, and staging-table approaches emerged.
The Problem is that an update touches several tables with relationships (for example, orders and order details), which risks "orphan rows", performance loss due to locking, and unpredictable load on the DBMS.
The Solution is based on atomic transactions, UPDATE/DELETE/MERGE statements with JOIN conditions, and batch processing. A good practice is to stage the computed changes in temporary staging tables and then apply them in batches inside a transaction. An example for SQL Server using MERGE:
BEGIN TRANSACTION;

-- Example of bulk updating a main table and a related table using MERGE
MERGE INTO orders AS tgt
USING temp_order_updates AS src
    ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET tgt.status = src.status,
               tgt.updated_at = src.updated_at;

MERGE INTO order_details AS tgt
USING temp_detail_updates AS src
    ON tgt.order_id = src.order_id
   AND tgt.sku = src.sku
WHEN MATCHED THEN
    UPDATE SET tgt.price = src.price,
               tgt.qty = src.qty;

COMMIT;
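The script above commits unconditionally; in practice, an error between the two MERGE statements should roll both of them back. A minimal error-handling sketch for SQL Server, assuming the same tables and staging tables as above:

```sql
SET XACT_ABORT ON;  -- abort and auto-roll-back the transaction on most runtime errors

BEGIN TRY
    BEGIN TRANSACTION;

    MERGE INTO orders AS tgt
    USING temp_order_updates AS src
        ON tgt.id = src.id
    WHEN MATCHED THEN
        UPDATE SET tgt.status = src.status,
                   tgt.updated_at = src.updated_at;

    MERGE INTO order_details AS tgt
    USING temp_detail_updates AS src
        ON tgt.order_id = src.order_id AND tgt.sku = src.sku
    WHEN MATCHED THEN
        UPDATE SET tgt.price = src.price,
                   tgt.qty = src.qty;

    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;  -- undo both MERGEs together
    THROW;  -- re-raise so the caller sees the failure
END CATCH;
```

With SET XACT_ABORT ON, even errors that TRY/CATCH cannot intercept (such as a client timeout) doom the transaction instead of leaving it half-committed.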
Frequently asked questions:
Can I just update the main table and then the related tables separately, without a transaction, if speed requirements are strict?
Separate UPDATEs outside a transaction cause severe data inconsistency if any stage fails: for example, if the orders are updated but the details are not, the business logic breaks. In modern DBMSs, wrapping batch processing in a transaction adds almost no overhead.
Will performance drop if I do one large UPDATE with a subquery? Can this lead to locks?
Yes. Monolithic UPDATEs on large tables lead to lock escalation, table-level locks, and downtime for other users. It is better to split the processing into batches bounded by a key range or a row limit (WHERE id BETWEEN ..., or ROWNUM / TOP / LIMIT, depending on the DBMS).
Example of a batch:
UPDATE orders SET status = 'closed' WHERE status = 'pending' AND id BETWEEN 100000 AND 199999;
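To process the whole table, such a bounded UPDATE is typically repeated in a loop until no rows remain. A sketch in SQL Server syntax, reusing the orders table from the earlier examples (the batch size of 10,000 is an illustrative choice):

```sql
DECLARE @batch INT = 10000;

WHILE 1 = 1
BEGIN
    -- Close pending orders in small chunks so locks stay short
    UPDATE TOP (@batch) orders
    SET status = 'closed'
    WHERE status = 'pending';

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to update
END;
```

Each iteration commits on its own, so concurrent readers and writers are blocked only for the duration of one small batch rather than the whole table scan.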
Does MERGE guarantee atomicity and correct processing order for related tables?
A single MERGE statement is atomic, but it operates on only one target table. To update related tables, a separate MERGE or UPDATE is required for each, and all of these statements must be placed within a single transaction.
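When only matched rows need changing, the related-table update can also be written as an UPDATE with a JOIN instead of MERGE. A sketch in SQL Server syntax, reusing the hypothetical temp_detail_updates staging table from the example above:

```sql
BEGIN TRANSACTION;

-- Equivalent of the second MERGE: join the target table to the staging table
UPDATE d
SET d.price = s.price,
    d.qty   = s.qty
FROM order_details AS d
JOIN temp_detail_updates AS s
  ON s.order_id = d.order_id AND s.sku = d.sku;

COMMIT;
```

UPDATE ... FROM is simpler to read and avoids some known MERGE edge cases, at the cost of losing MERGE's ability to insert and delete in the same statement.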
A company updated the status ("Completed") of a million orders with separate queries outside a transaction: first the main orders, then order_details. Under load the server went down, and after the failure the details were left with an "open" status. Pros: the queries are simple to write, and each statement runs with no transaction-management overhead.
Cons: no atomicity, so a mid-way failure leaves orders and order_details inconsistent; a monolithic million-row update also escalates locks and overloads the server.
They then implemented staging tables and grouped processing within a transaction: first all changes were computed and stored in temporary tables, then both main tables were updated in batches; on failure, everything rolled back. Pros: a failure triggers a complete rollback, the tables stay consistent, and batching keeps locks short and the load predictable.
Cons: the pipeline is more complex to build and maintain, requires extra storage for the staging tables, and the total processing time is longer.
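The reworked pipeline from this case can be sketched as follows; table and column names (including a status column on order_details) are illustrative assumptions, not the company's actual schema:

```sql
-- 1. Compute all changes up front into a temporary staging table
SELECT o.id, 'Completed' AS new_status
INTO #order_updates
FROM orders AS o
WHERE o.status = 'pending';

-- 2. Apply them to both tables atomically, in one transaction
SET XACT_ABORT ON;
BEGIN TRANSACTION;

UPDATE o
SET o.status = u.new_status
FROM orders AS o
JOIN #order_updates AS u ON u.id = o.id;

UPDATE d
SET d.status = u.new_status
FROM order_details AS d
JOIN #order_updates AS u ON u.id = d.order_id;

COMMIT;  -- any error before this point rolls back both updates
```

In production, step 2 would itself loop over key ranges of #order_updates, as in the batching example earlier, so that each transaction stays small.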