
How do modern DBMSs implement multi-version concurrency control (MVCC), how does it surface in SQL, and why is MVCC critical for high-load applications?


Answer.

Background

The concept of multi-versioning (MVCC, Multi-Version Concurrency Control) arose as an alternative to strict locking, allowing many transactions to run in parallel. This matters because it reduces conflicts and blocking during concurrent access to data, which is particularly critical in OLTP systems.

Problem

Traditional locking approaches (e.g., row-level locking) can lead to application slowdowns under high contention. The purpose of MVCC is to allow transactions to read consistent snapshots of data even when concurrent write operations are taking place, thus ensuring isolation and concurrent access.

Solution

MVCC is implemented in popular DBMSs (PostgreSQL, Oracle, MySQL/InnoDB) by storing historical versions of rows. On reads, each transaction sees only row versions committed before its snapshot was taken (per transaction or per statement, depending on the isolation level), while inserts and updates create new row versions without immediately deleting the old ones.

Example query (PostgreSQL):

BEGIN;
SELECT * FROM orders WHERE status = 'processing';
UPDATE orders SET status = 'completed' WHERE id = 42;
COMMIT;

Until the transaction commits, other sessions continue to see the previous version of the row; only after COMMIT do the changes become visible to transactions that take their snapshot later.
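A sketch of how this visibility rule plays out across two concurrent PostgreSQL sessions (the orders table and values follow the example above; the interleaving is illustrative):

```sql
-- Session 1
BEGIN;
UPDATE orders SET status = 'completed' WHERE id = 42;
-- not yet committed

-- Session 2 (meanwhile)
SELECT status FROM orders WHERE id = 42;
-- still returns 'processing': the reader is not blocked,
-- it simply sees the last committed version of the row

-- Session 1
COMMIT;

-- Session 2 (a new statement, hence a new snapshot, after the commit)
SELECT status FROM orders WHERE id = 42;
-- now returns 'completed'
```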

Key features:

  • MVCC prevents locking on reads (readers don't block writers, writers don't block readers).
  • Easy to implement data "snapshots" for analytics.
  • Old row versions require periodic cleanup (VACUUM/autovacuum in PostgreSQL, the purge thread in InnoDB).
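The "snapshot for analytics" point can be sketched with a REPEATABLE READ transaction in PostgreSQL (the table and column names are illustrative): every statement inside it reads from the same snapshot, so the aggregates are mutually consistent even while writers keep committing.

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM orders;     -- the snapshot is taken at the first query
SELECT sum(amount) FROM orders;  -- same snapshot: totals match the count above
COMMIT;
```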

Tricky Questions.

Can MVCC completely eliminate all types of locks and conflicts?

No. Write-write conflicts still occur in MVCC when transactions update the same rows concurrently. Under READ COMMITTED in PostgreSQL, the second UPDATE simply waits for the first transaction to finish; under REPEATABLE READ or SERIALIZABLE, it fails with a serialization error (SQLSTATE 40001), and one of the transactions must be rolled back and retried.
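A two-session sketch of such a conflict under REPEATABLE READ in PostgreSQL (the interleaving is illustrative):

```sql
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE orders SET status = 'completed' WHERE id = 42;

-- Session 2
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE orders SET status = 'cancelled' WHERE id = 42;
-- blocks, waiting for Session 1 to finish...

-- Session 1
COMMIT;

-- Session 2 now fails:
-- ERROR: could not serialize access due to concurrent update
-- (SQLSTATE 40001); the application should retry the whole transaction
```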

When are old row versions deleted in MVCC, and can this lead to memory leaks?

In most DBMSs, old row versions are removed by background processes (autovacuum/VACUUM in PostgreSQL, the purge thread in InnoDB). Strictly speaking, this is table and index bloat on disk rather than a memory leak, but the effect is similar: if cleanup does not keep up, the database grows and performance degrades.
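In PostgreSQL, bloat pressure can be inspected via the pg_stat_user_tables statistics view, and a manual cleanup triggered if needed (the orders table follows the earlier example):

```sql
-- Tables with the most dead row versions awaiting cleanup
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Manual cleanup of a specific table (reclaims dead versions, refreshes statistics)
VACUUM (VERBOSE, ANALYZE) orders;
```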

Does SELECT FOR UPDATE work correctly under MVCC, and why is explicit locking still necessary?

Yes. SELECT ... FOR UPDATE locks the selected rows so that concurrent transactions cannot modify them until the lock holder commits; without it, read-modify-write sequences are prone to "lost updates".

Example:

BEGIN;
SELECT * FROM products WHERE id = 123 FOR UPDATE;
UPDATE products SET quantity = quantity - 1 WHERE id = 123;
COMMIT;
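PostgreSQL also offers non-blocking variants of row locking, useful for queue-like workloads (the products table follows the example above):

```sql
-- Fail immediately instead of waiting if the row is already locked
SELECT * FROM products WHERE id = 123 FOR UPDATE NOWAIT;

-- Skip rows locked by other transactions: a common pattern for job queues,
-- where each worker claims the next unclaimed row
SELECT * FROM products
WHERE quantity > 0
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;
```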

Typical Mistakes and Anti-Patterns

  • Not considering the need to clean up "dead" rows, leading to a growing database and performance degradation.
  • Ignoring write-write conflicts: relying solely on MVCC without handling serialization errors (e.g., retrying on SQLSTATE 40001).
  • Mixing different transaction isolation levels without understanding their impact on consistency.
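The last point can be made concrete: isolation level is chosen per transaction, and stricter levels trade retries for consistency. A sketch for a critical operation in PostgreSQL (the accounts table is hypothetical):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- Under SERIALIZABLE, COMMIT itself may fail with SQLSTATE 40001;
-- the application must be prepared to retry the whole transaction
```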

Real-life Example

Negative Case

In a large online store, a workload with frequent UPDATEs to the orders table was deployed without configuring VACUUM. Within a month the database grew tenfold, and queries slowed significantly.

Pros:

  • High concurrency at the start, quick implementation.

Cons:

  • Growing disk usage; system failure under large data volumes.

Positive Case

Regular autovacuum was enabled, write-conflict handling was added, and REPEATABLE READ isolation was applied only to critical queries.

Pros:

  • High performance is maintained.
  • Data integrity is guaranteed.

Cons:

  • Complexity in configuring VACUUM parameters.
  • Need for monitoring cleanup processes.
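The per-table autovacuum tuning mentioned above could look like this in PostgreSQL (the thresholds are illustrative, not recommendations):

```sql
-- Vacuum the hot orders table more aggressively than the global default
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.02,  -- trigger at ~2% dead rows (default is 20%)
  autovacuum_vacuum_threshold = 1000      -- plus a fixed base row count
);
```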