The architecture centers on a deterministic transaction sequencer that orders operations using Hybrid Logical Clocks (HLCs) rather than physical time, removing the dependence on tightly bounded physical-clock uncertainty that TrueTime-based systems require. The coordinator implements a variant of the Calvin protocol, in which transaction intents are replicated to a majority of shard leaders before execution, ensuring serializability through deterministic scheduling rather than distributed locking. TLA+ specifications model the coordinator's state transitions, allowing model checking to verify that the system maintains safety (strict serializability) and liveness (every submitted transaction eventually commits or aborts) even during partial network partitions.
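The HLC ordering the sequencer relies on can be sketched in a few lines. The class below is an illustrative reduction of the standard HLC update rules; the field names and millisecond granularity are assumptions, not the coordinator's actual implementation:

```python
import time

class HybridLogicalClock:
    """Minimal HLC: timestamps are (wall, logical) pairs that preserve
    causal order even when physical clocks drift between nodes."""

    def __init__(self):
        self.wall = 0      # highest physical time observed so far
        self.logical = 0   # tie-breaker when physical time stalls

    def now(self):
        """Generate a timestamp for a local or send event."""
        pt = int(time.time() * 1000)  # local physical clock, ms
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge a timestamp received from another node, so the local
        clock never falls behind anything it has causally observed."""
        pt = int(time.time() * 1000)
        r_wall, r_logical = remote
        new_wall = max(self.wall, r_wall, pt)
        if new_wall == self.wall == r_wall:
            self.logical = max(self.logical, r_logical) + 1
        elif new_wall == self.wall:
            self.logical += 1
        elif new_wall == r_wall:
            self.logical = r_logical + 1
        else:
            self.logical = 0
        self.wall = new_wall
        return (self.wall, self.logical)
```

Because timestamps are plain tuples, lexicographic comparison gives the total order the sequencer needs, with no requirement that physical clocks be closely synchronized.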
The coordinator persists transaction logs to a write-ahead log (WAL) replicated via Paxos for durability across availability zones. Shard proxies abstract the underlying storage engines—whether PostgreSQL, MongoDB, or Cassandra—presenting a unified interface to the execution engine. Conflict detection employs a dependency graph constructed during the sequencing phase, allowing concurrent execution of non-conflicting transactions while maintaining equivalence to a serial order.
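The sequencing-phase conflict check lends itself to a compact sketch. The read/write-set representation below is an assumption for illustration, not the coordinator's actual data model:

```python
from collections import defaultdict

def build_dependency_graph(sequenced_txns):
    """Given transactions in sequencer order, add an edge t_i -> t_j
    (i < j) whenever their read/write sets conflict. Transactions with
    no path between them may execute concurrently while remaining
    equivalent to the sequenced serial order."""
    edges = defaultdict(set)
    for i, ti in enumerate(sequenced_txns):
        for tj in sequenced_txns[i + 1:]:
            # Conflict: write-write, write-read, or read-write overlap.
            conflicts = (ti["writes"] & (tj["reads"] | tj["writes"])) or \
                        (ti["reads"] & tj["writes"])
            if conflicts:
                edges[ti["id"]].add(tj["id"])
    return edges
```

For example, a transaction writing key `b` must precede a later transaction reading `b`, but two transactions touching disjoint keys get no edge and can run in parallel. This naive version is O(n²) per batch; a production scheduler would index by key.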
A global investment bank needed to migrate its trade settlement system from a monolithic Oracle database to a sharded architecture spanning AWS and Azure regions. The critical challenge was ensuring atomic settlement of trades that touched multiple asset classes stored in different database technologies—equities in PostgreSQL and derivatives in ScyllaDB—without deploying atomic clocks or GPS time sources for synchronization.
One proposed solution utilized standard XA transactions with a two-phase commit (2PC) protocol managed by a Narayana transaction manager. This approach offered strong consistency and mature ecosystem support, but introduced blocking behavior where a coordinator failure during the prepare phase would leave shards holding locks indefinitely, violating the liveness requirements during cross-cloud network instability.
A second alternative was the Saga pattern, implemented via Axon Framework with compensating transactions for rollback scenarios. While this provided high availability and avoided distributed locking, it sacrificed strict serializability, which is unacceptable for financial settlement where intermediate states must never be observable, and the compensation logic for irreversible external market operations proved prohibitively complex.
The selected architecture implemented a Calvin-inspired deterministic coordinator with TLA+ formal verification. The system sequenced all settlement transactions through a replicated state machine using Raft for the coordination log, then executed them in the same order across all shards using idempotent stored procedures. This eliminated the need for distributed locking during execution and allowed TLA+ model checking to verify exhaustively, over bounded model instances, that the system could not deadlock or lose settlements during arbitrary network partitions.
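The idempotent, in-order apply step is the piece that makes log replay safe after recovery. The sketch below is illustrative (the `Shard` class and its fields are assumptions, not the bank's stored procedures):

```python
from collections import defaultdict

class Shard:
    """Applies sequenced settlement transactions deterministically.
    Idempotence lets a recovering replica safely replay the Raft log
    from an earlier index without double-applying settlements."""

    def __init__(self):
        self.balances = defaultdict(int)
        self.applied = set()  # transaction ids already executed

    def apply(self, txn):
        if txn["id"] in self.applied:
            return  # replayed log entry: no-op
        for account, delta in txn["ops"]:
            self.balances[account] += delta
        self.applied.add(txn["id"])
```

Because every shard applies the same log in the same order with the same deterministic logic, replicas converge without any cross-shard locking during execution.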
The deployment resulted in a 40% reduction in settlement latency compared to the legacy Oracle system, while maintaining full ACID guarantees across clouds. During a subsequent region-wide AWS outage, the system continued processing trades without manual intervention, validating the formally verified liveness properties.
What is the fundamental difference between strict serializability and linearizability, and why does a distributed transaction coordinator typically target the former rather than the latter?
Strict serializability combines serializability (transactions appear to execute in some sequential order) with linearizability's real-time constraint: if one transaction completes before another begins, the equivalent serial order must preserve that precedence. While linearizability applies to single-object operations, strict serializability extends the real-time guarantee to multi-object transactions. Candidates often conflate the two, designing systems that ensure single-key linearizability but fail to prevent anomalies like write skew across multiple keys. A coordinator achieves strict serializability by establishing a global order of transactions, often through a sequencing layer or timestamp oracle, whereas linearizability alone can be satisfied per-shard without any cross-shard ordering guarantees.
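The write-skew anomaly mentioned above can be made concrete with a small sketch. The on-call doctor invariant is the textbook illustration, not part of the case study:

```python
# Invariant the application cares about: x + y >= 1 (at least one
# doctor on call). Each key on its own is linearizable, but no
# cross-key ordering is enforced between the two transactions.
db = {"x": 1, "y": 1}

def txn_go_off_call(snapshot, drop):
    """Decide against a stale snapshot, as a single-key check would."""
    if snapshot["x"] + snapshot["y"] >= 2:  # invariant survives my write
        return (drop, 0)
    return None

snapshot = dict(db)                  # both txns read the same state
w1 = txn_go_off_call(snapshot, "y")  # T1: doctor y goes off call
w2 = txn_go_off_call(snapshot, "x")  # T2: doctor x goes off call
for write in (w1, w2):
    if write is not None:
        key, value = write
        db[key] = value              # each single-key write is "correct"

# The multi-key invariant is now violated even though every individual
# read and write was consistent: db == {"x": 0, "y": 0}
```

A strictly serializable coordinator would force one transaction to observe the other's write, causing the second check to fail and the invariant to hold.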
Why does the two-phase commit (2PC) protocol block indefinitely during coordinator failure, and how does the three-phase commit (3PC) fail to solve this under network partitions?
In 2PC, once a participant votes "yes" during the prepare phase, it holds locks until receiving the global commit/abort decision from the coordinator. If the coordinator fails after receiving all votes but before broadcasting the decision, participants remain uncertain and locked, violating availability. 3PC attempts to solve this by adding a pre-commit phase and timeout-based progress, but under network partitions, a participant cannot distinguish between a failed coordinator and a partitioned one. This leads to split-brain scenarios where different partitions make conflicting decisions, violating consistency. The fundamental issue is that FLP impossibility proves that deterministic consensus is impossible in asynchronous systems with even a single faulty process, meaning any commit protocol must choose between blocking (safety) and potential inconsistency (liveness) during certain failure modes.
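The uncertainty window described above can be sketched as a tiny participant state machine. The class and method names are illustrative, not Narayana's API:

```python
class Participant:
    """2PC participant: after voting YES it enters the uncertain window
    and must hold its locks until the coordinator's decision arrives."""

    def __init__(self):
        self.state = "INIT"
        self.locks_held = False

    def prepare(self):
        self.locks_held = True
        self.state = "PREPARED"  # voted yes; now uncertain
        return "YES"

    def on_timeout(self):
        # Cannot unilaterally abort: the coordinator may already have
        # logged COMMIT and other participants may have applied it.
        if self.state == "PREPARED":
            return "BLOCKED"     # must wait, or run a termination protocol
        return "ABORT"           # before voting, aborting is always safe

    def decide(self, decision):
        self.state = decision
        self.locks_held = False  # locks released only after the decision
```

The asymmetry is the crux: before voting, a timeout is safely resolved by aborting; after voting YES, no local decision is safe, which is exactly the blocking behavior the deterministic design avoids.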
How does TLA+ verify liveness properties in a transaction coordinator, and what specific temporal logic operators express "eventually commits" versus "never loses data"?
TLA+ uses temporal logic to specify that good things eventually happen (liveness) while bad things never happen (safety). The liveness property that all initiated transactions eventually complete is expressed with the leads-to operator, Initiated(t) ~> (Committed(t) ∨ Aborted(t)), which desugars to □(Initiated(t) ⇒ ◇(Committed(t) ∨ Aborted(t))): whenever transaction t initiates, it eventually commits or aborts. Genuine safety properties like "never loses data" use only the always operator (□), e.g. □(Committed(t) ⇒ Durable(t)), asserting that in every reachable state a committed transaction's writes remain present; note that a formula such as □(Committed(t) ⇒ ◇Readable(t)) contains ◇ and is therefore a liveness property, a distinction candidates frequently blur. Candidates also often miss that liveness checking requires fairness assumptions: weak fairness (WF_vars(Action)) ensures that an action which stays continuously enabled must eventually occur. Without such fairness constraints the specification admits behaviors that stutter forever, so the model checker reports spurious liveness violations in which the coordinator simply stops taking steps.
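A toy fragment shows how these operators combine in practice. The identifiers (Txns, status, Init, Next) are illustrative and assumed defined elsewhere; this is not the case study's actual specification:

```tla
VARIABLES status  \* status[t] \in {"init", "prepared", "committed", "aborted"}

\* Liveness: every initiated transaction is eventually decided.
EventuallyDecided ==
    \A t \in Txns :
        (status[t] = "init") ~> (status[t] \in {"committed", "aborted"})

\* Safety: a committed transaction never reverts (no eventually needed).
CommitStable ==
    [][ \A t \in Txns :
          status[t] = "committed" => status'[t] = "committed" ]_status

\* Weak fairness on Next rules out behaviors that stutter forever.
Spec == Init /\ [][Next]_status /\ WF_status(Next)
```

Dropping the WF_status(Next) conjunct from Spec would let TLC produce a counterexample to EventuallyDecided consisting of nothing but stuttering steps.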