Traditional manual testing approaches evolved from validating monolithic SQL transactions where a single database enforced consistency. With the shift to Microservices and Event-Driven Architecture, quality assurance now faces the challenge of verifying distributed Saga patterns where state changes propagate asynchronously across service boundaries, requiring new methodologies to ensure data integrity without two-phase commit locks.
The core challenge lies in detecting race conditions and partial failure states when ACID guarantees are isolated to individual service databases. Specifically, verifying that inventory reservations in PostgreSQL, payment authorizations via external APIs, and order confirmations through Apache Kafka topics maintain consistency during network partitions, Kafka consumer rebalancing, or Redis cache invalidation failures demands understanding CAP theorem trade-offs and eventual consistency windows.
The solution is a comprehensive Chaos Engineering-inspired manual testing methodology that combines precise timing manipulation with state-transition mapping. This involves manually injecting latency into Kafka consumer groups using proxy tools such as Toxiproxy, simulating Redis cache evictions during active transactions, and verifying that Saga compensating transactions correctly roll back operations when downstream failures occur, ensuring the system maintains consistency without allowing phantom inventory or duplicate charges.
A luxury watch marketplace was preparing for a limited-edition release of 100 exclusive timepieces with anticipated concurrent demand from over 10,000 users. The architecture utilized Spring Boot microservices where the Inventory Service managed stock in PostgreSQL, the Payment Service integrated with Stripe API, and Apache Kafka facilitated asynchronous communication between them. During pre-production simulation, the team discovered a critical flaw where two users simultaneously purchased the final available unit because the inventory verification and reservation occurred in separate asynchronous messages, creating a split-brain scenario where both payments were captured before either order service confirmed the stock deduction.
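The split-brain flaw above is a classic check-then-act race. The following minimal Python sketch (not the marketplace's actual Spring Boot code; service names and the barrier are illustrative) reproduces it deterministically: two buyers both pass the stock verification step before either reservation message is processed, so both "win" the last unit.

```python
import threading

# One unit left of the limited edition; both buyers race for it.
stock = {"limited-watch": 1}
confirmed_orders = []

def checkout(user, barrier):
    # Step 1: the asynchronous "verification" message reads the stock level.
    available = stock["limited-watch"] > 0
    # The barrier forces BOTH users past verification before either reserves,
    # simulating the timing window between the two asynchronous messages.
    barrier.wait()
    # Step 2: a later "reservation" message deducts stock unconditionally.
    if available:
        stock["limited-watch"] -= 1
        confirmed_orders.append(user)

barrier = threading.Barrier(2)
threads = [threading.Thread(target=checkout, args=(u, barrier))
           for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(confirmed_orders)         # both users "bought" the final unit
print(stock["limited-watch"])   # stock driven negative: -1
```

Because the verification and the deduction are separate steps, no amount of throughput tuning removes the window; only making reserve-and-check a single atomic operation does.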
Solution 1: Horizontal scaling of Kafka consumers
This approach involved increasing consumer instances to reduce message processing lag and minimize the window for race conditions. The primary advantage was improved throughput and reduced latency under normal load. However, this did not fundamentally resolve the race condition; it merely made the collision statistically less likely while remaining possible during peak traffic or consumer rebalancing events.
Solution 2: Implementing distributed locks via Redis Redlock
This strategy introduced atomic locking mechanisms where the Inventory Service would acquire a distributed lock before processing any checkout request. While this prevented concurrent modifications to the same stock item, it introduced significant latency to the checkout flow, created a potential single point of failure if the Redis cluster experienced network partitions, and complicated failure recovery scenarios where locks might not be released due to application crashes.
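A hedged single-node sketch of the locking semantics behind this strategy (real Redlock acquires a quorum across multiple Redis nodes; this in-memory dict only illustrates the token-plus-TTL idea, including why crashed holders eventually release via expiry and why a stale token must not unlock):

```python
import time
import uuid

# key -> (owner_token, expiry_timestamp); stands in for Redis SET NX PX.
store = {}

def acquire(key, ttl_seconds):
    now = time.monotonic()
    current = store.get(key)
    if current is None or current[1] <= now:   # free, or holder's TTL expired
        token = str(uuid.uuid4())
        store[key] = (token, now + ttl_seconds)
        return token
    return None

def release(key, token):
    current = store.get(key)
    if current and current[0] == token:        # only the current owner may release
        del store[key]
        return True
    return False                               # stale token: refuse silently

t1 = acquire("inventory:watch-42", ttl_seconds=0.05)
t2 = acquire("inventory:watch-42", ttl_seconds=0.05)  # contention: denied
print(t1 is not None, t2)       # True None
time.sleep(0.06)                # simulate a crashed holder: TTL elapses
t3 = acquire("inventory:watch-42", ttl_seconds=0.05)
print(t3 is not None)           # True: lock recovered without manual cleanup
print(release("inventory:watch-42", t1))      # False: stale token cannot unlock
```

The TTL is exactly the double-edged sword noted above: too short and a slow-but-alive holder loses the lock mid-checkout; too long and recovery after a crash adds latency for every waiting buyer.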
Solution 3: Manual orchestrated failure injection with Kafka partition control
This methodology required testers to manually pause specific Kafka consumers (for example via the consumer pause API or kafka-consumer-groups tooling, while monitoring partition lag in a UI such as Kafdrop) while injecting network latency via Docker network policies. This allowed precise reproduction of the exact timing window between payment authorization and inventory commitment. The approach was time-intensive and required elevated privileges to manipulate Kubernetes network policies, but it provided deterministic reproduction of race conditions and direct observation of Saga compensating transaction triggers.
Chosen solution and rationale
Solution 3 was selected because only deterministic manual intervention could expose the narrow timing window between services. By deliberately pausing the inventory consumer while allowing the payment consumer to process, we confirmed that the system lacked a pre-payment reservation lock and that compensation workflows failed to trigger automatically when inventory conflicts were detected.
Result
The development team implemented a two-phase reservation pattern: a Pending inventory status reserved stock before payment processing, avoiding the need for distributed two-phase commit locks. Manual testing then verified that forcing a Kafka rebalancing during active checkout correctly triggered the Saga compensation, releasing both inventory reservations and payment holds without data loss. The subsequent product launch proceeded successfully with zero duplicate sales reported and all 100 units accounted for in the final ledger.
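The reserve-then-confirm flow described above can be sketched as follows (a simplified in-memory model under assumed names like reserve/confirm/release; the real services persist these states in PostgreSQL and communicate over Kafka):

```python
class InventoryService:
    """Holds available stock plus Pending reservations keyed by order id."""
    def __init__(self, stock):
        self.available = stock
        self.pending = {}

    def reserve(self, order_id, qty):
        # Phase 1: move stock into Pending BEFORE payment is attempted.
        if self.available < qty:
            raise RuntimeError("out of stock")
        self.available -= qty
        self.pending[order_id] = qty

    def confirm(self, order_id):
        # Phase 2: Pending becomes a completed sale.
        self.pending.pop(order_id)

    def release(self, order_id):
        # Compensation path: return the held stock to the pool.
        self.available += self.pending.pop(order_id)

def checkout(inv, order_id, payment_succeeds):
    inv.reserve(order_id, 1)
    try:
        if not payment_succeeds:
            raise TimeoutError("payment authorization failed")
        inv.confirm(order_id)
        return "CONFIRMED"
    except TimeoutError:
        inv.release(order_id)        # Saga compensating transaction
        return "COMPENSATED"

inv = InventoryService(stock=1)
print(checkout(inv, "order-1", payment_succeeds=False))  # COMPENSATED
print(inv.available)   # 1: the hold was released, no phantom deduction
print(checkout(inv, "order-2", payment_succeeds=True))   # CONFIRMED
print(inv.available)   # 0: the last unit is sold exactly once
```

Because the reservation happens before payment, two concurrent buyers can no longer both pass verification: the second reserve call fails immediately rather than after money has moved.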
How do you verify ACID properties when Microservices implement Eventual Consistency rather than distributed transactions?
Candidates frequently conflate local database ACID compliance with global system consistency. In manual testing, you must deliberately engineer scenarios where a PostgreSQL transaction commits successfully but the subsequent Apache Kafka message publishing fails, which can be achieved using Docker network partitions to isolate the message broker. Verify that the service implements the Outbox Pattern or transactional messaging to ensure database commits and event publishing remain atomic. Check for orphaned records by querying the database directly while blocking the message broker, then confirming that retry mechanisms eventually synchronize the state without manual intervention or data corruption.
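The Outbox Pattern mentioned above can be sketched with sqlite3 standing in for PostgreSQL (table and column names are illustrative): the business row and its event are written in one local transaction, and a relay retries publishing, so a broker outage delays the event but can never lose it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
""")

def place_order(order_id):
    # ONE atomic local transaction: business write + event write commit together.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'CREATED')", (order_id,))
        db.execute("INSERT INTO outbox (topic, payload) VALUES ('orders', ?)",
                   (order_id,))

def relay(publish):
    # Background relay: retries each unpublished event until the broker accepts it.
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        if publish(payload):                      # e.g. a Kafka producer send
            with db:
                db.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                           (row_id,))

place_order("order-7")
relay(lambda payload: False)   # broker partitioned away: event stays queued
relay(lambda payload: True)    # partition heals: the retry publishes it
unpublished = db.execute(
    "SELECT COUNT(*) FROM outbox WHERE published = 0").fetchone()[0]
print(unpublished)             # 0: state eventually synchronized, nothing lost
```

This is exactly what the manual test above probes: block the broker, confirm the order row and its outbox row both exist (no orphaned commit), then unblock and confirm the relay drains the queue without intervention.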
What distinguishes testing Idempotency from testing Exactly-Once semantics in Message Queues, and why is this critical for manual QA?
Many testers incorrectly treat these as interchangeable concepts. Idempotency ensures that processing the same message multiple times produces an identical result to processing it once, which you test by manually replaying a Kafka message from Offset Explorer and verifying no duplicate charge or inventory deduction occurs. Exactly-Once semantics ensure the infrastructure itself prevents duplicate delivery, which you validate by observing Kafka transactional producer behavior during broker failover scenarios. Manual QA must verify both dimensions: that the application handles duplicates gracefully via idempotent logic, and that UUID-based deduplication filters function correctly when the broker legitimately redelivers messages due to acknowledgment timeouts.
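The application-side half of that verification, the UUID deduplication filter, can be sketched as follows (an in-memory set stands in for what would be a persistent deduplication store in production; names are illustrative):

```python
import uuid

processed = set()            # in production: a persistent store, not a set
balance = {"alice": 100}

def handle_charge(message):
    # Idempotency filter: a redelivered message id is recognized and skipped,
    # so at-least-once delivery never becomes a duplicate charge.
    if message["id"] in processed:
        return "skipped"
    balance[message["user"]] -= message["amount"]
    processed.add(message["id"])
    return "applied"

msg = {"id": str(uuid.uuid4()), "user": "alice", "amount": 30}
print(handle_charge(msg))    # applied
print(handle_charge(msg))    # skipped: manual replay is harmless
print(balance["alice"])      # 70: charged exactly once
```

Replaying the same message by hand, as the answer describes doing from Offset Explorer, should always hit the "skipped" branch; if the balance moves twice, the idempotent-logic dimension has failed regardless of what the broker guarantees.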
How do you validate Compensating Transactions within a Saga pattern without risking production financial data integrity?
This requires constructing isolated test environments that mirror production schemas and API contracts but utilize sandbox credentials for payment providers. Manually trigger failure sequences by terminating Docker containers immediately after the payment authorization step but before the inventory service confirmation. Verify that the compensation workflow correctly issues refunds and releases Redis distributed locks. Candidates often overlook verifying that the compensation mechanism itself can fail. Test this by blocking the compensation path, for example by simulating a network outage during the rollback phase, and ensure the system enters a clearly defined Compensation Failed alarm state with appropriate monitoring alerts rather than an undefined inconsistent state that could lead to financial discrepancies.
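The state machine being tested can be sketched as follows (function and state names are illustrative; a real orchestrator would persist each transition and emit the alarm to monitoring):

```python
def run_saga(authorize_payment, confirm_inventory, refund_payment):
    """Returns the terminal Saga state for one checkout attempt."""
    authorize_payment()
    try:
        confirm_inventory()
        return "COMPLETED"
    except ConnectionError:
        try:
            refund_payment()              # compensating transaction
            return "COMPENSATED"
        except ConnectionError:
            # The compensation itself failed: land in a DEFINED alarm state
            # that monitoring can page on, never an undefined limbo.
            return "COMPENSATION_FAILED"

def ok():
    pass

def inventory_down():
    raise ConnectionError("inventory container terminated mid-checkout")

def refund_blocked():
    raise ConnectionError("network outage during the rollback phase")

print(run_saga(ok, ok, ok))                          # COMPLETED
print(run_saga(ok, inventory_down, ok))              # COMPENSATED
print(run_saga(ok, inventory_down, refund_blocked))  # COMPENSATION_FAILED
```

The third case is the one candidates overlook: the manual test blocks both the forward path and the rollback path, and passes only if the system reports the explicit failure state instead of silently holding the customer's money.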