Business Analysis / Business Analyst

How do you elicit and validate non-functional requirements for **real-time** data synchronization between a legacy **mainframe** system and a modern **SaaS** platform when business users cannot articulate performance thresholds, the vendor provides no **SLA** guarantees, and the project charter mandates that no business-critical transaction can be queued for longer than five seconds during peak load?


Answer to the question

The core challenge lies in translating ambiguous business needs into measurable technical constraints when direct instrumentation is unavailable. You must employ a proxy-based elicitation strategy, using synthetic load testing against a shadow environment to empirically derive latency baselines that stakeholders can validate through concrete examples rather than abstract thresholds. Concurrently, architect a defensive buffering pattern using an intermediate message broker or in-memory cache to decouple the legacy system's throughput from the SaaS platform's variable latency, ensuring the five-second hard constraint is satisfied even during vendor-side degradation.
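Empirically deriving a latency baseline can be as simple as driving a synthetic workload against the shadow environment and reporting percentiles that stakeholders can react to. A minimal Python sketch, assuming a hypothetical `call` function that exercises one mainframe-to-SaaS round trip:

```python
import time

def latency_baseline(call, n=200):
    """Drive n synthetic round trips and report the percentiles
    business users can validate with concrete examples."""
    samples = []
    for _ in range(n):
        start = time.monotonic()
        call()  # one round trip against the shadow environment
        samples.append(time.monotonic() - start)
    samples.sort()
    return {
        "p50": samples[n // 2],          # typical experience
        "p99": samples[int(n * 0.99) - 1],  # worst-case experience
    }
```

The resulting p50/p99 figures turn the conversation into concrete statements ("today a dashboard refresh takes X seconds at the 99th percentile") rather than abstract thresholds.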

Situation from life

Problem description

I was engaged by a multinational investment bank to facilitate the integration of their legacy IBM z/OS mainframe—hosting core transaction ledgers written in COBOL—with a new Salesforce Service Cloud implementation for client portfolio management. The critical requirement was that any trade execution record updated in the mainframe must reflect in the advisors' Salesforce dashboards within five seconds during market open peaks (approximately 50,000 transactions per minute), yet no business stakeholder could define "acceptable" latency beyond "it needs to feel instant." Complicating matters, Salesforce explicitly declined to provide throughput SLAs for their Bulk API, citing shared-tenant architecture, and the mainframe team prohibited any code modifications due to regulatory freeze periods.

Solution A: Direct synchronous REST API invocation with client-side retry

This approach involved modifying the middleware to call Salesforce REST endpoints immediately upon mainframe commit, employing exponential backoff for failures. Pros: Implementation simplicity and immediate consistency without additional infrastructure. Cons: Under peak load, Salesforce's rate limiting (100 requests per minute per user) triggered cascading timeouts, frequently breaching the five-second window; furthermore, retry storms risked mainframe CICS region exhaustion due to thread blocking.
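To illustrate why this pattern struggles under the hard deadline, here is a sketch of a retry loop with exponential backoff and jitter (hypothetical `send_with_backoff`, assuming an opaque `send` callable for the Salesforce endpoint). Note that once the cumulative wait would breach the five-second window, retrying further only blocks threads, so the loop must fail fast:

```python
import random
import time

def send_with_backoff(send, payload, max_retries=5, base_delay=0.5, deadline=5.0):
    """Call the SaaS endpoint, retrying with exponential backoff plus jitter.

    Gives up once the next wait would breach the hard deadline, so the
    caller can fall back instead of exhausting CICS region threads.
    """
    start = time.monotonic()
    for attempt in range(max_retries):
        try:
            return send(payload)
        except Exception:
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            if time.monotonic() - start + delay > deadline:
                break  # breaching the 5 s window is worse than failing fast
            time.sleep(delay)
    raise TimeoutError("could not deliver within deadline")
```

Under sustained rate limiting, most peak-load calls exhaust this budget, which is exactly the cascading-timeout behavior observed.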

Solution B: Apache Kafka event streaming with asynchronous processing

We considered deploying a Kafka cluster to ingest mainframe SMF (System Management Facility) logs via a custom parser, allowing Salesforce to consume at its own pace. Pros: Decoupled architectures eliminate backpressure and provide durability. Cons: Log parsing introduced a 3-7 second variable latency due to EBCDIC to ASCII conversion and network hops, making the five-second guarantee statistically impossible during batch synchronization windows; additionally, mainframe security teams rejected the idea of opening TCP/IP ports for Kafka connectors.
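Incidentally, the EBCDIC-to-ASCII decode step itself is cheap; in Python the standard `cp037` codec (EBCDIC US) handles it in one call. The variable latency came from log parsing and network hops, not the character conversion:

```python
# "Hello" encoded in EBCDIC code page 037, as it would arrive from SMF logs
raw = bytes([0xC8, 0x85, 0x93, 0x93, 0x96])
text = raw.decode("cp037")  # decode EBCDIC bytes to a Python string
```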

Solution C: Change Data Capture (CDC) via IBM InfoSphere with Redis hot-cache and circuit breaker

The chosen architecture utilized IBM InfoSphere Data Replication to capture DB2 DASD write-ahead logs at the storage layer—avoiding COBOL changes—streaming changes to a Redis Cluster (sub-millisecond latency) co-located with the Salesforce middleware. The middleware read from Redis first, using a Hystrix-style circuit breaker to serve stale-but-recent data if Salesforce API latency exceeded 4.5 seconds. Pros: Bypassed mainframe code freeze by operating at the database layer; Redis guaranteed <50ms retrieval; circuit breaker enforced the hard five-second ceiling. Cons: Added operational complexity requiring Redis persistence tuning and potential eventual consistency scenarios during cache invalidation.
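A minimal sketch of the read path, with a plain dict standing in for the Redis cluster and a hypothetical `saas_call` for the Salesforce API. The breaker trips at the 4.5-second budget so the hard five-second ceiling is never breached; this is an illustration of the pattern, not the production middleware:

```python
import time

STALE_AFTER = 4.5  # seconds: breaker trips before the 5 s business ceiling

class DashboardReader:
    """Read path: try the SaaS API within a latency budget, otherwise
    serve the last-known-good value from the hot cache."""

    def __init__(self, saas_call, budget=STALE_AFTER):
        self.saas_call = saas_call
        self.budget = budget
        self.cache = {}  # stand-in for the co-located Redis cluster

    def read(self, key):
        start = time.monotonic()
        try:
            value = self.saas_call(key)
            if time.monotonic() - start <= self.budget:
                self.cache[key] = value  # refresh last-known-good
                return value, "fresh"
        except Exception:
            pass
        # Breaker open: serve stale-but-recent data rather than block
        return self.cache.get(key), "stale"
```

The design choice is deliberate: during a vendor-side latency spike the advisor sees slightly stale data flagged as such, rather than a spinner that breaches the charter's five-second mandate.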

Which solution was chosen (and why)

We implemented Solution C because it was the only option that satisfied the immovable five-second constraint without violating the mainframe regulatory freeze or Salesforce architectural limitations. The CDC approach treated the legacy system as an immutable black box, which satisfied compliance officers, while the Redis cache acted as a shock absorber for SaaS volatility. The circuit breaker pattern provided graceful degradation rather than hard failures, aligning with the business's risk tolerance for temporary data staleness versus complete unavailability.

Result

During the first production stress test simulating Black Friday trading volume, the system maintained a P99 latency of 1.8 seconds for advisor dashboard updates, with zero transactions breaching the five-second threshold even when Salesforce experienced a 45-second latency spike due to a competitor's tenant-triggered noisy neighbor effect. The mainframe DB2 CPU overhead increased by only 0.3%, well within capacity plans, and the bank successfully decommissioned the legacy green-screen interface six months ahead of schedule, securing an additional $2M in annual licensing discounts through demonstrated technical feasibility.

What candidates often miss

When business stakeholders describe performance requirements using subjective terms like "instant" or "real-time," what specific techniques can you use to convert these into measurable KPIs without alienating non-technical users?

Do not rely on technical jargon or demand exact milliseconds. Instead, conduct a walkthrough observation session where you shadow users during peak business hours, measuring the actual time they spend waiting for current systems to respond before showing frustration (typically 3-7 seconds for financial advisors). Present these empirical observations as "Did you know that currently you wait an average of 12 seconds, and we can guarantee under 2 seconds?" This reframes the conversation around tangible improvement rather than abstract engineering constraints. Additionally, propose RUM (Real User Monitoring) pilot dashboards using JavaScript agent injection into the SaaS front-end to gather baseline metrics before migration, providing objective data to anchor discussions.

If the legacy mainframe lacks native CDC capabilities and the storage logs (DASD) are encrypted at the hardware level, preventing log-based replication, how can you achieve near-real-time synchronization without modifying legacy source code?

In this scenario, you must leverage database triggers at the DB2 layer rather than application-layer COBOL changes. DB2 for z/OS supports SQL triggers that can invoke external stored procedures via LE (Language Environment) calls to C or Java programs running in USS (Unix System Services). These external routines can then enqueue messages to IBM MQ or Kafka Connect running on z/OS. While this technically touches the database, it avoids changing the procedural COBOL business logic, which is often the regulatory constraint. Alternatively, implement shadow table replication using IBM Q Replication or Event Publishing if the Db2 for z/OS version permits; both operate at the database engine level and are transparent to existing applications.

When a SaaS vendor enforces hard rate limits (e.g., 100 requests/minute) that mathematically cannot support your peak load (1000/minute), and they refuse to negotiate or provide dedicated tenancy, what architectural patterns allow you to respect their terms of service while still meeting your sub-five-second business requirement?

You cannot outperform the API limit, so you must change the data granularity. Implement the Command Query Responsibility Segregation (CQRS) pattern combined with batch delta compression. Instead of sending individual transactions, accumulate changes in your Redis cache layer and broadcast aggregate state snapshots (e.g., "portfolio net value changed by $X") every 30 seconds via a scheduled batch job that consumes only one API call. For the advisors' "instant" view, serve the granular data directly from your Redis cache (the query side), while the SaaS receives the compressed command summary for official record-keeping. This respects the limit because one aggregate call every 30 seconds consumes just 2 of the 100 allowed calls per minute, and each batch can carry hundreds of compressed deltas, comfortably covering your 1000 updates/minute need while keeping user-facing latency at cache speed.
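The accumulate-and-flush step can be sketched in a few lines of Python (hypothetical `DeltaBatcher`; a real implementation would hold the deltas in Redis rather than process memory, and a scheduler would call `flush` every 30 seconds):

```python
from collections import defaultdict

class DeltaBatcher:
    """Accumulate per-portfolio deltas between flushes, so many
    individual updates are compressed into one aggregate API payload."""

    def __init__(self):
        self.deltas = defaultdict(float)

    def record(self, portfolio_id, delta):
        """Command side: fold one transaction into the running delta."""
        self.deltas[portfolio_id] += delta

    def flush(self):
        """Emit one snapshot payload and reset; costs a single API call
        no matter how many updates arrived since the last flush."""
        snapshot = dict(self.deltas)
        self.deltas.clear()
        return snapshot
```

However many trades hit a portfolio within the interval, the vendor sees one record per portfolio per flush, which is what keeps the call count bounded.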