Manual Testing (IT) · Manual QA Engineer

During the verification of a **COBOL**-to-**Java** batch job migration processing high-volume financial transactions, what granular manual testing methodology would you apply to guarantee bitwise identical output between legacy and modern systems while detecting subtle floating-point rounding discrepancies and **Julian date** leap-year handling anomalies?


Answer to the question

To verify parity between a legacy COBOL batch process and its Java replacement, a manual tester must execute both systems against identical input datasets and perform a field-by-field reconciliation. The methodology involves stratified sampling of records—prioritizing high-value transactions, boundary dates (e.g., Feb 29, year rollovers), and floating-point edge cases—rather than exhaustive comparison. Testers should export outputs to neutral formats (like CSV) and utilize diffing tools while manually inspecting critical financial fields for rounding discrepancies. Special attention must be paid to Julian date conversions and packed decimal (COMP-3) arithmetic behavior versus IEEE 754 floating-point implementations. Finally, checksum validation and hash comparisons of entire output files serve as a smoke test before detailed field analysis begins.
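The whole-file checksum smoke test mentioned above can be sketched in a few lines, assuming both systems can export their output files to a shared location (the class and file names here are illustrative, not from the actual project):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class ChecksumSmokeTest {
    // Computes a SHA-256 digest of a file's raw bytes; identical digests
    // mean the two outputs are byte-for-byte equal.
    static String sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(Files.readAllBytes(file)));
    }

    public static void main(String[] args) throws Exception {
        Path legacy = Path.of(args[0]);  // e.g. the COBOL output export
        Path modern = Path.of(args[1]);  // e.g. the Java output export
        boolean match = sha256(legacy).equals(sha256(modern));
        System.out.println(match ? "SMOKE PASS: files identical"
                                 : "SMOKE FAIL: proceed to field-level diff");
    }
}
```

If the smoke test fails, the digest tells you nothing about *where* the divergence is, which is exactly why the field-by-field reconciliation follows it.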

A real-world situation

At a multinational bank, I was tasked with validating the migration of a nightly interest accrual batch job from an IBM Mainframe COBOL system to a Spring Boot microservice running on Linux. The legacy system had processed billions in transactions for decades using COMP-3 packed decimal arithmetic and Julian date (YYDDD) formats, while the new Java application utilized BigDecimal and standard Gregorian calendars. The core problem was ensuring bitwise identical output; even a single cent discrepancy across millions of accounts would constitute a critical financial defect, and subtle differences in rounding modes or leap-year calculations could cascade into material variances.
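The YYDDD-to-Gregorian conversion at the heart of this migration can be sketched as follows; the two-digit-year pivot of 50 is a hypothetical windowing convention for illustration, not the bank's actual rule:

```java
import java.time.LocalDate;

public class JulianDate {
    // Converts a COBOL-style Julian date (YYDDD) to a Gregorian LocalDate.
    // Assumed windowing pivot: YY < 50 maps to 20YY, otherwise 19YY.
    static LocalDate fromYyddd(String yyddd) {
        int yy = Integer.parseInt(yyddd.substring(0, 2));
        int ddd = Integer.parseInt(yyddd.substring(2));
        int year = (yy < 50) ? 2000 + yy : 1900 + yy;
        // ofYearDay throws DateTimeException for day 366 in a non-leap year,
        // which is precisely the class of anomaly this migration must catch.
        return LocalDate.ofYearDay(year, ddd);
    }

    public static void main(String[] args) {
        System.out.println(fromYyddd("24060")); // day 60 of leap year 2024 -> 2024-02-29
        System.out.println(fromYyddd("23060")); // day 60 of 2023 -> 2023-03-01
    }
}
```

Note how the same day-of-year lands on different calendar dates depending on leap status, which is why date-boundary records belong in the highest-risk sampling tier.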

One solution considered was a brute-force file comparison of all output records. This approach offered exhaustive coverage and absolute certainty that every byte matched. However, the pros were outweighed by severe cons: the dataset contained over fifty million records, making manual comparison humanly impossible within the overnight batch window, and the sheer volume of noise from expected metadata differences (like timestamps) would mask actual data defects.

Another option was simple random sampling of a fixed percentage of records, say one percent. While this provided a statistically significant overview and was quick to execute, the cons were unacceptable for financial auditing: random sampling could easily miss high-impact outliers, such as a specific account type with unique rounding rules or transactions occurring on Feb 29, 2024, which historically triggered bugs in Julian day conversion logic.

The chosen solution was a stratified sampling strategy combined with automated diffing scripts for manual validation. We categorized records by risk tiers: Tier 1 included all accounts with balances exceeding one million dollars and all transactions on date boundaries (month-end, year-end, leap days), while Tier 2 covered random samples from different product types. This approach was selected because it balanced the need for absolute certainty on high-risk transactions with the practical constraints of manual testing time.
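The tiering rule above might be scripted to pre-sort the reconciliation workload; the Txn record shape and the exact thresholds here are illustrative assumptions:

```java
import java.math.BigDecimal;
import java.time.LocalDate;

public class RiskTiering {
    // Hypothetical record shape for illustration; real field names will differ.
    record Txn(String account, BigDecimal balance, LocalDate date) {}

    // Tier 1: balance over $1M, or a date boundary (month-end, which also
    // covers year-end, plus the Feb 29 leap day). Everything else is Tier 2.
    static int tierOf(Txn t) {
        boolean highValue = t.balance().compareTo(new BigDecimal("1000000")) > 0;
        boolean monthEnd = t.date().getDayOfMonth() == t.date().lengthOfMonth();
        boolean leapDay = t.date().getMonthValue() == 2 && t.date().getDayOfMonth() == 29;
        return (highValue || monthEnd || leapDay) ? 1 : 2;
    }

    public static void main(String[] args) {
        System.out.println(tierOf(new Txn("A1", new BigDecimal("2500000.00"), LocalDate.of(2024, 3, 15)))); // 1
        System.out.println(tierOf(new Txn("A2", new BigDecimal("500.00"), LocalDate.of(2024, 2, 29))));     // 1
        System.out.println(tierOf(new Txn("A3", new BigDecimal("500.00"), LocalDate.of(2024, 3, 15))));     // 2
    }
}
```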

For Tier 1, we performed 100% field-level manual reconciliation using Beyond Compare and custom Python scripts to highlight deltas, while for Tier 2, we verified aggregate checksums and spot-checked individual fields. The result was the discovery of a critical defect where COBOL truncated intermediate calculation results at five decimal places, while Java's default BigDecimal division retained scale unpredictably, causing a $0.01 variance on high-interest accounts. Once identified, we adjusted the Java rounding mode to HALF_UP with explicit scale, achieving perfect parity.
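The class of defect described can be reproduced in miniature; the five-decimal truncation point is from the scenario above, but the 5% rate and $1M principal are invented numbers chosen to make the divergence obvious:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class IntermediateScale {
    // COBOL-style: the intermediate daily factor is truncated at 5 decimal
    // places before the multiply, as the legacy job did.
    static BigDecimal cobolStyle(BigDecimal principal, BigDecimal annualRate) {
        BigDecimal daily = annualRate.divide(new BigDecimal("365"), 5, RoundingMode.DOWN);
        return principal.multiply(daily).setScale(2, RoundingMode.HALF_UP);
    }

    // Naive Java: full precision is kept through the intermediate step.
    static BigDecimal naiveJava(BigDecimal principal, BigDecimal annualRate) {
        BigDecimal daily = annualRate.divide(new BigDecimal("365"), MathContext.DECIMAL128);
        return principal.multiply(daily).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal p = new BigDecimal("1000000");
        BigDecimal r = new BigDecimal("0.05");
        System.out.println(cobolStyle(p, r)); // 130.00
        System.out.println(naiveJava(p, r));  // 136.99
    }
}
```

Both paths round the final result the same way; only the treatment of the intermediate value differs, which is why end-of-run totals can match while individual accounts drift.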

What candidates often miss

How do you detect encoding corruption when validating fixed-width files migrated from EBCDIC to ASCII?

Many testers visually inspect data in text editors, missing that COBOL mainframes often use EBCDIC code page CP037 while Java systems use UTF-8. Special characters like currency symbols (€, £) or accented letters in customer names may map incorrectly. To verify, you must open files in a hex editor to compare byte-level representations, ensuring that trailing spaces in COBOL (often hex 40) are not confused with null terminators in Java (hex 00), and that packed decimal (COMP-3) fields are unpacked correctly without sign-bit corruption.
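The byte-level inspection can be scripted rather than done purely by eye in a hex editor; this sketch assumes the JDK's IBM037 charset (shipped with full JDK builds) for the EBCDIC side:

```java
import java.nio.charset.Charset;
import java.util.HexFormat;

public class EbcdicCheck {
    static final Charset CP037 = Charset.forName("IBM037"); // EBCDIC code page 037

    // Decodes a fixed-width EBCDIC field to a Java String for side-by-side
    // comparison with the hex view of the raw bytes.
    static String decodeField(byte[] ebcdic) {
        return new String(ebcdic, CP037);
    }

    public static void main(String[] args) {
        // "AB" followed by two EBCDIC spaces (0x40) -- NOT null terminators (0x00).
        byte[] field = {(byte) 0xC1, (byte) 0xC2, (byte) 0x40, (byte) 0x40};
        System.out.println("hex:  " + HexFormat.of().formatHex(field)); // c1c24040
        System.out.println("text: [" + decodeField(field) + "]");       // [AB  ]
    }
}
```

Printing the hex next to the decoded text makes the 0x40-vs-0x00 padding distinction visible, the exact confusion the paragraph above warns about.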

Why might two mathematically equivalent calculations yield different results in COBOL versus Java, even when both use "decimal" types?

Candidates often assume BigDecimal guarantees identical behavior to COBOL's packed decimal. However, COBOL performs base-10 arithmetic with fixed precision dictated by the PIC clause (e.g., PIC 9(9)V99), truncating intermediate results at each operation step per business rules. Java's BigDecimal, by default, maintains arbitrary precision unless you explicitly set a MathContext and RoundingMode. The solution is to replicate COBOL's truncation logic by chaining operations with explicit setScale() calls and matching the legacy rounding mode (often HALF_UP or HALF_EVEN) at every intermediate step, not just the final result.
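The gap between end-only rounding and COBOL-style intermediate rescaling shows up even in a tiny example; the amount and rate below are invented specifically to expose a one-cent difference:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class CobolParity {
    // Replicates a COBOL computation whose PIC clause fixes intermediates at
    // scale 2 (e.g. PIC 9(9)V99): every intermediate result is rescaled
    // before the next operation, not just the final one.
    static BigDecimal stepwise(BigDecimal amount, BigDecimal rate, BigDecimal days) {
        BigDecimal interest = amount.multiply(rate).setScale(2, RoundingMode.HALF_UP);
        return interest.multiply(days).setScale(2, RoundingMode.HALF_UP);
    }

    // Rounds only once at the end -- mathematically "equivalent", numerically not.
    static BigDecimal endOnly(BigDecimal amount, BigDecimal rate, BigDecimal days) {
        return amount.multiply(rate).multiply(days).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal amt = new BigDecimal("100.00");
        BigDecimal rate = new BigDecimal("0.00125");
        BigDecimal days = new BigDecimal("3");
        System.out.println(stepwise(amt, rate, days)); // 0.39  (0.125 -> 0.13, then x3)
        System.out.println(endOnly(amt, rate, days));  // 0.38  (0.375 rounded once)
    }
}
```

Whether the legacy step uses HALF_UP, HALF_EVEN, or plain truncation must be read from the COBOL source (ROUNDED clauses), not assumed.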

How do you validate temporal accuracy when the legacy system ignores Daylight Saving Time (DST) while the new Java application uses UTC or local time with DST awareness?

This is often missed because testers compare timestamps superficially. If the legacy COBOL job runs on EST (Eastern Standard Time) year-round while the Java service uses America/New_York (which shifts to EDT), transactions occurring between 2:00 AM and 3:00 AM on the second Sunday in March will have a one-hour offset. To solve this, testers must convert both timestamps to a canonical format (e.g., UTC epoch milliseconds) during manual validation, verify that "end-of-day" batch cutoff parameters (often "23:59:59") are interpreted consistently, and ensure that date-boundary logic (e.g., "last business day of month") does not shift due to the missing hour in spring or extra hour in fall.
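Canonicalizing both timestamps to epoch seconds, as suggested above, makes the offset measurable; this sketch models the legacy side as a fixed UTC-5 offset and the modern side as the DST-aware America/New_York zone:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class DstCheck {
    // Legacy: fixed EST (UTC-5) all year round.
    static long legacyEpochSec(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.ofHours(-5)).getEpochSecond();
    }

    // Modern: America/New_York, which shifts to EDT (UTC-4) in summer.
    static long modernEpochSec(LocalDateTime wallClock) {
        return wallClock.atZone(ZoneId.of("America/New_York")).toInstant().getEpochSecond();
    }

    public static void main(String[] args) {
        // Winter: EST and America/New_York agree, so the delta is zero.
        LocalDateTime jan = LocalDateTime.of(2024, 1, 15, 23, 59, 59);
        System.out.println(legacyEpochSec(jan) - modernEpochSec(jan)); // 0

        // Summer: the same "23:59:59" wall-clock cutoff is one hour apart.
        LocalDateTime jul = LocalDateTime.of(2024, 7, 15, 23, 59, 59);
        System.out.println(legacyEpochSec(jul) - modernEpochSec(jul)); // 3600
    }
}
```

A one-hour delta at the batch cutoff is exactly the condition that can push a transaction into the wrong business day, so the comparison script should fail loudly on any nonzero delta.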