Answer to the question.
History of the question.
Prior to the SQL:2016 standard, identifying multi-row sequential patterns within ordered datasets required convoluted self-joins, cursor-based procedural logic, or recursive CTEs simulating finite state machines. These approaches suffered from combinatorial explosion, poor performance, and maintenance nightmares. The introduction of the MATCH_RECOGNIZE clause provided a declarative, mathematically rigorous syntax based on regular expressions for row pattern recognition, enabling complex event processing directly within the relational engine.
The problem.
Detecting specific variable-length sequences—such as W-shaped price formations—requires comparing each row against multiple predecessors and successors while maintaining contextual state across the entire sequence. Standard window functions can only reference fixed offsets (e.g., LAG 1, LEAD 1), making them incapable of handling patterns where leg durations vary. Recursive CTEs can theoretically track state transitions but become computationally expensive and syntactically verbose when handling multi-step patterns with strict ordering constraints.
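The fixed-offset limitation is easy to demonstrate. A minimal sketch, assuming a stock_ticks(symbol, tick_time, price) table as in the main query: even a simple V-shape can only be caught when each leg is exactly one row wide.

```sql
-- Hypothetical fixed-offset detector for a V-shape exactly two ticks wide:
-- a one-row decline followed by a one-row recovery. A decline lasting three
-- ticks never matches, because the LAG offsets are hard-coded.
SELECT symbol, tick_time, price
FROM (
  SELECT symbol, tick_time, price,
         LAG(price, 1) OVER (PARTITION BY symbol ORDER BY tick_time) AS prev_price,
         LAG(price, 2) OVER (PARTITION BY symbol ORDER BY tick_time) AS prev2_price
  FROM stock_ticks
) t
WHERE t.prev_price < t.prev2_price  -- one-row decline
  AND t.price > t.prev_price;       -- one-row recovery
```

Extending this to a W-shape with variable leg lengths would require one LAG column per possible offset per leg, which is exactly the combinatorial explosion described above.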
The solution.
MATCH_RECOGNIZE enables the definition of pattern variables using boolean conditions, specification of the target pattern via regular expression syntax (e.g., A B+ C+ D+ E+), and computation of aggregate measures across the matched rows. It handles partitioning, ordering, and navigation functions (PREV, NEXT, FIRST, LAST) natively.
SELECT *
FROM stock_ticks MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tick_time
  MEASURES
    STRT.price              AS start_price,
    FINAL LAST(DOWN1.price) AS first_trough,
    FINAL LAST(UP1.price)   AS middle_peak,
    FINAL LAST(DOWN2.price) AS second_trough,
    FINAL LAST(UP2.price)   AS end_price,
    MATCH_NUMBER()          AS pattern_id
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO LAST UP2
  PATTERN (STRT DOWN1+ UP1+ DOWN2+ UP2+)
  DEFINE
    DOWN1 AS DOWN1.price < PREV(DOWN1.price),
    UP1   AS UP1.price   > PREV(UP1.price),
    DOWN2 AS DOWN2.price < PREV(DOWN2.price)
             AND DOWN2.price < FIRST(UP1.price), -- must fall back below the level where the first rise began
    UP2   AS UP2.price   > PREV(UP2.price)
) AS pattern_matches;
A real-world situation.
Context.
A quantitative trading firm needed to detect W-shaped double-bottom patterns in high-frequency forex data (tick-by-tick) to automate entry points for long positions. The pattern required two distinct troughs separated by a peak, with each leg representing at least a 0.5% price movement.
The problem.
The dataset contained 10 million rows daily across 50 currency pairs. Python-based detection introduced network latency and memory constraints when transferring gigabytes of data hourly. Standard SQL approaches that combined LAG()/LEAD() window functions with self-joins to correlate the four legs of the W-pattern produced near-Cartesian intermediate results, causing queries to time out after 10 minutes.
Solution 1: Client-side Python processing.
The team initially used pandas with custom loop logic to detect peaks and troughs. Pros: Rich analytical libraries, easy unit testing. Cons: Massive data transfer bottleneck (hours of latency), memory exhaustion on the application server when processing full market history, and inability to react in real-time.
Solution 2: Recursive CTE state machine.
They attempted a recursive CTE tracking five states (0=seeking start, 1=first decline, 2=first rise, 3=second decline, 4=second rise). Pros: Pure SQL, logically rigorous. Cons: Single-threaded execution in the database engine, exponential slowdown with deep recursion, and 300+ lines of incomprehensible SQL prone to stack overflow errors on volatile sequences.
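For comparison, a heavily condensed sketch of that state machine (ANSI syntax; some engines require WITH RECURSIVE, and the 0.5% leg thresholds and reset subtleties of the production version are omitted):

```sql
-- Condensed sketch of the state-machine CTE (states 0-4 as above).
-- Recursion advances one row at a time, so depth equals row count:
-- this is the source of the stack overflows on long volatile sequences.
WITH ordered AS (
  SELECT symbol, tick_time, price,
         ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY tick_time) AS rn
  FROM stock_ticks
),
walk (symbol, rn, price, state) AS (
  SELECT symbol, rn, price, 0 FROM ordered WHERE rn = 1
  UNION ALL
  SELECT o.symbol, o.rn, o.price,
         CASE
           WHEN w.state IN (0, 1) AND o.price < w.price THEN 1  -- first decline
           WHEN w.state IN (1, 2) AND o.price > w.price THEN 2  -- first rise
           WHEN w.state IN (2, 3) AND o.price < w.price THEN 3  -- second decline
           WHEN w.state = 3       AND o.price > w.price THEN 4  -- second rise: W complete
           ELSE 0                                               -- reset and seek a new start
         END
  FROM ordered o
  JOIN walk w ON w.symbol = o.symbol AND o.rn = w.rn + 1
)
SELECT * FROM walk WHERE state = 4;
```

Even this toy version hints at the maintenance cost: every new constraint (leg depth, trough comparison) multiplies the CASE branches.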
Solution 3: MATCH_RECOGNIZE implementation.
The team implemented the SQL:2016 pattern matching query shown above. Pros: native engine optimization (vectorized execution), a concise 25-line query that mirrored the mathematical pattern definition exactly, automatic handling of variable-length legs via the + quantifier, and efficient skipping to prevent redundant overlapping matches. Cons: required migrating the database to Oracle 19c (Oracle has shipped MATCH_RECOGNIZE since 12c, but the feature is still absent from many other engines) and initial training for developers unfamiliar with regular expression syntax in SQL.
Chosen solution and result.
Solution 3 was selected due to its sub-second performance on historical backtests. The AFTER MATCH SKIP TO LAST UP2 clause ensured that once a W-pattern completed, the scan resumed at the pattern's end to avoid overlapping detections. The system successfully identified 99.8% of manually validated W-patterns, reducing detection latency from 45 minutes (Python) to 800 milliseconds, enabling real-time algorithmic trading.
What candidates often miss
How does the AFTER MATCH SKIP clause determine the resume point after a match, and why does SKIP TO NEXT ROW versus SKIP PAST LAST ROW matter for overlapping patterns?
AFTER MATCH SKIP dictates where the pattern matcher continues scanning. SKIP PAST LAST ROW (the default) resumes after the final row of the current match, preventing any row from participating in multiple matches—suitable for distinct event detection. Conversely, SKIP TO NEXT ROW resumes at the row immediately following the match's start row, allowing overlapping matches. This is critical in financial time series where a single trough might legitimately form the bottom of two consecutive W-patterns (overlapping windows). Candidates often default to standard skipping, inadvertently filtering out valid overlapping signals and reducing detection sensitivity.
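As clause fragments (not complete queries), the three strategies relevant here are:

```sql
AFTER MATCH SKIP PAST LAST ROW  -- default: no row is reused; only disjoint W-patterns
AFTER MATCH SKIP TO NEXT ROW    -- rescan from the row after the match's start; overlaps allowed
AFTER MATCH SKIP TO LAST UP2    -- used in the main query: the end of one W may start the next
```

The choice is a sensitivity/volume trade-off: SKIP TO NEXT ROW can emit many near-duplicate matches over the same trough, so downstream deduplication logic must be planned alongside it.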
What is the distinction between RUNNING and FINAL semantics in the MEASURES clause, and how does this impact aggregate calculations within variable-length patterns?
RUNNING evaluates an expression at each successive row as the match is being constructed (e.g., a moving average computed during the decline leg). FINAL evaluates the expression once over the complete match, using the final bound values for all pattern variables (e.g., the total percentage change from pattern start to end). Under ONE ROW PER MATCH the two coincide, because measures are evaluated at the last row anyway; the distinction matters under ALL ROWS PER MATCH, where candidates frequently omit the FINAL keyword when computing pattern-wide metrics such as MAX(leg_price) - MIN(leg_price), so each output row carries a running intermediate value rather than the completed match's total, leading to incorrect trade signal calculations.
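A sketch of both semantics in one MEASURES clause, written for ALL ROWS PER MATCH (where the difference is visible); this is a fragment meant to slot into the W-pattern query above, not a standalone statement:

```sql
MEASURES
  RUNNING AVG(price)                   AS avg_so_far,   -- grows row by row during the match
  FINAL   LAST(UP2.price)              AS end_price,    -- constant: the completed match's final rise
  FINAL   LAST(UP2.price) - STRT.price AS total_change  -- pattern-wide metric: needs FINAL
ALL ROWS PER MATCH
```

With RUNNING, the avg_so_far column differs on every output row of a match; the two FINAL columns repeat the same completed-match value on every row.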
How do you handle empty matches and ensure unmatched rows appear in output for debugging purposes?
By default, MATCH_RECOGNIZE outputs nothing for rows that do not participate in any match. To surface unmatched rows (essential for auditing why certain sequences failed pattern criteria), specify ALL ROWS PER MATCH WITH UNMATCHED ROWS: every input row then appears in the output, with pattern measures and MATCH_NUMBER() returning NULL for rows outside matches. The related SHOW EMPTY MATCHES and OMIT EMPTY MATCHES options govern a different case, zero-length matches, which can arise only when the pattern can match an empty sequence (e.g., a pattern built entirely from * quantifiers); they do not bring back unmatched rows. Candidates often struggle with "missing data" debugging, unaware that strict DEFINE conditions filtered out valid rows, and fail to use WITH UNMATCHED ROWS to diagnose which specific boolean condition (e.g., the second trough not dipping low enough) caused the pattern rejection.
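A debug variant of the W-pattern query (DEFINE conditions abbreviated, and the skip mode changed to the default, which combines most predictably with unmatched-row output): every input row appears in the result, and NULL measures mark the rows that joined no match.

```sql
SELECT symbol, tick_time, price, mn, var
FROM stock_ticks MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tick_time
  MEASURES
    MATCH_NUMBER() AS mn,   -- NULL on unmatched rows
    CLASSIFIER()   AS var   -- pattern variable that claimed the row, or NULL
  ALL ROWS PER MATCH WITH UNMATCHED ROWS
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (STRT DOWN1+ UP1+ DOWN2+ UP2+)
  DEFINE
    DOWN1 AS DOWN1.price < PREV(DOWN1.price),
    UP1   AS UP1.price   > PREV(UP1.price),
    DOWN2 AS DOWN2.price < PREV(DOWN2.price),
    UP2   AS UP2.price   > PREV(UP2.price)
);
```

Filtering this output with WHERE mn IS NULL isolates exactly the ticks the matcher rejected, which usually points directly at the DEFINE condition that broke the candidate sequence.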