The challenge of identifying local extrema emerged from quantitative finance and industrial IoT monitoring, where detecting peaks (local maxima) and troughs (local minima) in sequential data signals critical events such as market reversals or equipment anomalies. Early implementations relied on cursor-based processing or application-layer iterations that created significant latency when analyzing high-volume time-series data. The problem requires comparing each data point against its immediate neighbors to determine if it represents a relative high or low point within its local context.
The core difficulty lies in performing pairwise comparisons between a row and its adjacent neighbors while maintaining the dataset's sort order, a procedural operation that seems to require row-by-row iteration. Without window functions, developers typically resort to self-joins that generate O(n²) complexity or subqueries that trigger repeated table scans, both of which degrade rapidly as dataset size increases. This performance bottleneck creates challenges for real-time analytics pipelines that must process streaming sensor data with minimal latency.
The solution leverages the LEAD and LAG window functions to shift the data perspective, allowing a set-based comparison where a peak is defined as a row where the current value exceeds both the previous and next values. This approach maintains O(n) complexity with a single table scan, handling edge cases at sequence boundaries through explicit NULL management to ensure the first and last rows are treated appropriately.
SELECT
    reading_time,
    sensor_value,
    CASE
        WHEN sensor_value > LAG(sensor_value)  OVER (ORDER BY reading_time)
         AND sensor_value > LEAD(sensor_value) OVER (ORDER BY reading_time)
            THEN 'LOCAL_MAXIMUM'
        WHEN sensor_value < LAG(sensor_value)  OVER (ORDER BY reading_time)
         AND sensor_value < LEAD(sensor_value) OVER (ORDER BY reading_time)
            THEN 'LOCAL_MINIMUM'
        ELSE 'NEUTRAL'
    END AS inflection_type
FROM sensor_readings;
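The query above can be exercised end to end with a minimal, runnable sketch. This uses Python's sqlite3 module (SQLite 3.25+ ships window function support, and is bundled with modern Python); the table name sensor_readings and its columns match the query, while the sample readings are invented for illustration.

```python
# Minimal sketch: run the LEAD/LAG extrema query against an in-memory
# SQLite database. Sample data is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (reading_time INTEGER, sensor_value REAL)")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [(1, 1.0), (2, 3.0), (3, 2.0), (4, 0.5), (5, 2.5)],
)

rows = conn.execute("""
    SELECT reading_time,
           sensor_value,
           CASE
               WHEN sensor_value > LAG(sensor_value)  OVER (ORDER BY reading_time)
                AND sensor_value > LEAD(sensor_value) OVER (ORDER BY reading_time)
                   THEN 'LOCAL_MAXIMUM'
               WHEN sensor_value < LAG(sensor_value)  OVER (ORDER BY reading_time)
                AND sensor_value < LEAD(sensor_value) OVER (ORDER BY reading_time)
                   THEN 'LOCAL_MINIMUM'
               ELSE 'NEUTRAL'
           END AS inflection_type
    FROM sensor_readings
""").fetchall()

for r in rows:
    print(r)  # the spike at t=2 and the dip at t=4 are flagged
```

Note that the boundary rows (t=1 and t=5) fall through to 'NEUTRAL' because their missing neighbor makes the comparison unknown, which previews the boundary-handling discussion below.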
A renewable energy firm needed to optimize wind turbine maintenance by detecting abnormal vibration patterns in gearbox sensors, specifically identifying sharp spikes in vibration amplitude that preceded mechanical failures. The engineering team required a database solution that could process millions of hourly readings to flag local vibration peaks exceeding neighboring measurements by a significant margin. The constraint against exporting data to external analytics tools necessitated a pure SQL implementation within their PostgreSQL data warehouse.
The first approach considered involved a self-join where each row was joined to its temporal neighbors using inequality conditions on timestamps. This method offered compatibility with legacy SQL databases lacking window function support, but suffered from O(n²) complexity and produced cartesian products that required expensive deduplication. The resulting query plan showed full table scans inside nested-loop joins, making it impractical for real-time monitoring of high-frequency sensor data.
A second alternative utilized correlated scalar subqueries to fetch the previous and next values for each row, which provided conceptual simplicity for developers unfamiliar with advanced SQL features. However, this triggered repeated index lookups and table scans for every row, resulting in query times exceeding 15 minutes on the production dataset. This performance profile rendered it unsuitable for operational dashboards requiring sub-second response times.
The chosen solution implemented LEAD and LAG window functions with a ROWS frame specification, enabling the database engine to maintain a sliding window of adjacent values in memory during its single pass through the data. This approach reduced execution time to under three seconds while remaining strictly ANSI SQL compliant for portability across PostgreSQL and Oracle systems. The deterministic performance characteristics made it ideal for integration into real-time monitoring pipelines.
The deployment successfully identified 47 critical vibration peaks across the turbine fleet within the first month, triggering predictive maintenance that prevented catastrophic gearbox failures. This proactive intervention avoided an estimated $2.3 million in emergency repair costs and unplanned downtime. Maintenance crews reported high confidence in the automated alerts due to the zero false-positive rate achieved by the strict local maximum definition.
How do you properly handle boundary conditions (first and last rows) when using LEAD and LAG for extrema detection?
By default, LEAD and LAG return NULL when attempting to access rows beyond the partition boundaries, which would cause standard comparison logic to exclude boundary rows from being flagged as extrema or potentially cause NULL propagation in calculations. Candidates should recognize that the first row has no predecessor and the last row has no successor, requiring explicit handling such as using the three-argument form LAG(value, 1, value) OVER (...) to default to the current value, ensuring boundary comparisons evaluate to false. Alternatively, wrapping comparisons in COALESCE to substitute sentinel values allows precise control over whether boundary points are considered local extrema based on business requirements.
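The three-argument default form can be sketched as follows, again using SQLite (3.25+) in memory; defaulting the missing neighbor to the current row's own value makes the strict > comparison false at both boundaries. Table name and data are hypothetical.

```python
# Sketch of boundary-safe extrema detection: LAG/LEAD(value, 1, value)
# substitutes the current value when no neighbor exists, so the first and
# last rows can never satisfy a strict inequality against "themselves".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (reading_time INTEGER, sensor_value REAL)")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [(1, 5.0), (2, 1.0), (3, 4.0), (4, 2.0)],
)

rows = conn.execute("""
    SELECT reading_time,
           sensor_value,
           sensor_value > LAG(sensor_value, 1, sensor_value)  OVER w
       AND sensor_value > LEAD(sensor_value, 1, sensor_value) OVER w AS is_local_max
    FROM sensor_readings
    WINDOW w AS (ORDER BY reading_time)
""").fetchall()
```

Here only t=3 is flagged; the boundary row t=1 is excluded even though it is the largest value, because with the default in place its "previous neighbor" equals itself. Swapping the default for a sentinel (or a COALESCE wrapper) would instead let boundary rows qualify, per the business-requirement choice described above.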
How would you detect "plateaus" or flat peaks where multiple consecutive rows share the same maximum value, rather than single-row spikes?
A naive local maximum check fails for plateaus because interior plateau rows equal rather than exceed their neighbors, so the logic must identify plateau boundaries rather than individual rows. The standard solution is a gaps-and-islands technique: flag each row whose value differs from its predecessor (via LAG), take a running SUM of those flags to assign a group ID to each contiguous run of equal values, then compare each group's value against the groups immediately preceding and succeeding it to determine whether the entire plateau constitutes a local maximum. Because window functions cannot be nested directly, this takes a chain of CTEs: one to build the value groups, then LEAD/LAG applied at the group level to detect a flat peak sitting between two lower values.
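The gaps-and-islands chain can be sketched as a single query, again against in-memory SQLite (3.25+) with invented data containing one flat peak:

```python
# Sketch of plateau (flat-peak) detection: LAG flags value changes, a running
# SUM turns the flags into group IDs, GROUP BY collapses each plateau, and
# group-level LEAD/LAG finds plateaus higher than both neighbors.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (reading_time INTEGER, sensor_value REAL)")
# Flat peak at value 3.0 spanning t=3..4.
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [(1, 1.0), (2, 2.0), (3, 3.0), (4, 3.0), (5, 2.0), (6, 2.5)],
)

rows = conn.execute("""
    WITH flagged AS (
        SELECT reading_time, sensor_value,
               CASE WHEN sensor_value = LAG(sensor_value) OVER (ORDER BY reading_time)
                    THEN 0 ELSE 1 END AS is_new_group
        FROM sensor_readings
    ),
    grouped AS (
        SELECT reading_time, sensor_value,
               SUM(is_new_group) OVER (ORDER BY reading_time) AS grp
        FROM flagged
    ),
    plateaus AS (
        SELECT grp,
               MIN(reading_time) AS start_time,
               MAX(reading_time) AS end_time,
               MIN(sensor_value) AS plateau_value
        FROM grouped
        GROUP BY grp
    )
    SELECT start_time, end_time, plateau_value
    FROM (
        SELECT plateaus.*,
               LAG(plateau_value)  OVER (ORDER BY grp) AS prev_value,
               LEAD(plateau_value) OVER (ORDER BY grp) AS next_value
        FROM plateaus
    )
    WHERE plateau_value > prev_value AND plateau_value > next_value
""").fetchall()
```

The result is the single plateau (start_time 3, end_time 4, value 3.0); a single-row spike is just the degenerate case of a one-row group, so this query subsumes the simple LEAD/LAG check.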
How can you identify "higher highs" in a sequence, where each new local maximum must exceed the previous local maximum to confirm an upward trend?
This requires maintaining state across the result set to track the highest peak seen so far, which simple LEAD/LAG comparisons alone cannot achieve. The solution combines peak detection with a running maximum such as MAX(CASE WHEN is_local_max THEN value END) OVER (ORDER BY time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), which tracks the highest peak strictly before each row; the frame must end at 1 PRECEDING rather than the current row, otherwise a new high would be compared against itself and never satisfy a strict inequality. Each newly detected local maximum is then filtered against this running value to keep only progressive highs. This technique demonstrates understanding of how to nest conditional logic within window frames to create recursive-like state tracking without procedural loops.
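A sketch of the higher-highs filter, on in-memory SQLite (3.25+) with invented data containing a peak, a lower high, and a higher high; the first peak is admitted when the prior running maximum is still NULL:

```python
# Sketch of "higher highs": mark local maxima, compute the running maximum of
# peaks seen *before* each row (frame ends at 1 PRECEDING), and keep only
# peaks that exceed every earlier peak.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (reading_time INTEGER, sensor_value REAL)")
# Peaks at t=2 (3.0), t=4 (2.5, a lower high), t=6 (4.0, a higher high).
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [(1, 1.0), (2, 3.0), (3, 2.0), (4, 2.5), (5, 1.5), (6, 4.0), (7, 3.0)],
)

rows = conn.execute("""
    WITH marked AS (
        SELECT reading_time, sensor_value,
               sensor_value > LAG(sensor_value)  OVER w
           AND sensor_value > LEAD(sensor_value) OVER w AS is_local_max
        FROM sensor_readings
        WINDOW w AS (ORDER BY reading_time)
    ),
    tracked AS (
        SELECT reading_time, sensor_value, is_local_max,
               MAX(CASE WHEN is_local_max THEN sensor_value END) OVER (
                   ORDER BY reading_time
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS prior_peak_max
        FROM marked
    )
    SELECT reading_time, sensor_value
    FROM tracked
    WHERE is_local_max
      AND (prior_peak_max IS NULL OR sensor_value > prior_peak_max)
""").fetchall()
```

Only the peaks at t=2 (the first peak) and t=6 (which exceeds the earlier 3.0) survive; the lower high at t=4 is filtered out, confirming the upward-trend semantics described above.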