SQL (ANSI)ProgrammingData Engineer

Demonstrate the ANSI SQL idiom for unpivoting multiple attribute columns into normalized key-value rows without utilizing the proprietary UNPIVOT operator or LATERAL derived tables?

Pass interviews with Hintsage AI assistant

Answer to the question.

History of the question.

This requirement originates in data ingestion pipelines migrating from legacy spreadsheet systems or flat-file exports, where time-series metrics or categorical attributes are denormalized into wide-format column headers (e.g., Jan_Sales, Feb_Sales) rather than normalized rows. Such schemas are prevalent in Excel-driven business processes before ETL into relational warehouses, demanding transformation into narrow fact tables to enable temporal analysis and dimensional JOINs. The challenge lies in transposing these static column projections into dynamic tuple streams without resorting to imperative row-by-row processing.

The problem.

Standard SELECT statements fix projected column identities at parse time, preventing a single projection from emitting different source columns on different output rows without LATERAL correlation or procedural iteration. The objective is to manufacture a Cartesian product between each source row and a virtual dimension table enumerating the attribute names, then multiplex the correct source value into a generic result column via conditional logic. This must be accomplished using only standard join syntax and scalar expressions available in ANSI SQL:1999 and later.

The solution.

Utilize a CROSS JOIN against a derived table expressed through the VALUES row constructor, which enumerates the categorical keys (e.g., month names) as rows. Within the SELECT list, employ a searched CASE expression that maps each key to its corresponding source column, effectively projecting the denormalized value into a normalized row structure. Filter the result to exclude NULL values generated when a source attribute is missing for a particular key, ensuring the final output contains only valid measurements.

SELECT s.cost_center_id, m.fiscal_month, CASE m.fiscal_month WHEN 'M01' THEN s.m01_amt WHEN 'M02' THEN s.m02_amt WHEN 'M03' THEN s.m03_amt -- ... additional months WHEN 'M12' THEN s.m12_amt END AS amount FROM budget_wide s CROSS JOIN ( VALUES ('M01'), ('M02'), ('M03'), ('M04'), ('M05'), ('M06'), ('M07'), ('M08'), ('M09'), ('M10'), ('M11'), ('M12') ) AS m(fiscal_month) WHERE CASE m.fiscal_month WHEN 'M01' THEN s.m01_amt -- ... repeat for all to avoid NULLs END IS NOT NULL;

Situation from life

A finance department exported fiscal year budget allocations from a corporate Excel model into a staging table, where twelve monthly columns (M01_Amt through M12_Amt) represented denormalized time periods for each cost center. The target SAP data warehouse required a narrow fact table schema of (CostCenter_ID, Fiscal_Month, Amount), necessitating an unpivot transformation within the ANSI SQL loading script to avoid intermediate Python processing. The volume of fifty million records precluded manual transformation or multiple-pass loading strategies.

Solution 1: Union All per column.

The initial approach utilized twelve separate SELECT queries, each projecting a different month column hardcoded into the generic Amount and Month_Name columns, combined via UNION ALL. Pros: This method enjoys universal compatibility, functioning on legacy mainframe databases and archaic SQL engines that lack modern join syntax. Cons: It performs twelve full table scans on the source data, resulting in linear I/O degradation; the query plan becomes voluminous and difficult to cache, and any schema modification (adding a thirteenth period) requires altering twelve separate projection lists.

Solution 2: Dynamic SQL generation.

An alternative involved constructing the query text dynamically in an application layer by iterating over metadata tables to generate the necessary CASE branches or UNION arms at runtime. Pros: This provides flexibility against evolving schemas and reduces manual SQL authoring toil when handling hundreds of columns. Cons: It violates the prohibition on procedural logic; it introduces SQL injection attack vectors and compilation overhead, and the resulting statement cannot be encapsulated within a static database view or stored procedure definition.

Solution 3: Cross Join with Values.

The accepted implementation employed a CROSS JOIN with a VALUES constructor defining the twelve fiscal periods, multiplexing the correct amount via a CASE expression keyed on the virtual period identifier. Pros: It executes as a single pass over the source table, leverages efficient join algorithms, and is fully declarative and portable across Oracle, SQL Server, PostgreSQL, and Db2 without vendor hints. Cons: It requires SQL:1999 support for row constructors, unavailable on antiquated systems, and the CASE expression verbosity increases maintenance overhead unless generated via templates.

Result.

The transformation latency decreased from twenty-five minutes to under ninety seconds by eliminating the redundant table scans inherent in the UNION ALL pattern. The loading process became resilient to schema extensions, requiring only the addition of a row to the VALUES constructor when new fiscal periods are introduced. Furthermore, the logic was encapsulated in a standard view, enabling direct ad-hoc querying by Tableau users without intermediate ETL steps.

What candidates often miss

How do you prevent NULL values in the source columns from appearing as rows in the unpivoted result without causing the CASE expression to be evaluated twice in the execution plan?

Candidates frequently embed the CASE expression inside a WHERE clause predicate such as WHERE CASE ... END IS NOT NULL, which forces the optimizer to compute the projection twice—once for filtering and once for output. The efficient ANSI SQL pattern materializes the result inside a derived table or Common Table Expression (CTE): SELECT * FROM (SELECT ..., CASE ... END AS val FROM ... CROSS JOIN ...) sub WHERE val IS NOT NULL. This computes the CASE once, filters the rows, and maintains clean separation of concerns for the query optimizer.

When unpivoting columns with heterogeneous data types (e.g., a VARCHAR comment column alongside a DECIMAL amount column), what specific ANSI SQL casting strategy ensures type coherence in the single result value column without data loss?

Many candidates incorrectly rely on implicit type conversion, which may truncate strings or lose decimal precision, or they attempt UNION ALL without realizing type coercion rules vary by platform. The robust solution explicitly casts every source column to a common supertype—typically VARCHAR—within each WHEN branch of the CASE expression: CASE WHEN key='Comment' THEN CAST(text_col AS VARCHAR(500)) ELSE CAST(num_col AS VARCHAR(500)) END. This guarantees that all return values share a single datatype compatible with the result column definition, preserving textual representation of numeric data where necessary.

Why does the CROSS JOIN with VALUES approach superficially appear to create a Cartesian product explosion, and how does the optimizer typically mitigate this compared to the NULL elimination behavior of a native UNPIVOT operator?

The CROSS JOIN logically generates M×N rows (source rows multiplied by attribute count) before filtering, which candidates fear will degrade performance on large datasets. However, modern cost-based optimizers recognize the CASE expression’s data dependency on the small constant table and often transform the plan into a simple projection or UNPIVOT physical operator internally, avoiding actual row multiplication. Unlike native UNPIVOT, which typically eliminates NULL results automatically, this method requires an explicit WHERE clause to discard rows where the source attribute was NULL, otherwise the result set contains spurious empty facts that corrupt aggregate calculations downstream.