SQLProgrammingSenior Database Engineer (PostgreSQL)

Which architectural constraint prevents **PostgreSQL**'s query planner from pruning partitions when the partition key is filtered by a **STABLE** function, despite the function returning a constant within the transaction?

Pass interviews with Hintsage AI assistant

Answer to the question.

PostgreSQL introduced declarative partitioning in version 10 to replace the cumbersome inheritance-based partitioning model. The query planner performs static partition pruning during the planning phase by comparing query predicates against partition boundaries, but it requires expressions to be evaluable to constants at plan time to determine which partitions can be eliminated.

The core architectural constraint is that STABLE functions, such as now() or current_timestamp, are not evaluated during planning because their results may differ between planning and execution or even during query execution. Consequently, the planner treats predicates involving these functions as black boxes, unable to prove that certain partitions cannot contain matching rows, which forces a scan of all partitions.

The solution involves either rewriting the predicate to use IMMUTABLE functions or literal constants, or relying on runtime partition pruning available in PostgreSQL 11 and later. By setting enable_partition_pruning to on, the executor evaluates STABLE function results against partition bounds at execution time, dynamically skipping irrelevant partitions after the initial planning phase.

Situation from life

A financial analytics firm partitioned a trades table by TIMESTAMPTZ on the execution_time column using daily ranges to manage terabytes of tick data. Analysts frequently queried recent activity with WHERE execution_time >= now() - interval '1 hour', but observed that these queries suffered catastrophic performance degradation, scanning all 365 daily partitions instead of just the latest one.

The first approach considered was to have the application layer calculate the timestamp boundary and inject it as a literal constant. This enabled immediate static pruning and reduced query time from 45 seconds to 80 milliseconds. However, it broke existing SQL queries embedded in third-party BI tools that could not be easily modified.

The second approach involved creating a custom immutable function that returned a fixed timestamp based on the current date. This was rejected because it would produce incorrect results if the database transaction remained open across the midnight boundary, violating the STABLE contract that now() provides within a transaction. Such a violation could lead to silently incorrect query results if the planner cached an outdated partition range.

The chosen solution leveraged PostgreSQL 12's runtime partition pruning capabilities. The database administrators ensured enable_partition_pruning was enabled and refactored the application to use prepared statements with the time boundary passed as a parameter rather than concatenated into the SQL string. This allowed the executor to prune partitions dynamically using the actual parameter value at execution time, achieving near-optimal performance without requiring changes to the SQL text generation in legacy reporting tools.

What candidates often miss

How does PostgreSQL's constraint exclusion mechanism for inherited tables differ from native partition pruning, and why does the former require explicit GUC configuration while the latter does not?

Constraint exclusion was the partitioning method used before declarative partitioning, relying on CHECK constraints on child tables to prove that tables could not contain relevant rows. Because evaluating these constraints against every inherited table during planning is expensive when hundreds of tables exist, it is controlled by the constraint_exclusion parameter, which defaults to partition (only checking when querying via inheritance). Native partition pruning uses specialized data structures in the planner that understand the partition hierarchy directly, making it faster and always enabled, requiring no GUC adjustment for correct functionality.

When updating a row to move it between partitions by modifying the partition key, why does PostgreSQL internally execute a DELETE and INSERT rather than an in-place update, and what trigger implications does this have?

Because each partition is a distinct heap relation with separate physical storage, changing the partition key requires moving the tuple from one file to another. PostgreSQL implements this transition by deleting the row from the source partition and inserting it into the destination partition. This means that row-level BEFORE DELETE, AFTER DELETE, BEFORE INSERT, and AFTER INSERT triggers all fire during what appears to be a single UPDATE operation. Additionally, logical replication streams this as two separate WAL entries (delete and insert), which can cause conflicts on subscribers if the replica identity is not properly configured.

What specific locking and validation overhead occurs when attaching a new partition to a table that has a DEFAULT partition, and how can one avoid the full scan of the default partition?

When attaching a new partition to a range or list partitioned table containing a DEFAULT partition, PostgreSQL must scan the entire DEFAULT partition to verify that no rows exist that should belong to the new, more specific partition. This validation scan acquires an ACCESS EXCLUSIVE lock on the partitioned table and can take hours for large default partitions. To avoid this, one should detach the DEFAULT partition before attaching the new partition, then re-attach the DEFAULT partition only after ensuring no conflicting rows exist, or alternatively, use CREATE TABLE ... PARTITION OF to create a new empty partition and migrate data using INSERT ... SELECT with a WHERE clause that filters for the specific partition range, bypassing the validation scan of the default partition's contents.