TOAST was introduced in PostgreSQL to handle row data exceeding the 8KB page size by compressing large columns into separate physical storage. When logical replication streams changes via the WAL, the REPLICA IDENTITY setting determines which old tuple values are included. The default REPLICA IDENTITY DEFAULT sends only the primary key, while REPLICA IDENTITY FULL sends the complete old row image.
When a table contains JSONB or TEXT columns that exceed ~2KB and are compressed into TOAST, UPDATE operations that modify only non-TOAST columns may not fetch the external TOAST values for the WAL record. The logical decoding process skips unchanged TOAST pointers to reduce I/O, causing the subscriber to receive NULL or missing values for these large fields during conflict resolution.
Switching to REPLICA IDENTITY FULL forces PostgreSQL to include the complete old tuple in the WAL record, explicitly fetching all TOASTed values from external storage during commit. While this guarantees data completeness for UPSERT operations, it increases WAL volume significantly—often by 300-500% for wide JSONB tables—because every UPDATE must log the full pre-image of the row.
A financial trading platform needed to replicate order book snapshots from a primary PostgreSQL 15 cluster to a data warehouse for regulatory reporting. The market_data table stored instrument identifiers and large JSONB payloads (10-50KB) containing depth-of-book information. The replication used pglogical with REPLICA IDENTITY DEFAULT (primary key only). The ETL process on the warehouse side attempted to perform UPSERT operations to maintain a slowly-changing dimension table, requiring the old JSONB values to calculate delta changes for the audit log.
During high-volume trading periods, when the order book updated frequently but the JSONB payload remained unchanged, the logical replication stream sent UPDATE records containing only the primary key and the new tuple data. The TOASTed JSONB old values were not included in the change set because the UPDATE statement only touched the updated_at timestamp column. The ETL process could not access the pre-update JSONB state, making it impossible to calculate precise price movement deltas for the audit trail, violating MiFID II compliance requirements.
Solution 1: Switching to REPLICA IDENTITY FULL This approach would force the publisher to write the complete old row image to the WAL for every UPDATE, including the full JSONB content from TOAST storage. The pros included guaranteed data completeness and simple implementation requiring no schema changes. However, the cons were significant: WAL generation would increase by approximately 400% given the 50KB payloads, risking disk space exhaustion on the primary and increased network latency to the warehouse. For a table processing 10,000 updates per second, this was deemed too risky for production stability.
Solution 2: Application-level journaling with a separate history table
The team considered creating a trigger on the primary that copied the old JSONB values into a separate market_data_history table before the update. The pros were that logical replication could replicate this history table separately, avoiding the TOAST omission issue on the main table while keeping the main table's WAL footprint small. The cons included double-write overhead on the primary (increasing transaction latency), additional storage requirements growing at 2x rate, and complexity in the ETL logic to correlate history records with main table changes using transaction IDs and timestamps.
Solution 3: Using REPLICA IDENTITY with a covering index including a hash of the JSONB
This strategy involved creating a functional index on md5(jsonb_column::text) and including that hash in a composite REPLICA IDENTITY index. The pros were that changes to the JSONB content would be detectable via the hash change in the WAL without sending the full payload. The cons included the inability to retrieve the actual old value (only its hash), which was insufficient for the regulatory requirement to show the exact pre-update state, and the index maintenance overhead on high-churn tables.
The team selected Solution 2 (Application-level journaling) but with a modification. They utilized PostgreSQL's JSONB partial update optimization available in version 14+ and implemented a BEFORE UPDATE trigger that stored only the changed paths (diff) rather than the full old row. This reduced the history table growth while ensuring all necessary pre-image data was available. They kept REPLICA IDENTITY DEFAULT on the main table to avoid WAL bloat, directing the ETL to join against the history table for audit reconstruction.
The replication stream size remained stable, preventing primary storage pressure. The ETL process successfully reconstructed complete audit trails by merging the current row state with the stored diffs from the history table. Regulatory compliance was achieved with only a 15% increase in primary storage (versus 400% for REPLICA IDENTITY FULL) and minimal impact on transaction throughput.
Why does PostgreSQL's logical decoding omit TOAST values even when the column is modified?
Many candidates assume that any UPDATE automatically fetches all TOASTed values for the WAL. However, PostgreSQL only performs "tuple unTOASTing" when the executor actually reads the datum to modify it. If an UPDATE modifies a different column (e.g., SET updated_at = NOW()) without referencing the JSONB column in its target list or WHERE clause, the TOAST pointer remains unchanged and the external storage is not accessed. The WAL record therefore contains only the on-disk tuple with its pointer, not the actual data. Since logical decoding reconstructs tuples from WAL without accessing heap or TOAST tables for old versions, the omitted value appears as NULL in the change stream.
How does REPLICA IDENTITY FULL interact with HOT (Heap-Only Tuple) updates?
Candidates often miss that REPLICA IDENTITY FULL disables HOT updates for a table. HOT updates allow PostgreSQL to chain row versions within the same data page without updating every index entry, provided no indexed column changes. When REPLICA IDENTITY FULL is active, every UPDATE must log the entire old row image for replication, which requires the system to identify the row uniquely by its full content. This breaks the HOT optimization because logical replication needs complete tuple comparison data, forcing index updates for every row version even when modifying non-indexed columns. Consequently, tables with this setting experience higher index bloat and increased I/O, a critical trade-off for high-churn tables.
What is the difference between TOAST compression and PostgreSQL's WAL compression, and how do they interact during logical replication?
This question separates deep systems knowledge from surface-level understanding. TOAST compression reduces row size using LZ4 or PGLZ before storing large columns in external tables. WAL compression (enabled via wal_compression=lz4) compresses full page images written to the WAL for crash recovery efficiency. However, when REPLICA IDENTITY FULL is used, the old tuple data sent to logical decoding is extracted before the WAL record is compressed for storage. Therefore, the logical decoder receives uncompressed TOAST data (if fetched), while the physical WAL file might store it compressed if it's part of a full-page image, affecting network bandwidth versus disk I/O differently.