OOB in wakeupWaiters() chunking loop (WAIT FOR)
We found an out-of-bounds (OOB) array access in wakeupWaiters() in src/backend/access/transam/xlogwait.c. The variable i is initialized from lsnType and then reused as the wakeup loop counter. When a full wakeup batch of 16 is processed, i is left at 16 and is used again as an index into waitLSNState->waitersHeap[i] on the next do { ... } while (...) iteration, causing an OOB access (WAIT_LSN_TYPE_COUNT == 4).
We identified two ways to trigger it:
- Any user who can connect can open 16+ sessions and run WAIT FOR LSN '<future>' WITH (mode 'primary_flush'). Then they (or the normal workload) generate WAL (e.g., INSERTs). When the WAL writer flushes, WaitLSNWakeup() hits the OOB and a backend crashes.
- Read-only users can open 16+ sessions running WAIT FOR LSN '<future>' WITH (mode 'standby_replay'/'standby_flush'). WAL replay/receiver activity wakes them and triggers the same OOB, crashing the standby.
wakeupWaiters() uses a single local variable i both as the heap index derived from lsnType (waitersHeap[i]), and later the loop counter for waking processes (for (i = 0; ... )).
If exactly 16 waiters are collected in one pass (WAKEUP_PROC_STATIC_ARRAY_SIZE), the function repeats its do {...} while (...) loop with i == 16 and then indexes waitersHeap[16], even though the array has only 4 elements. That is an out-of-bounds access into the shared-memory WaitLSNState object (see src/include/access/xlogwait.h:36).
WaitLSNState stores waitersHeap[WAIT_LSN_TYPE_COUNT] immediately before the flexible procInfos[] array, in the same shared-memory allocation (see src/include/access/xlogwait.h:78), so any index >= 4 is an OOB access into adjacent shared-memory fields (into procInfos[]).
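The layout argument can be checked with offset arithmetic alone. The following is a minimal sketch, not the real PostgreSQL definitions: LayoutHeap, LayoutProcInfo, and LayoutState are hypothetical stand-ins that keep only the two members named above (a 4-element heap array followed by a flexible array), and every field inside them is an assumption. It computes where element 16 would sit without ever touching that memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LAYOUT_TYPE_COUNT 4   /* mirrors WAIT_LSN_TYPE_COUNT == 4 from the report */

/* Hypothetical stand-in for pairingheap: a small struct of pointers. */
typedef struct LayoutHeap
{
    void *ph_compare;
    void *ph_arg;
    void *ph_root;
} LayoutHeap;

/* Hypothetical stand-in for one procInfos[] entry (fields are assumptions). */
typedef struct LayoutProcInfo
{
    uint64_t waitLSN;
    int      procno;
} LayoutProcInfo;

/* Simplified model: the heap array sits directly before the flexible array. */
typedef struct LayoutState
{
    LayoutHeap     waitersHeap[LAYOUT_TYPE_COUNT];
    LayoutProcInfo procInfos[];
} LayoutState;

/* Byte offset that waitersHeap[index] would have, computed without any access. */
static size_t
layout_heap_offset(size_t index)
{
    return offsetof(LayoutState, waitersHeap) + index * sizeof(LayoutHeap);
}

/* Same offset, but only for indexes a bounds check would accept. */
static size_t
layout_checked_heap_offset(size_t index)
{
    assert(index < LAYOUT_TYPE_COUNT);
    return layout_heap_offset(index);
}

/* True when waitersHeap[index] would start at or beyond procInfos[]. */
static int
layout_lands_in_procinfos(size_t index)
{
    return layout_heap_offset(index) >= offsetof(LayoutState, procInfos);
}
```

In this model, layout_lands_in_procinfos(16) is true and layout_lands_in_procinfos(3) is false, which matches the claim that an index of 16 points well past the heap array into the procInfos[] region.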
As for why wakeupWaiters() can re-enter the loop with i == 16: in wakeupWaiters() (see src/backend/access/transam/xlogwait.c:242 - src/backend/access/transam/xlogwait.c:310), i is initialized from lsnType once, before the do {} loop, and is then reused as the for loop counter. The do {} loop repeats when numWakeUpProcs == 16, and at the end of the for loop i == numWakeUpProcs, so i becomes 16 precisely when the repeat condition is true.
For example:
- Start: i = (int) lsnType (0..3), OK.
- Suppose at least 16 eligible waiters exist; the loop fills wakeUpProcs[] and sets numWakeUpProcs = 16.
- The for (i = 0; i < numWakeUpProcs; i++) loop ends with i == 16.
- Next iteration: while (!pairingheap_is_empty(&waitersHeap[i])) becomes waitersHeap[16] (OOB).
Note that the initial Assert(i >= 0 && i < WAIT_LSN_TYPE_COUNT); does not re-run on subsequent do {} iterations, so it doesn't catch the corrupted i.
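The control flow in the steps above can be modeled without touching any heap. This is a minimal sketch under stated assumptions: LoopTrace, trace_shared_index(), and trace_separate_index() are hypothetical names, the waiter count is a plain integer, and the code only records which heap index each pass of the outer loop would use. The second function shows one possible way to avoid the reuse (a dedicated index variable); it is not necessarily what the upstream patch does.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_TYPE_COUNT 4    /* mirrors WAIT_LSN_TYPE_COUNT */
#define TRACE_BATCH_SIZE 16   /* mirrors WAKEUP_PROC_STATIC_ARRAY_SIZE */
#define TRACE_MAX_PASSES 8    /* cap so the model always terminates */

/* Which heap index each pass of the outer do {} loop would use. */
typedef struct LoopTrace
{
    int passes;
    int heap_index[TRACE_MAX_PASSES];
} LoopTrace;

/* Buggy shape: i selects the heap and is also the wakeup loop counter. */
static LoopTrace
trace_shared_index(int lsn_type, int waiters)
{
    LoopTrace t = {0};
    int       i = lsn_type;
    int       num_wakeup;

    assert(i >= 0 && i < TRACE_TYPE_COUNT);   /* checked once, before the loop */
    do
    {
        /* The real code would evaluate waitersHeap[i] here; we only record i. */
        t.heap_index[t.passes++] = i;

        /* Collect up to one batch of waiters. */
        num_wakeup = 0;
        while (waiters > 0 && num_wakeup < TRACE_BATCH_SIZE)
        {
            waiters--;
            num_wakeup++;
        }

        /* Wakeup loop: clobbers i, leaving it equal to num_wakeup. */
        for (i = 0; i < num_wakeup; i++)
            ;
    } while (num_wakeup == TRACE_BATCH_SIZE && t.passes < TRACE_MAX_PASSES);

    return t;
}

/* One possible fix shape: the heap index lives in its own variable. */
static LoopTrace
trace_separate_index(int lsn_type, int waiters)
{
    LoopTrace t = {0};
    const int heap_idx = lsn_type;
    int       num_wakeup;
    int       j;

    assert(heap_idx >= 0 && heap_idx < TRACE_TYPE_COUNT);
    do
    {
        t.heap_index[t.passes++] = heap_idx;

        num_wakeup = 0;
        while (waiters > 0 && num_wakeup < TRACE_BATCH_SIZE)
        {
            waiters--;
            num_wakeup++;
        }

        for (j = 0; j < num_wakeup; j++)
            ;
    } while (num_wakeup == TRACE_BATCH_SIZE && t.passes < TRACE_MAX_PASSES);

    return t;
}
```

With lsn_type = 1 and 16 waiters, trace_shared_index() records index 1 on the first pass and index 16 on the second, while trace_separate_index() records index 1 on both passes.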
This is dangerous because a pairingheap is a small struct of pointers, and pairingheap_is_empty(h) reads h->ph_root (see src/include/lib/pairingheap.h:71, "typedef struct pairingheap").
So when the code treats bytes in or near procInfos[] as a pairingheap, it reads a bogus ph_root pointer (from unrelated data), then passes the bogus "heap" into pairingheap_first() / pairingheap_remove_first(), and then converts the returned bogus node pointer into a WaitLSNProcInfo * via pairingheap_container(...).
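To illustrate why unrelated bytes are unlikely to look like an empty heap, here is a minimal sketch using integer-only models. BogusHeap and BogusProcInfo are hypothetical stand-ins whose fields are assumptions, and addresses are handled as uintptr_t values so that nothing is ever dereferenced. The emptiness test mirrors the ph_root check described above, and the container step is the usual subtract-the-member-offset arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for pairingheap, with pointer-sized integer fields. */
typedef struct BogusHeap
{
    uintptr_t ph_compare;
    uintptr_t ph_arg;
    uintptr_t ph_root;
} BogusHeap;

/* Hypothetical stand-in for WaitLSNProcInfo (field layout is an assumption). */
typedef struct BogusProcInfo
{
    uint64_t  waitLSN;
    int       procno;
    uintptr_t heapNode[3];   /* stand-in for an embedded heap node */
} BogusProcInfo;

/* Reinterpret raw bytes as a heap header, as an OOB index effectively does. */
static BogusHeap
bogus_heap_from_bytes(const unsigned char *bytes)
{
    BogusHeap h;

    memcpy(&h, bytes, sizeof(h));
    return h;
}

/* Models the emptiness check: only the root field is consulted. */
static int
bogus_heap_is_empty(const BogusHeap *h)
{
    return h->ph_root == 0;
}

/* Models the container conversion: node address minus the member offset. */
static uintptr_t
bogus_container_addr(uintptr_t node_addr)
{
    assert(node_addr >= offsetof(BogusProcInfo, heapNode));
    return node_addr - offsetof(BogusProcInfo, heapNode);
}
```

In this model, any nonzero bytes in the ph_root position make the "heap" look non-empty, and whatever value sits there is then turned into a struct address by plain subtraction, with no validation at either step.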
For reproduction:
- We started a primary server on upstream master.
- Get the current LSN; choose a target slightly ahead (e.g., +0x400000).
- Open 16+ sessions; we did this via WAIT FOR LSN '<target>' WITH (mode 'primary_flush');
- Generate WAL (e.g., pgbench -n -c 8 -T 10).
- Backend crashes; postmaster restarts.
Using CFLAGS=-O1 -g -fsanitize=address,undefined -fno-omit-frame-pointer -fno-sanitize-recover=undefined and UBSAN_OPTIONS=print_stacktrace=1, we get:
.../src/backend/access/transam/xlogwait.c:274:11: runtime error: index 16 out of bounds for type 'pairingheap [4]'
    #0 wakeupWaiters ... xlogwait.c:274
    #1 WaitLSNWakeup ... xlogwait.c:330
    #2 XLogFlush ... xlog.c:2927
    #3 RecordTransactionCommit ... xact.c:1515

Disclosure Timeline:
The issue does not affect stable branches.
- Jan 5, 5:14 PM ET: Disclosed via security@postgresql.org
- Jan 5, 11:00 PM ET: Accepted
- Jan 5, 11:04 PM ET: Forwarded issue to code owner
- Jan 6, 03:03 AM ET: Patched upstream.