Why is PostgreSQL crash recovery single-threaded?

WAL replay must apply changes in strict LSN order to maintain consistency. Parallel recovery has been discussed on the PostgreSQL mailing lists for years but hasn't been committed to core yet. Recovery speed scales with single-core performance and storage throughput, not core count.

How Do You Check Progress on PostgreSQL's Crash Recovery Process?

Q: Can you speed up PostgreSQL crash recovery?

Yes. PostgreSQL 14's compactify_tuples optimization made WAL replay approximately 2.4x faster. On PostgreSQL 15+, enabling recovery_prefetch and tuning maintenance_io_concurrency further reduces I/O stalls. Shorter checkpoint_timeout values also reduce the WAL volume to replay, at the cost of more I/O during normal operations.

Your PostgreSQL server just crashed. It's replaying WAL and you're staring at a process list that tells you almost nothing. The phone is ringing. How far along is recovery? When will the database accept connections again?

For 90% of midsize and large companies, one hour of database downtime costs over $300,000 (ITIC, 2024). Every minute you can't answer "how much longer?" is a minute stakeholders assume the worst. PostgreSQL doesn't give you a progress bar, but it gives you five distinct tools to estimate where crash recovery stands and when it'll finish.

This guide covers the practical monitoring signals for PostgreSQL crash recovery in versions 14 through 17: log messages, pg_controldata, process titles, and startup progress logging. For standby/partial recovery with SQL connections, it also includes pg_stat_recovery_prefetch.

Key Takeaways

PostgreSQL 14's compactify_tuples optimization made WAL replay ~2.4x faster, dropping replay of 2.2 GB WAL from 148s to 60.8s (Microsoft/Citus, 2021).

Five monitoring methods exist: log messages, pg_controldata, ps output, and log_startup_progress_interval (PG 15+); pg_stat_recovery_prefetch is for standby/connection-based partial recovery.

You can estimate remaining recovery time by comparing the REDO start LSN against the end-of-WAL LSN using pg_controldata.

Only 20% of enterprises are fully prepared to handle database outages (Cockroach Labs, 2025).

What Happens During PostgreSQL Crash Recovery?

PostgreSQL's crash recovery replays every WAL record generated since the last completed checkpoint (PostgreSQL Docs, 2026). The startup process reads the pg_control file, identifies the REDO point, and sequentially applies WAL records until it reaches the end of the WAL stream. No connections are accepted until replay finishes.

The process follows a fixed sequence:

Detection — PostgreSQL reads pg_control and finds the cluster wasn't shut down cleanly.
REDO start — The startup process locates the last checkpoint's REDO location and begins replaying WAL from that LSN.
WAL replay — Each WAL record is applied sequentially. This is single-threaded and CPU-bound for compute, I/O-bound for page reads.
REDO complete — All WAL has been replayed. The database writes a new checkpoint.
Ready — The server begins accepting connections.

LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 2/9CAC9FA0
LOG: redo done at 3/1B8E4D20 system usage: ...
LOG: last completed transaction was at log time 2026-03-31 14:22:07.451092+00
LOG: database system is ready to accept connections

What catches many DBAs off guard is the silence between "redo starts" and "redo done." On a system with 30 minutes of accumulated WAL, that gap can be 5-15 minutes with zero log output at default settings. You're left guessing unless you know where else to look.

The amount of WAL to replay depends directly on your checkpoint_timeout and max_wal_size settings. A longer checkpoint interval means more WAL accumulates between checkpoints, which means a longer recovery window after a crash. It's a deliberate trade-off: longer intervals reduce I/O during normal operations but increase recovery time.

How Can You Monitor Recovery with pg_controldata?

The pg_controldata utility is your most reliable tool during crash recovery because it works even when PostgreSQL won't accept connections (PostgreSQL Docs, 2026). It reads the pg_control file directly from disk and reports the cluster state, last checkpoint location, and REDO point.

Run it from the command line while recovery is in progress:

$ pg_controldata /var/lib/pgsql/17/data | grep -E "cluster state|checkpoint location|REDO"
Database cluster state:               in crash recovery
Latest checkpoint location:           2/9CAC9FA0
Latest checkpoint's REDO location:    2/9CAC9FA0
Latest checkpoint's REDO WAL file:    000000010000000200000009

The key fields are:

Field	What It Tells You
`Database cluster state`	Confirms recovery is in progress (`in crash recovery`)
`Latest checkpoint location`	The LSN of the last completed checkpoint
`Latest checkpoint's REDO location`	Where WAL replay starts
`Latest checkpoint's REDO WAL file`	The first WAL segment being replayed

Estimating Remaining Recovery Time

You can combine pg_controldata with the WAL directory to estimate progress. Compare the REDO start LSN against the newest WAL file on disk:

# Find the REDO start point
REDO_LSN=$(pg_controldata /var/lib/pgsql/17/data | grep "REDO location" | head -1 | awk '{print $NF}')

# Find the latest WAL file
LATEST_WAL=$(ls -t /var/lib/pgsql/17/data/pg_wal/0000000* | head -1)

# Count WAL files to replay (each file = 16 MB by default)
WAL_COUNT=$(ls /var/lib/pgsql/17/data/pg_wal/0000000* | wc -l)
echo "Approximately $WAL_COUNT WAL segments ($(( WAL_COUNT * 16 )) MB) to replay"

Practical estimate: Default PostgreSQL replays WAL at roughly 3-4 segments per second on modern hardware (PostgreSQL Mailing List, 2020). If you have 200 WAL files queued, expect roughly 50-70 seconds of replay time. This rate varies heavily with storage speed and shared_buffers hit ratio, but it gives you a baseline.

What Does the Startup Process Title Tell You?

PostgreSQL updates the startup process title during recovery to show which WAL file is currently being replayed. This is the simplest real-time progress indicator and works on every PostgreSQL version.

$ ps aux | grep "startup recovering"
postgres  2841  12.3  1.2  ... postgres: startup recovering 000000010000000200000010

The WAL filename follows the format TTTTTTTTSSSSSSSSXXXXXXXX:

TTTTTTTT — Timeline ID
SSSSSSSS — Segment high bits (logical segment group)
XXXXXXXX — Segment low bits (position within group)

Poll this every few seconds to watch recovery advance through the WAL stream:

while true; do
  ps -eo pid,command | grep "startup recovering" | grep -v grep
  sleep 2
done

During a production incident, I've found this method the fastest way to confirm recovery is actually making progress and not stuck. If the WAL filename in the process title isn't changing for 30+ seconds, something's wrong — usually a storage I/O bottleneck or a corrupted WAL segment.

Compare the current WAL file against your total count to get a rough percentage:

# Current segment being replayed
CURRENT=$(ps -eo command | grep "startup recovering" | awk '{print $NF}')

# Total segments in pg_wal
TOTAL=$(ls /var/lib/pgsql/17/data/pg_wal/0000000* 2>/dev/null | wc -l)

echo "Currently replaying: $CURRENT"
echo "Total WAL segments: $TOTAL"

How Does log_startup_progress_interval Work? (PostgreSQL 15+)

PostgreSQL 15 introduced log_startup_progress_interval, a parameter that automatically logs messages about long-running startup operations at configurable intervals (PostgreSQL Docs, 2022). This eliminates the silence gap between "redo starts" and "redo done."

The default is 10 seconds. Set it lower for more granular progress during recovery:

-- In postgresql.conf (apply via SIGHUP / config reload; no full restart required)
log_startup_progress_interval = 5s

During crash recovery, you'll see log entries like:

LOG:  recovery in progress, elapsed time: 5.02 s, current WAL location: 2/A1234560
LOG:  recovery in progress, elapsed time: 10.04 s, current WAL location: 2/B5678900
LOG:  recovery in progress, elapsed time: 15.06 s, current WAL location: 2/C9ABCDE0

This gives you two things you didn't have before: a timestamp cadence showing recovery is alive, and an advancing LSN you can use to calculate replay speed in bytes per second.

# Calculate replay speed from two log entries
# LSN 2/B5678900 - 2/A1234560 = ~0x14443A0 = ~21 MB in 5 seconds ≈ 4.2 MB/s

The parameter also covers other long-running startup operations beyond WAL replay:

Data directory fsync — Syncing all files after an unclean shutdown
Unlogged relation reset — Truncating unlogged tables (they lose data on crash)

Are these intervals giving you enough visibility, or do you need deeper I/O-level insight? That's where recovery prefetch comes in.

Source: PostgreSQL Release Notes, 2021–2024

What Is pg_stat_recovery_prefetch and How Do You Use It?

PostgreSQL 15 added the recovery_prefetch parameter and the pg_stat_recovery_prefetch view to optimize and monitor I/O during recovery (Citus Data, 2022). Instead of reading each page synchronously during WAL replay, PostgreSQL looks ahead in the WAL stream and issues asynchronous prefetch requests via posix_fadvise. This reduces I/O stalls and speeds up recovery on spinning disks substantially.

The recovery_prefetch GUC has three values:

Value	Behavior
`off`	No prefetching. Sequential page reads during replay.
`try` (default)	Enable prefetch if the OS supports `posix_fadvise`.
`on`	Enable prefetch. Error if `posix_fadvise` unavailable.

Reading the pg_stat_recovery_prefetch View

Connect to the database (on a standby, or after partial recovery if hot_standby = on) and query:

SELECT * FROM pg_stat_recovery_prefetch;

Column	Type	What It Means
`prefetch`	bigint	Blocks prefetched because they weren't in the buffer pool
`hit`	bigint	Blocks skipped because they were already in shared buffers
`skip_init`	bigint	Blocks skipped because they'd be zero-filled anyway
`skip_new`	bigint	Blocks skipped because they didn't exist yet
`skip_fpw`	bigint	Blocks skipped because the WAL record contains a full page image
`skip_rep`	bigint	Blocks skipped because they were already recently prefetched
`wal_distance`	int	How far ahead the prefetcher is looking (bytes)
`block_distance`	int	How many blocks ahead the prefetcher is looking
`io_depth`	int	Number of initiated but incomplete prefetches right now

The cumulative counters (prefetch, hit, skip_*) grow over time. The three real-time gauges (wal_distance, block_distance, io_depth) show the prefetcher's current state.

What to watch for: A high hit ratio means most pages were already in shared buffers — recovery is CPU-bound, not I/O-bound. A high prefetch count with low io_depth means the prefetcher is keeping up with replay demand. If io_depth consistently equals maintenance_io_concurrency (default 10), your storage can't keep up and you may benefit from increasing that value or using faster disks.

Tuning Recovery Prefetch

Two parameters control prefetch behavior:

-- How many concurrent I/O operations to issue during recovery
maintenance_io_concurrency = 10   -- default; increase for fast SSDs (up to 1000)

-- How far ahead the prefetcher can look in the WAL stream
wal_decode_buffer_size = 524288   -- default 512 KB; increase for large recovery workloads

On NVMe storage, bumping maintenance_io_concurrency to 50-100 can measurably reduce recovery time by keeping the I/O pipeline full.

How Do Checkpoint Settings Affect Recovery Time?

Checkpoint configuration is where you make the trade-off between normal operation performance and crash recovery duration. A production PostgreSQL system commonly uses checkpoint_timeout between 30 minutes and 1 hour, with max_wal_size set to 10-30 GB (EDB, 2023). The default 5-minute timeout is too aggressive for most workloads, but longer intervals mean more WAL to replay.

Estimates based on ~50 MB/s WAL generation with modern SSD storage. Source: EDB, 2023

Here's the configuration you should have in place before a crash happens:

-- postgresql.conf — checkpoint and recovery monitoring settings
checkpoint_timeout = '30min'              -- balance between I/O and recovery time
max_wal_size = '10GB'                     -- allow enough WAL before forced checkpoint
log_checkpoints = on                      -- log every checkpoint with timing details
log_startup_progress_interval = '5s'      -- PG 15+: progress updates during recovery
recovery_prefetch = try                   -- PG 15+: async prefetch during replay
maintenance_io_concurrency = 20           -- PG 15+: concurrent prefetch I/Os
log_recovery_conflict_waits = on          -- PG 14+: log long waits during standby recovery

Our finding: Enabling log_checkpoints is the single most impactful monitoring change you can make before a crash happens. It logs the checkpoint's REDO location, the number of buffers written, and the time taken — data you'll need to estimate recovery scope after a crash.

Enterprises experience an average of 86 outages per year, with an average outage length of 196 minutes (Cockroach Labs/Wakefield Research, 2025). Having these monitoring parameters configured in advance is the difference between "recovery is progressing, ETA 90 seconds" and "we don't know."

What About Monitoring Recovery on Standby Servers?

Standby servers continuously replay WAL shipped from the primary. The same monitoring tools apply, but you also get SQL-level visibility because the standby accepts read-only connections when hot_standby = on.

Query the standby directly to check replay progress:

-- Current replay position
SELECT pg_last_wal_replay_lsn(), pg_last_wal_receive_lsn();

-- Replay lag in bytes
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS replay_lag_bytes;

-- Last transaction replay timestamp
SELECT pg_last_xact_replay_timestamp();

On a standby running PostgreSQL 15+, combine this with the prefetch stats:

SELECT
    pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS replay_lag_bytes,
    prefetch,
    hit,
    round(hit::numeric / nullif(prefetch + hit, 0) * 100, 1) AS buffer_hit_pct,
    io_depth,
    wal_distance
FROM pg_stat_recovery_prefetch;

A buffer_hit_pct above 90% means recovery is largely CPU-bound. Below 50% signals heavy disk I/O — that's where increasing maintenance_io_concurrency helps most.

PostgreSQL 14 also introduced log_recovery_conflict_waits, which logs when the startup process waits longer than deadlock_timeout for recovery conflicts on the standby (bdrouvot, 2021). This is invaluable for diagnosing why replay stalls:

LOG: recovery still waiting after 1024.777 ms: recovery conflict on snapshot
DETAIL: Conflicting process: 12345

Quick Reference: All Recovery Monitoring Methods

Method	PostgreSQL Version	Works During Crash Recovery?	Requires Connection?
Log messages ("redo starts/done")	All	Yes	No — check `pg_log`
`pg_controldata`	All	Yes	No — reads `pg_control` file
`ps` process title	All	Yes	No — OS-level
`log_startup_progress_interval`	15+	Yes	No — check `pg_log`
`pg_stat_recovery_prefetch`	15+	Standby only	Yes — SQL query
`pg_last_wal_replay_lsn()`	10+	Standby only	Yes — SQL query
`log_recovery_conflict_waits`	14+	Standby only	No — check `pg_log`

The first four work during actual crash recovery (no connections available). The last three require a running standby with hot_standby = on.

Frequently Asked Questions

How long does PostgreSQL crash recovery typically take?

Most instances complete crash recovery in seconds to a few minutes, even with a 1-hour checkpoint interval (EDB, 2023). Recovery time depends on the volume of WAL accumulated since the last checkpoint and your storage throughput. A system generating 50 MB/s of WAL with a 30-minute checkpoint interval replays roughly 7 GB of WAL in under 2 minutes on SSD storage.

Can you speed up crash recovery?

Yes. PostgreSQL 14's compactify_tuples optimization made WAL replay approximately 2.4x faster — dropping replay of 2.2 GB WAL from 148 seconds to 60.8 seconds (Microsoft/Citus, 2021). On PostgreSQL 15+, enabling recovery_prefetch and tuning maintenance_io_concurrency further reduces I/O stalls. Shorter checkpoint_timeout values also reduce the WAL volume to replay, at the cost of more I/O during normal operations.

Does crash recovery happen automatically?

Yes, crash recovery is fully automatic. PostgreSQL detects an unclean shutdown by reading the pg_control file and starts WAL replay without any manual intervention. The database won't accept connections until recovery completes. No DBA action is needed to initiate it — your role is monitoring progress and optimizing the parameters that affect recovery speed.

What's the difference between crash recovery and PITR?

Crash recovery replays WAL from the last checkpoint to the end of locally available WAL. Point-in-time recovery (PITR) restores from a base backup and replays archived WAL to a specific target time, LSN, or transaction ID. Crash recovery is automatic and fast; PITR is manual and can take much longer depending on the volume of archived WAL. Both use the same WAL replay mechanism internally.

Why is crash recovery single-threaded?

WAL replay must apply changes in strict LSN order to maintain consistency. Parallel recovery has been discussed on the PostgreSQL mailing lists for years and a community wiki page tracks the design (PostgreSQL Wiki), but it hasn't been committed to core yet. The single-threaded design means recovery speed scales with single-core performance and storage throughput, not core count.

Conclusion

PostgreSQL crash recovery doesn't come with a progress bar, but it doesn't have to be a black box either. Between pg_controldata, process title monitoring, log-based progress intervals, and the prefetch statistics view, you have five distinct methods to track exactly where recovery stands.

The key takeaways:

Before a crash: Set log_checkpoints = on, log_startup_progress_interval = 5s, and tune your checkpoint interval for your acceptable recovery window.
During crash recovery: Use pg_controldata and ps to monitor progress when no connections are available.
On standbys: Query pg_stat_recovery_prefetch and pg_last_wal_replay_lsn() for real-time I/O and replay metrics.
After recovery: Review checkpoint logs and recovery prefetch stats to tune parameters for the next incident.

55.6% of developers now use PostgreSQL — the most popular database for the third consecutive year (Stack Overflow Developer Survey, 2025). Understanding its recovery internals isn't optional for production DBAs. It's the difference between confidently reporting "recovery will complete in 45 seconds" and silently hoping.

Mughees Ahmed — ReliaDB

ReliaDB is a specialist DBA team for PostgreSQL and MySQL performance, high availability, and cloud database optimization. More about ReliaDB →