
The Modern Postgres Observability Stack in 2026

The metrics that actually matter, the tools that work in 2026, and the alerts to set up before your database becomes someone else's problem.

Published May 11, 2026 · Updated May 14, 2026 · 11 min read

Postgres observability is a solved problem in 2026, but most teams ship with the default RDS / Supabase dashboards and call it done. That's fine until something goes wrong; then you need the layer underneath. Here's what that layer looks like and the minimum viable stack.

What you actually need

Three categories:

Query-level: which queries are slow, why, how often.
Connection-level: how many connections, what they're doing, who's waiting.
System-level: CPU, memory, IO, disk, replication lag.

The provider dashboards give you the third one for free. The first two are where you live during incidents.

Core extensions

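pg_stat_statements: available on most managed Postgres and often enabled by default, though some providers still need CREATE EXTENSION run in each database. Records call count plus mean and total execution time for each normalized query. The single most useful Postgres tool ever shipped.
auto_explain: logs the execution plan for any query slower than a threshold. Set the threshold to your latency SLO so that when an alert fires, the plan is already in the logs.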
pg_stat_kcache: per-query OS-level stats (CPU time, IO bytes). Optional but pairs well with pg_stat_statements.

enable.sql

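-- Preload both libraries. shared_preload_libraries is read only at server start,
-- so this needs a restart, not a reload. It also replaces any existing value.
-- (on Supabase, pg_stat_statements is already on)
-- On managed Postgres without superuser, set these through the provider's config UI instead.
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements,auto_explain';

-- After the restart, in the database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- auto_explain config (a reload is enough for these)
ALTER SYSTEM SET auto_explain.log_min_duration = '200ms';
ALTER SYSTEM SET auto_explain.log_analyze = on;  -- times every statement; expect some overhead
ALTER SYSTEM SET auto_explain.log_buffers = on;  -- only has an effect with log_analyze on
SELECT pg_reload_conf();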
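
Before relying on either extension during an incident, it is worth checking that the configuration actually took effect. A minimal sketch, assuming PostgreSQL 10 or later and a role allowed to read these settings:

```sql
-- Confirm both libraries were preloaded at server start.
SHOW shared_preload_libraries;

-- Confirm the extension exists in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';

-- Confirm the auto_explain settings are live and where each value came from.
-- These rows only appear once the auto_explain library is loaded.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name LIKE 'auto_explain.%';
```

If the last query returns no rows, the library was probably not preloaded, which usually means the server has not been restarted since shared_preload_libraries changed.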

The metrics to watch

For each category, a short list of the metrics that actually predict trouble:

Query-level

Top 10 queries by total time from pg_stat_statements. Review weekly.
Mean execution time over the last hour vs the same hour last week. Anomalies are usually new bad queries.
Query call count anomalies. A query that suddenly fires 100x more often is usually a missing cache or a polling loop.
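
The weekly review can start from a query like the one below. It is a sketch that assumes PostgreSQL 13 or later, where the timing columns are named total_exec_time and mean_exec_time (older versions use total_time and mean_time). Because pg_stat_statements counters are cumulative since the last reset, hour-over-week comparisons need periodic snapshots to diff against; the table name stat_statements_snapshot is hypothetical.

```sql
-- Top 10 normalized statements by cumulative execution time.
SELECT
    queryid,
    calls,
    round(total_exec_time::numeric, 1) AS total_ms,
    round(mean_exec_time::numeric, 2)  AS mean_ms,
    round((100 * total_exec_time
           / NULLIF(sum(total_exec_time) OVER (), 0))::numeric, 1) AS pct_of_total,
    left(query, 80) AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Hypothetical snapshot table with the same column types, created empty.
CREATE TABLE IF NOT EXISTS stat_statements_snapshot AS
SELECT now() AS captured_at, userid, dbid, queryid, calls, total_exec_time
FROM pg_stat_statements
WITH NO DATA;

-- Run on a schedule (cron, pg_cron, or similar) to capture one snapshot.
INSERT INTO stat_statements_snapshot
SELECT now(), userid, dbid, queryid, calls, total_exec_time
FROM pg_stat_statements;
```

Subtracting two snapshots of the same queryid gives calls and total time for that window, which is what both the mean-time and call-count anomaly checks compare.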

Connection-level

Total connection count vs max_connections. Idle connections occupy slots too, so count every state. Past 80% is danger.
Idle-in-transaction connections. Any over a few minutes is a bug.
Lock waits via pg_locks joined to pg_stat_activity.
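
All three connection-level checks can be read straight from pg_stat_activity. The queries below are a sketch assuming PostgreSQL 10 or later. For lock waits they use pg_blocking_pids(), a built-in that performs the pg_locks join for you, rather than a hand-written join.

```sql
-- Connection usage as a percentage of max_connections (all states count).
SELECT
    count(*) AS connections,
    current_setting('max_connections')::int AS max_connections,
    round(100.0 * count(*) / current_setting('max_connections')::int, 1) AS pct_used
FROM pg_stat_activity
WHERE backend_type = 'client backend';

-- Sessions idle in transaction, longest first.
SELECT pid, usename, application_name,
       now() - state_change AS idle_for,
       left(query, 80) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY idle_for DESC;

-- Who is blocked on a lock, and which session is blocking them.
SELECT
    blocked.pid                 AS blocked_pid,
    left(blocked.query, 80)     AS blocked_query,
    now() - blocked.query_start AS waiting_for,
    blocking.pid                AS blocking_pid,
    blocking.state              AS blocking_state,
    left(blocking.query, 80)    AS blocking_query
FROM pg_stat_activity AS blocked
JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid) ON true
JOIN pg_stat_activity AS blocking ON blocking.pid = b.pid
WHERE blocked.wait_event_type = 'Lock';
```

A blocking session whose state is "idle in transaction" is the common case: the second and third queries often point at the same pid.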

System-level

CPU utilisation. Sustained over 70% means you need more compute or a query fix.
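Disk space free. Postgres fails hard when the disk fills: once it cannot write WAL, the server stops accepting writes and can shut down.
Replication lag (if you have replicas). Past a few seconds is a sign the replica cannot receive or replay WAL as fast as the primary generates it.
Cache hit ratio (blks_hit / (blks_hit + blks_read)). Should be over 95% for hot data.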
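
Two of the system-level metrics come from Postgres itself rather than the host. A sketch, assuming PostgreSQL 10 or later; the first two queries run on the primary, the last one on a replica:

```sql
-- Cache hit ratio per database. NULLIF avoids division by zero on idle databases.
SELECT
    datname,
    round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY datname;

-- Replication lag per replica, as seen from the primary.
SELECT
    application_name,
    state,
    replay_lag,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- Run this one on a replica instead: time since the last replayed transaction.
-- It also grows when the primary is simply idle, so treat it as an upper bound.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

Note that blks_read counts reads that missed Postgres shared buffers; some of them are still served from the OS page cache, so the ratio understates how often data comes from memory.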

Tooling that ships

Provider dashboards

Supabase and Neon both ship per-project query and connection dashboards. Good for spot-checking; not the place to set up alerts.

Grafana + a Postgres exporter

The reference open-source stack. postgres_exporter queries the pg_stat views and exposes them as metrics; Prometheus scrapes and stores them; Grafana renders. Plenty of pre-built dashboards exist. Run it self-hosted or on hosted Grafana Cloud.
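
Whichever exporter you run, it should connect as a dedicated, low-privilege role rather than as the application or a superuser. A minimal sketch, assuming PostgreSQL 10 or later (for the built-in pg_monitor role); the role name metrics_exporter and the limits are hypothetical:

```sql
-- Hypothetical role name. Set the password out of band, for example with \password in psql.
CREATE ROLE metrics_exporter LOGIN;

-- pg_monitor grants read access to the statistics views and settings.
GRANT pg_monitor TO metrics_exporter;

-- Keep monitoring from consuming the connections it is meant to watch.
ALTER ROLE metrics_exporter CONNECTION LIMIT 3;

-- Stop a stuck monitoring query from running indefinitely.
ALTER ROLE metrics_exporter SET statement_timeout = '5s';
```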

pganalyze, Crunchy Insights, Datadog Database Monitoring

Specialised products. pganalyze in particular is genuinely good at surfacing the "these queries got slower this week" narrative without you having to set up dashboards yourself. Worth the cost for teams past a certain scale.

An admin tool with audit + history

Operational observability isn't just metrics. When something looks weird in production, the question is often "who or what changed this row last week?" A tool with a row-level history panel beats grepping logs.
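
To make the idea concrete, here is one way row-level history could be captured in Postgres itself with a trigger. This is an illustrative sketch, not how any particular tool implements it: the row_history table, the record_row_history function, and the orders demo table are all hypothetical, and it assumes PostgreSQL 11 or later.

```sql
-- Hypothetical audit table: one row per change to a tracked table.
CREATE TABLE IF NOT EXISTS row_history (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    table_name text        NOT NULL,
    operation  text        NOT NULL,  -- INSERT, UPDATE, or DELETE
    changed_at timestamptz NOT NULL DEFAULT now(),
    changed_by text        NOT NULL DEFAULT current_user,
    old_row    jsonb,
    new_row    jsonb
);

CREATE OR REPLACE FUNCTION record_row_history() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO row_history (table_name, operation, new_row)
        VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO row_history (table_name, operation, old_row, new_row)
        VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    ELSE
        INSERT INTO row_history (table_name, operation, old_row)
        VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$;

-- Demo table standing in for whichever table you want to track.
CREATE TABLE IF NOT EXISTS orders (id bigint PRIMARY KEY, status text);

CREATE TRIGGER orders_row_history
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION record_row_history();

-- "Who or what changed this row last week?" for the row with id 42.
SELECT changed_at, changed_by, operation, old_row, new_row
FROM row_history
WHERE table_name = 'public.orders'
  AND (new_row->>'id' = '42' OR old_row->>'id' = '42')
  AND changed_at >= now() - interval '7 days'
ORDER BY changed_at;
```

The trade-off is write amplification: every tracked change costs an extra insert, so this suits low-volume, high-value tables better than hot ones.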

Alerts to set up

Minimal alert set. Each one has a specific action:

Free disk space < 20%: reclaim space or grow the volume.
Active connections > 80% of max: investigate pooling.
Replication lag > 10 seconds: check WAL throughput.
Idle-in-transaction connection > 10 minutes: someone left a tab open; kill it.
p95 query latency on a critical query > threshold: regression.
Autovacuum lag (n_dead_tup growing in pg_stat_user_tables): tune per-table settings.
Backup completion failed: highest priority. Investigate.
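
Two of these alerts map directly onto queries an alerting job could run. The sketch below assumes the 10-minute threshold from the list above; the orders table and the 0.02 scale factor are placeholders to adapt, not recommendations.

```sql
-- Idle-in-transaction sessions older than 10 minutes. Alert if any rows return.
SELECT pid, usename, application_name, now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '10 minutes';

-- To end one, pass its pid: SELECT pg_terminate_backend(<pid>);

-- Tables with the most dead tuples, and when autovacuum last processed them.
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
    last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Per-table tuning for a hot table ("orders" is a placeholder): vacuum once
-- roughly 2% of rows are dead instead of the default 20%.
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);
```

A table whose n_dead_tup keeps climbing while last_autovacuum stays recent usually has a long-running transaction holding back cleanup, which ties this alert to the idle-in-transaction one.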

