
The Modern Postgres Observability Stack in 2026

The metrics that actually matter, the tools that work in 2026, and the alerts to set up before your database becomes someone else's problem.

11 min read

Postgres observability is a solved problem in 2026, but most teams ship with the default RDS / Supabase dashboards and call it done. That's fine until something goes wrong; then you need the layer underneath. Here's what that layer looks like, and the minimum viable stack to get it.

What you actually need

Three categories:

  1. Query-level: which queries are slow, why, how often.
  2. Connection-level: how many connections, what they're doing, who's waiting.
  3. System-level: CPU, memory, IO, disk, replication lag.

The provider dashboards give you the third one for free. The first two are where you live during incidents.

Core extensions

  • pg_stat_statements: enabled by default on most managed Postgres. Records each query's mean / total time and calls. The single most useful Postgres tool ever shipped.
  • auto_explain: logs the execution plan of any query slower than a threshold. Set the threshold to your SLO; slow queries then land in the server log with their plan attached, ready for whatever alerts you build on your logs.
  • pg_stat_kcache: per-query OS-level stats (CPU time, IO bytes). Optional but pairs well with pg_stat_statements.
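```
# postgresql.conf
# shared_preload_libraries is only read at startup: this needs a server RESTART, not a reload.
# On Supabase, pg_stat_statements is already loaded; on other managed providers
# these are typically set through the provider's parameter settings.
shared_preload_libraries = 'pg_stat_statements,auto_explain'

# auto_explain: log the plan of any statement slower than 200 ms
auto_explain.log_min_duration = '200ms'
# log_analyze times every statement, not only the slow ones, so it adds overhead
auto_explain.log_analyze = on
auto_explain.log_buffers = on
```

```sql
-- Then, in the database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```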
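After the server has restarted, a few quick checks can confirm that both libraries actually loaded. This is a minimal sketch. The `LOAD` and `SET auto_explain.*` lines are an optional way to try auto_explain in a single session without touching the config, and they require superuser.

```sql
-- Which libraries did the server preload at startup?
SHOW shared_preload_libraries;

-- Is the extension installed in this database, and at what version?
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';

-- Optional: try auto_explain in the current session only (superuser required).
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '200ms';
SET auto_explain.log_analyze = on;
```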

The metrics to watch

For each category, a short list of the metrics that actually predict trouble:

Query-level

  • Top 10 queries by total time from pg_stat_statements. Review weekly.
  • Mean execution time over the last hour vs the same hour last week. Anomalies are usually new bad queries.
  • Query call count anomalies. A query that suddenly fires 100x more often is usually a missing cache or a polling loop.
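Here is a sketch of the weekly top-10 review. It assumes PostgreSQL 13 or later; earlier versions name the columns `total_time` and `mean_time`. The counters are cumulative since the last reset. The hour-over-hour and call-count comparisons above therefore need periodic snapshots, which is the job of your metrics pipeline or a hosted tool rather than a one-off query.

```sql
-- Top 10 statements by cumulative execution time since the last stats reset.
-- Column names assume PostgreSQL 13+ (total_exec_time / mean_exec_time).
SELECT
  queryid,
  calls,
  round(total_exec_time::numeric, 1) AS total_ms,
  round(mean_exec_time::numeric, 2)  AS mean_ms,
  -- share of all recorded execution time; NULLIF avoids division by zero
  round((100 * total_exec_time
         / NULLIF(sum(total_exec_time) OVER (), 0))::numeric, 1) AS pct_of_total,
  left(query, 80) AS query_text
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Resetting the counters with `SELECT pg_stat_statements_reset();` after a deploy makes the following week's ranking reflect only the new code.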

Connection-level

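  • Connection count (idle sessions included) vs max_connections. Past 80% is danger: once the limit is reached, new connections are refused.
  • Idle-in-transaction connections. Any that stays open for more than a few minutes is usually a bug: it keeps holding its locks and can stop vacuum from removing dead rows.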
  • Lock waits via pg_locks joined to pg_stat_activity.
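Both checks can be run directly against `pg_stat_activity`. This is a minimal sketch that assumes PostgreSQL 10+ for `backend_type`. For lock waits it uses the built-in `pg_blocking_pids()` instead of a hand-written `pg_locks` join. That function returns the PIDs blocking a given session. The documentation warns that frequent calls have some performance cost, so poll it sparingly.

```sql
-- Client connections in use vs max_connections.
-- Idle sessions are counted too, since they also occupy a slot.
SELECT
  count(*) AS connections,
  current_setting('max_connections')::int AS max_connections,
  round(100.0 * count(*) / current_setting('max_connections')::int, 1) AS pct_used
FROM pg_stat_activity
WHERE backend_type = 'client backend';

-- Sessions waiting on a lock, and the sessions blocking them.
-- For an idle-in-transaction blocker, "query" is its last statement,
-- not necessarily the one that took the lock.
SELECT
  waiting.pid                  AS waiting_pid,
  now() - waiting.query_start  AS waiting_for,
  left(waiting.query, 60)      AS waiting_query,
  blocking.pid                 AS blocking_pid,
  blocking.state               AS blocking_state,
  left(blocking.query, 60)     AS blocking_query
FROM pg_stat_activity AS waiting
CROSS JOIN LATERAL unnest(pg_blocking_pids(waiting.pid)) AS b(pid)
JOIN pg_stat_activity AS blocking ON blocking.pid = b.pid
WHERE waiting.wait_event_type = 'Lock'
ORDER BY waiting_for DESC;
```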

System-level

  • CPU utilisation. Sustained over 70% means you need more compute or a query fix.
  • Disk space free. Postgres does not degrade gracefully when the disk fills: if it can no longer write WAL, the server can panic and shut down.
  • Replication lag (if you have replicas). Past a few seconds is a sign of WAL backpressure.
  • Cache hit ratio: blks_hit / (blks_hit + blks_read) from pg_stat_database. Should be over 95% for hot data.
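Both the cache hit ratio and replication lag can be read straight from the statistics views. This is a minimal sketch that assumes PostgreSQL 10+ for `pg_current_wal_lsn()` and the `replay_lag` column. Two caveats apply. First, the ratio only counts hits in `shared_buffers`, so blocks served from the OS page cache still show up as `blks_read`. Second, `replay_lag` eventually reads NULL once a replica has caught up and WAL traffic stops, so the byte gap is a useful second signal.

```sql
-- Shared-buffer cache hit ratio for the current database, as a percentage.
-- Counters are cumulative since the last statistics reset.
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();

-- Run on the primary (pg_current_wal_lsn() errors on a standby): lag per replica.
SELECT application_name,
       client_addr,
       state,
       replay_lag,  -- interval; NULL when there is no recent WAL activity
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_gap_bytes
FROM pg_stat_replication;
```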

Tooling that ships

Provider dashboards

Supabase and Neon both ship per-project query and connection dashboards. Good for spot-checking; not the place to set up alerts.

Grafana + a Postgres exporter

The reference open-source stack. postgres_exporter scrapes pg_stat views; Prometheus stores; Grafana renders. Plenty of pre-built dashboards exist. Self-hosted or hosted Grafana Cloud.

pganalyze, Crunchy Insights, Datadog Database Monitoring

Specialised products. pganalyze in particular is genuinely good at surfacing the "these queries got slower this week" narrative without you having to set up dashboards yourself. Worth the cost for teams past a certain scale.

An admin tool with audit + history

Operational observability isn't just metrics. When something looks weird in production, the question is often "who or what changed this row last week?" A tool with a row-level history panel beats grepping logs.
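If your tooling doesn't keep row history, a trigger-based audit table is one common way to capture it inside the database. This is a minimal sketch that assumes PostgreSQL 11+, which provides `EXECUTE FUNCTION` and NULL `OLD`/`NEW` in row triggers. `audit_log`, `audit_row_change`, `orders` and its `id` column are hypothetical names. The sketch records the database role, not your application's end user, and every audited write pays for an extra insert.

```sql
-- Hypothetical audit table: one row per change, with before/after images as JSON.
CREATE TABLE IF NOT EXISTS audit_log (
  id          bigserial   PRIMARY KEY,
  table_name  text        NOT NULL,
  operation   text        NOT NULL,
  changed_by  text        NOT NULL DEFAULT current_user,  -- database role
  changed_at  timestamptz NOT NULL DEFAULT now(),
  old_row     jsonb,
  new_row     jsonb
);

CREATE OR REPLACE FUNCTION audit_row_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO audit_log (table_name, operation, old_row, new_row)
  VALUES (
    TG_TABLE_NAME,
    TG_OP,
    CASE WHEN TG_OP IN ('UPDATE', 'DELETE') THEN to_jsonb(OLD) END,
    CASE WHEN TG_OP IN ('INSERT', 'UPDATE') THEN to_jsonb(NEW) END
  );
  RETURN NULL;  -- the return value is ignored for AFTER triggers
END;
$$;

-- Attach it to a hypothetical "orders" table.
CREATE TRIGGER orders_audit
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION audit_row_change();

-- "Who changed order 42 in the last week?"
SELECT changed_at, changed_by, operation, old_row, new_row
FROM audit_log
WHERE table_name = 'orders'
  AND coalesce(new_row->>'id', old_row->>'id') = '42'
  AND changed_at > now() - interval '7 days'
ORDER BY changed_at;
```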

Alerts to set up

Minimal alert set. Each one has a specific action:

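  1. Free disk space < 20%: free up space or grow the volume.
  2. Connections > 80% of max_connections: investigate pooling.
  3. Replication lag > 10 seconds: check WAL throughput.
  4. Idle-in-transaction connection > 10 minutes: someone left a transaction open (often an abandoned SQL client tab); terminate it.
  5. p95 latency on a critical query > threshold: regression. pg_stat_statements reports mean, min, max and standard deviation but not percentiles, so take p95 from application or APM metrics.
  6. Autovacuum falling behind (n_dead_tup in pg_stat_user_tables keeps growing): tune per-table autovacuum settings.
  7. Backup failed or did not complete: highest priority. Investigate immediately.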
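Alerts 4 and 6 can be backed by plain queries that a monitoring job runs on a schedule. This is a sketch under some assumptions. `app_user` and `orders` are hypothetical names. The 10-minute and 2% values are illustrative, not recommendations. `idle_in_transaction_session_timeout` requires PostgreSQL 9.6+.

```sql
-- Alert 4: sessions idle inside a transaction for more than 10 minutes.
-- Matches both 'idle in transaction' and 'idle in transaction (aborted)'.
SELECT pid, usename, application_name,
       now() - state_change AS idle_for,
       left(query, 80)      AS last_query
FROM pg_stat_activity
WHERE state LIKE 'idle in transaction%'
  AND now() - state_change > interval '10 minutes'
ORDER BY idle_for DESC;

-- After confirming a session is safe to end, terminate it by PID:
-- SELECT pg_terminate_backend(12345);

-- Alternatively, let the server end such sessions itself.
-- Applies to new sessions of this role; existing sessions keep their setting.
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '10min';

-- Alert 6: tables carrying the most dead tuples, and when autovacuum last ran.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Per-table tuning: vacuum a hot table after roughly 2% of its rows change,
-- instead of the default autovacuum_vacuum_scale_factor of 0.2 (20%).
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);
```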

Suparbase is an admin workspace for Supabase. Encrypted credentials, server-side proxy, RLS debugger, SQL playground, AI assistant with diff-confirmed writes. Free tier for solo projects.
