Skip to content
All articles
Articlepostgresmigrationsoperations

Zero-Downtime Postgres Migrations: Patterns That Actually Work

Add columns, drop columns, rename, change types, and re-shape big tables without a maintenance window. Production-tested Postgres migration patterns for 2026, with the locks each one takes.

16 min read

The migration that took out an afternoon of customer support was rarely a complicated one. It's usually an ALTER TABLE ... ALTER COLUMN ... TYPE that locked a busy table for two minutes, or a DROP COLUMN that broke a stale client. The pattern is always the same: a deploy that did the "right" thing in development hit production reality and stalled.

Postgres has the tools to do every shape of schema change online. They require a specific kind of discipline. This is the playbook.

Three rules that prevent 99% of outages

  1. Never block a write longer than a single statement's execution. If your migration holds ACCESS EXCLUSIVE on a hot table while it scans rows, your application stalls.
  2. Deploy schema changes and application changes in separate releases. The schema must always work with both the previous version and the next version of the application.
  3. Backfills are their own deploy. Don't put a backfill in the same step as a schema change. They have different failure modes.

Everything below is a specialisation of these three rules.

Adding a column

The easiest case, and the one teams still get wrong because of the default-value trap.

Safe: NULL default, no validation

add-column-safe.sqlsql
-- Postgres 11+: this is instant. Just metadata.
ALTER TABLE orders ADD COLUMN tax_rate numeric;

Also safe (Postgres 11+): non-volatile default

add-column-default.sqlsql
-- Non-volatile default. Postgres stores the default in metadata; existing
-- rows return it virtually. No table rewrite.
ALTER TABLE orders ADD COLUMN currency text NOT NULL DEFAULT 'USD';

Unsafe: adding NOT NULL without a default

add-column-unsafe.sqlsql
-- This blocks. Every existing row must have a value before the constraint
-- can be enforced, and Postgres takes ACCESS EXCLUSIVE while it validates.
ALTER TABLE orders ADD COLUMN region text NOT NULL;

The safe version is a three-step deploy:

add-column-three-step.sqlsql
-- Step 1 (deploy A). Nullable column. Instant.
ALTER TABLE orders ADD COLUMN region text;

-- Step 2 (background backfill). Update old rows in batches.
UPDATE orders SET region = 'us'
WHERE region IS NULL AND id IN (
  SELECT id FROM orders WHERE region IS NULL LIMIT 10000
);
-- Run repeatedly until no rows updated.

-- Step 3 (deploy B, after app writes region for every new row).
-- Add the NOT NULL constraint as NOT VALID first, then VALIDATE.
ALTER TABLE orders ADD CONSTRAINT orders_region_not_null
  CHECK (region IS NOT NULL) NOT VALID;
-- VALIDATE only takes a SHARE UPDATE EXCLUSIVE lock, doesn't block reads/writes.
ALTER TABLE orders VALIDATE CONSTRAINT orders_region_not_null;

-- Step 4 (deploy C). Convert the check to a proper NOT NULL.
ALTER TABLE orders ALTER COLUMN region SET NOT NULL;
-- The previous CHECK lets Postgres skip a full scan; this is instant.
ALTER TABLE orders DROP CONSTRAINT orders_region_not_null;

Four steps to add NOT NULL safely. Worth it.

Dropping a column

ALTER TABLE ... DROP COLUMN takes an ACCESS EXCLUSIVE lock but the work itself is instant, Postgres just marks the column dead, doesn't reclaim the space. The space comes back via the next VACUUM.

The hard part is the application coordination:

  1. Deploy A: Stop writing to the column. Make the ORM's schema treat it as absent. Don't drop yet.
  2. Wait for the deploy to fully roll out and for any background workers that might still write to the column to roll over.
  3. Deploy B: The migration drops the column.

Skipping step 1 is what causes "column doesn't exist" errors from old clients during a rolling deploy.

Renaming a column

ALTER TABLE ... RENAME COLUMN is instant. The problem is that during a rolling deploy, the old version of your app refers to the old name, and the new version refers to the new name. Both can be live at the same time.

The pattern: add the new column, dual-write, swap reads, drop the old column.

rename-column.sqlsql
-- Deploy A: add the new column.
ALTER TABLE accounts ADD COLUMN display_name text;

-- Deploy B: dual-write. App writes both `name` and `display_name` on every
-- INSERT/UPDATE. Reads still use `name`.
UPDATE accounts SET display_name = name WHERE display_name IS NULL;
-- (Backfill in batches as above.)

-- Deploy C: app reads display_name, still writes both.

-- Deploy D: app stops writing `name`.

-- Deploy E: drop the old column.
ALTER TABLE accounts DROP COLUMN name;

Five deploys to rename a column safely. Most teams do three (deploy the dual-write, deploy the read-switch, deploy the drop). That works when your deploy cycle is fast and you can verify each step.

Changing a column's type

ALTER COLUMN ... TYPE rewrites the table for almost any non-trivial conversion. For a hot table that's an outage waiting to happen.

The general pattern: shadow column

type-change.sqlsql
-- Step 1: add a new column with the target type.
ALTER TABLE invoices ADD COLUMN total_cents bigint;

-- Step 2: backfill in batches.
UPDATE invoices SET total_cents = (total * 100)::bigint
WHERE total_cents IS NULL AND id IN (
  SELECT id FROM invoices WHERE total_cents IS NULL LIMIT 5000
);

-- Step 3: deploy a version of the app that dual-writes (writes both `total`
-- and `total_cents` on every change). Reads still use `total`.

-- Step 4: deploy a version that reads `total_cents` and still dual-writes.

-- Step 5: deploy a version that only writes `total_cents`.

-- Step 6: drop the old column.
ALTER TABLE invoices DROP COLUMN total;

Six steps. Annoying but boring; boring is what you want.

The shortcut Postgres 16+ gave us

For some narrow type changes, ALTER COLUMN ... TYPE ... USING ... can complete without a table rewrite. varchar(50) to varchar(100) is metadata-only. text to varchar(N) requires a scan if N is shorter than existing values. Always test on a copy of production first.

Splitting a table

A common shape: users grew to carry every user-related field, and you want to extract profiles into its own table. This is half a migration and half a refactor.

  1. Create the new table. Same primary key, same columns you want to split out.
  2. Dual-write. Triggers on the old table that mirror inserts and updates to the new table.
  3. Backfill from the old table to the new, in batches.
  4. Swap reads in the application code, one query at a time.
  5. Stop the dual-write by dropping the trigger.
  6. Drop the columns from the old table.

Tooling and CI

The bits of tooling that actually help:

  • A migration runner that knows about transactions. Drizzle, Prisma, Sqitch, Flyway. Pick one. Don't hand-run SQL.
  • A linter for unsafe migrations. Squawk catches the obvious mistakes (adding a column with a volatile default, locking patterns) in CI. Cheap and worth installing on day one.
  • Statement timeouts on the production database. Set statement_timeout to something reasonable (60 seconds) on your migration role. Better to fail loudly than block forever.
  • A way to look at the database during the migration. A SQL playground that's already authed and read-only by default beats fumbling with psql when something stalls. The SQL playground in Suparbase is built for exactly this, read-only by default with a clear write-mode toggle and statement timeout.
  • Lock observability. A query in your toolbox to show current locks (pg_locks joined to pg_stat_activity) so you can see what's blocked when something goes wrong.

None of this is exotic. It's the discipline of always assuming production traffic is live and your migration has to coexist with the previous version of the application. Once you internalise that, the rest is mechanical.

Suparbase is an admin workspace for Supabase. Encrypted credentials, server-side proxy, RLS debugger, SQL playground, AI assistant with diff-confirmed writes. Free tier for solo projects.

Related articles