Articleaipostgressupabasetooling

The AI-Assisted Database Admin in 2026: What Actually Works

Two years into LLM-assisted database operations, here's what we've learned shipping AI features in admin tools: tool-use beats text-to-SQL, schema-aware agents, confirm-then-execute writes, and the failure modes.

Published May 13, 2026· Updated May 14, 2026 14 min read

Almost every database tool now has a chat box. Some of them genuinely help; most are demos that fall apart the first time you point them at a real production schema. After two years of building, shipping, and operating an AI-assisted admin in production, here's what we've learned about which patterns actually work.

Why 2023's text-to-SQL flopped

The first wave of LLM-on-databases was simple: prompt the model with a schema dump, give it a user question, ask for SQL, execute. By 2024 every team that shipped this had the same complaint: it hallucinated column names. Tables that didn't exist. JOINs that referenced nonexistent foreign keys. Queries that looked right and weren't.

The four root causes:

Schemas don't fit in context. Real schemas have hundreds of tables. Stuffing them all into the prompt either truncated (model invents what it can't see) or cost a fortune in tokens per query.
The model couldn't verify before writing. "Generate SQL" gives the model one shot. If it's wrong, it can't notice; only the human running the query notices, after it's already produced bad output.
No feedback loop. When a query errored, the system handed the error back to the human, not to the model. So the same hallucinations recurred.
Writes were terrifying. If reads occasionally fabricate, writes do too. "Update all users" with no confirmation is not a feature.

The tool-use pattern that won

The model that has actually shipped in production by 2026 is a tool-using agent with narrow, schema-aware primitives. Instead of "here's the schema, write SQL", the pattern is:

The model is given a small set of tools (functions).
The model decides which tool to call, and with what arguments.
The server runs the tool, returns a result.
The model decides what to do next: call another tool, or answer the user.
The loop continues until the model produces a final reply.

The tools for an admin agent look like this:

tools.tsts

type Tools = {
  list_tables(category?: string): TableSummary[];
  get_table_schema(table_name: string): ColumnInfo[];
  query_rows(args: {
    table_name: string;
    columns?: string[];
    filters?: Filter[];
    sort?: { column: string; direction: "asc" | "desc" };
    limit?: number;  // hard cap at 50
  }): { rows: Row[]; estimatedTotal: number };
  count_rows(args: { table_name: string; filters?: Filter[] }): { count: number };
};

Notice what's not there. No run_arbitrary_sql. The model can't fabricate columns because the tools validate every argument against the actual schema before they execute. If the model passes tables.naame, the tool returns "column doesn't exist" and the model corrects itself on the next turn.

Schema grounding

The model needs to know what tables exist, but it doesn't need the full schema in the system prompt. Two-pass works much better:

The system prompt tells the model that list_tables is its starting point and gives the table count.
The model calls list_tables on its first turn, receives a compact catalogue (name + AI-inferred description + column count).
For the 2-3 tables the model decides are relevant, it calls get_table_schema to load the full column list.
Only then does it construct a query.

This pattern uses ~10x fewer tokens than dumping the schema, and scales to schemas of any size. We've tested it on production Supabase projects with 200+ tables; the model usually nails the right table on the first list_tables call.

The "AI-inferred description" matters. A table called tbl_usr_acc is opaque; a description "Users and their account metadata" tells the model what it does. Generating these descriptions once per schema and caching them is cheap and durable. (This is exactly what our schema analysis does.)

Confirm-then-execute for writes

Reads can be lossy and the worst that happens is a wrong answer. Writes can be catastrophic. The pattern that has actually shipped in production:

The agent has no direct write tools. There's no update_rows or delete_rows in its toolbox.
Instead, there are propose_* tools that build a proposal payload with: the planned change, the affected-row count, and a preview of the first N affected rows.
The UI renders the proposal as a card with a diff and an Apply button.
When the human clicks Apply, the server re-validates the proposal (re-counts affected rows, re-checks the diff would match) and executes it through the same audit-logged proxy as any other write.

propose-update.tsts

type ProposeUpdate = {
  table_name: string;
  filters: Filter[];     // identifying which rows to change
  patch: Record<string, unknown>;  // column -> new value
  summary: string;       // one-sentence English description for the user
};

type UpdateProposal = ProposeUpdate & {
  kind: "proposed_update";
  preview: Row[];       // up to 5 affected rows
  totalCount: number | null;
};

The crucial property: the agent cannot turn a proposal into an execution by itself. Only an explicit human click does that. This is the difference between a useful assistant and a foot-gun.

Audit everything

Every AI-assisted write needs to land in your audit log indistinguishably from a human-driven write, with one extra bit of metadata: that an agent originated the proposal. After an incident, you want to be able to answer "who or what made this change?" from a single query.

For Suparbase, every Apply click writes:

User id (the human who clicked).
Connection id (which Supabase project was touched).
Table, primary key, verb, HTTP status.
The full before / after snapshot.
The proposal text (so you can see what the agent suggested).

The row history panel surfaces these per-row; the audit log on the dashboard shows them globally. Skipping this step is how teams end up with an unrecoverable Friday afternoon.

Failure modes in 2026

Two years in, the patterns that still fail:

1. Cross-table queries past three joins

Models in mid-2026 can compose two-table joins reliably. Three-table joins with non-obvious foreign keys are still where they invent relationships. The mitigation is to expose the foreign-key graph explicitly in the schema tool, so the model can see what joins exist before constructing them.

2. Aggregates over windows that exceed the model's context

"Summarise the last 10,000 orders" doesn't fit in context. You need to either give the model a SUM/COUNT tool with a narrow surface, or accept that the assistant's output is over a sample.

3. Time-relative queries

"Users who signed up last week" depends on what "last week" means. Embed the current time in the system prompt and define what relative ranges resolve to. Otherwise the model picks an arbitrary anchor.

4. Ambiguous tables

When two tables could match (orders and archived_orders), the model often picks the first one in alphabetical order. The fix: better descriptions in the schema analysis, and a system-prompt instruction to ask the user when ambiguous.

What comes next

Three directions that look like they'll matter through 2026:

Schema-aware refactoring proposals. "This column has the wrong type" or "these two tables should be one", surfaced by the agent reviewing the schema, not by a query. Useful for migrations.
RLS policy generation. Tell the agent the authorization model ("users can only see their own orders, but admins see all") and have it emit a policy with the appropriate USING / WITH CHECK clauses. We've experimented; the agent gets 80% of the way there, and humans need to verify the last 20%.
Anomaly detection over recent writes. The audit log + an agent + simple summarisation tools = "tell me if anything weird happened today".

None of these will replace the engineer in the loop. The bar that has shipped is "the assistant makes the engineer faster at the things they were already going to do". That's the bar we've held our AI chat to, and it's the bar we think other tools in the space should aim at too.

Suparbase is an admin workspace for Supabase. Encrypted credentials, server-side proxy, RLS debugger, SQL playground, AI assistant with diff-confirmed writes. Free tier for solo projects.

Try Suparbase free See all features