Asking natural language questions to an AI to read your SQL database

Your database knows everything about your business. Your users know nothing about SQL. That’s the gap we’re closing today.

We’ll build a natural language interface for any SQL database — ask it a question in plain English, get real data back. Questions like:

We’re using Python, LangChain, Anthropic Claude, and PostgreSQL but the techniques apply to any other technology. Feel free to plug in your own. The full pipeline:

flowchart TD
    A[User Question] --> B[LLM Generates SQL]
    B --> C[Validate SQL]
    C --> D[Execute Query]
    D --> E[LLM Summarizes Results]
    E --> F[Answer]

Not all that complex. Extremely capable. Let’s build it.

Step 1 — Create the project

uv init ask-sql-natural-language-questions
cd ask-sql-natural-language-questions

Step 2 — Install dependencies

uv add langchain langchain-anthropic langchain-community sqlalchemy psycopg2-binary python-dotenv

What each package does:

PackagePurpose
langchainCore LangChain
langchain-anthropicClaude integration into LangChain
langchain-communitySQL database utilities
sqlalchemyDB connection toolkit. Makes it easier to connect Python code to any DB
psycopg2-binaryPostgreSQL driver
python-dotenvSo you can put secrets in a .env file

So Python -> SQLAlchemy -> psycopg2 -> PostgreSQL

Step 3 — Configure environment variables

Create .env in the project root:

ANTHROPIC_API_KEY=your_api_key_here
DATABASE_URL=postgresql+psycopg2://your_user:your_password@localhost:5432/your_db_name

On macOS, your database username is your macOS username by default — not postgres.

Step 4 — Connect to PostgreSQL

Create database.py:

from dotenv import load_dotenv
from sqlalchemy import create_engine
from langchain_community.utilities import SQLDatabase
import os

load_dotenv()

engine = create_engine(os.getenv("DATABASE_URL"))
db = SQLDatabase(engine)

LangChain’s SQLDatabase inspects your schema — tables, columns, relationships, data types — and packages all of it up for the LLM. Claude will know exactly what it’s working with.

Step 5 — Create the Claude client

Create llm.py:

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-sonnet-4-6",
    temperature=0
)

temperature=0 is deliberate. We want boring, predictable SQL. This is one of the rare times you actually don’t want creativity.

Step 6 — Generate SQL from natural language

Create sql_generation.py:

from database import db
from llm import llm

def generate_sql(question: str) -> str:
    schema = db.get_table_info()

    prompt = f"""You are a PostgreSQL expert.

Given this database schema:

{schema}

Convert the user's question into SQL.

Rules:
- Return ONLY SQL
- Use PostgreSQL syntax
- Never modify data
- Always include LIMIT 100 unless aggregation requires otherwise

Question: {question}
"""

    response = llm.invoke(prompt)
    return response.content.strip()

The schema goes into the prompt so Claude knows exactly what’s available. No guessing or hallucinating table names. Obviously the tables and columns must be named logically or we’ll need to provide the LLM with explanation of the goofy names.

Step 7 — Protect your database

This step is optional. I’m including it to raise awareness that …

Without protection, the AI could destroy your database.

I recommend at minimum:

  1. Use an account with read only privilege. This is the Principle of Least Privilege in action.
  2. Implement guardrails to limit the destructive SQL that is run. Like below.
  3. More levels of protection are better than fewer — that’s Defense in Depth.

Create validation.py:

FORBIDDEN = ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE"]

def validate_sql(sql: str):
    upper_sql = sql.upper()
    for word in FORBIDDEN:
        if word in upper_sql:
            raise ValueError(f"Forbidden SQL operation attempted: {word}")

Step 8 — Execute the query

Create query_execution.py:

from database import db

def run_query(sql: str):
    return db.run(sql)

Step 9 — Summarize the results

Raw SQL output isn’t user-friendly. Have the LLM explain it and draw insights from it.

Create summarization.py:

from llm import llm

def summarize_results(question: str, sql: str, results: str) -> str:
    prompt = f"""User question:
{question}

SQL used:
{sql}

Results:
{results}

Explain the results clearly and concisely.
"""
    response = llm.invoke(prompt)
    return response.content

Step 10 — Wire everything together

Create main.py:

from sql_generation import generate_sql
from validation import validate_sql
from query_execution import run_query
from summarization import summarize_results

def ask_database(question: str):
    sql = generate_sql(question)
    validate_sql(sql)
    results = run_query(sql)
    answer = summarize_results(question, sql, results)

    return {
        "sql": sql,
        "results": results,
        "answer": answer
    }

question = input("Ask a natural language question: ")
response = ask_database(question)

print("\nGenerated SQL:")
print(response["sql"])

print("\nAnswer:")
print(response["answer"])

Step 11 — Run it

uv run python main.py

Ask it something:

What are our top 5 selling products?

You get back the generated SQL:

SELECT p.product_name,
       SUM(od.quantity) AS total_sold
FROM order_details od
JOIN products p ON od.product_id = p.product_id
GROUP BY p.product_name
ORDER BY total_sold DESC
LIMIT 5;

And then a plain-English summary of the results. No SQL knowledge required on the user’s end.

Before You Ship This

I gave you a very simple solution. But here are a few things to tighten up before this goes anywhere near production in the real world:

Treat AI-generated SQL the same way you’d treat SQL from an untrusted user. Because that’s basically what it is.


Need help implementing this pattern or training your team to build AI-powered data tools? Reach out — consulting and training available.