How to Get Started with Wren AI

Arkzero Research · Apr 29, 2026 · 7 min read

Last updated Apr 29, 2026

Wren AI is an open-source GenBI platform that connects to your database and lets non-technical users query it in plain English. It generates SQL, charts, and answers directly in the browser. A semantic layer called MDL lets you encode business definitions once so every AI-generated query stays consistent. Setup takes about 10 minutes using Docker Compose and an OpenAI API key. Wren AI supports PostgreSQL, BigQuery, Snowflake, MySQL, ClickHouse, DuckDB, and 12 other data sources.

Wren AI is an open-source GenBI platform that connects to your existing database and lets you ask questions in plain English. It generates SQL, executes it, and returns results as tables or charts in the browser. Setup requires Docker Compose and an OpenAI API key. A working local instance connected to PostgreSQL or BigQuery takes about 10 minutes. No SQL knowledge is needed to run queries once the system is configured.

Why Most Text-to-SQL Tools Break in Production

Text-to-SQL tools have existed since at least 2020, but most fail beyond demos because the AI guesses at what your business terms mean. "Revenue" in your schema might be gross, net, or recognized revenue depending on the team asking. "Active user" might mean last-30-days for the product team and last-7-days for the growth team. When the AI has to infer meaning from raw column names alone, it often picks the wrong definition.

Wren AI solves this with a semantic layer called MDL (Modeling Definition Language). You define your business terms, column descriptions, metric formulas, and table relationships once. Every query the AI generates draws from those definitions rather than guessing from schema alone. According to Wren AI's published evaluation benchmarks, teams using the semantic layer reduced incorrect SQL generation by roughly 60 percent compared to direct LLM-to-database approaches. That delta is why setup time matters: the semantic modeling step is what makes answers trustworthy at scale.

What You Need Before Starting

You need Docker and Docker Compose installed on your machine. You need an OpenAI API key (GPT-4o is recommended; Wren AI also supports Claude, Gemini, and local Ollama models for teams that want no data leaving the building). You need a running database: PostgreSQL, MySQL, BigQuery, Snowflake, ClickHouse, DuckDB, Microsoft SQL Server, Amazon Redshift, or any of the 12-plus supported sources.

You do not need to know SQL. That is the point.

Step 1: Clone and Configure

Pull the Wren AI repository and copy the example environment file:

git clone https://github.com/Canner/WrenAI.git
cd WrenAI
cp .env.example .env

Open .env in any text editor. Set three values at minimum:

OPENAI_API_KEY=your_key_here
LLM_PROVIDER=openai
GENERATION_MODEL=gpt-4o

If you want a fully local setup with no API costs, set LLM_PROVIDER=ollama and point it at your Ollama endpoint. The local route takes more configuration but keeps every query and schema detail on-premise, which matters for teams with sensitive data.
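Before starting the stack, it can help to confirm that the minimum values are actually set. The following is a minimal sketch, not part of Wren AI: it assumes a plain KEY=value .env format and the three variable names from the OpenAI setup above, and the function names parse_env and missing_keys are hypothetical.

```python
REQUIRED_KEYS = ("OPENAI_API_KEY", "LLM_PROVIDER", "GENERATION_MODEL")
PLACEHOLDER = "your_key_here"  # the placeholder value shown in the example above


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent, empty, or still a placeholder."""
    values = parse_env(text)
    return [
        key for key in required
        if not values.get(key) or values[key] == PLACEHOLDER
    ]


# Typical use from the repository root (assumes .env exists there):
#   with open(".env") as handle:
#       print(missing_keys(handle.read()))
```

An empty list means all three values are filled in. For an Ollama setup the required keys would differ, so adjust REQUIRED_KEYS accordingly.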

Step 2: Start the Services

docker compose up -d

This pulls the images and starts five containers: the Next.js browser UI, the AI service, the Wren Engine (the query executor), a vector store for semantic retrieval, and a metadata database. On a standard machine with 8 GB of RAM, the full stack is ready in about two minutes.

Navigate to http://localhost:3000. You will see the Wren AI onboarding screen.
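If the page does not load right away, the containers may still be starting. As a rough sketch, the script below polls the UI address until something answers. It assumes the default http://localhost:3000 address mentioned above; the function names and the retry settings are illustrative choices, not Wren AI tooling.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the target is a local address.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ui_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, whatever the status."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is at least running.
        return True
    except OSError:
        # Connection refused, DNS failure, or timeout: nothing is listening yet.
        return False


def wait_for_ui(url: str, attempts: int = 30, delay: float = 5.0) -> bool:
    """Poll `url` until it responds or the attempts run out."""
    for _ in range(attempts):
        if ui_is_up(url):
            return True
        time.sleep(delay)
    return False


# Example: wait_for_ui("http://localhost:3000")
```

If the check keeps failing after a few minutes, docker compose ps and docker compose logs are the usual next places to look.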

Step 3: Connect Your Data Source

Click "Add Data Source" and select your database type. For PostgreSQL, enter the host, port, database name, username, and password. For BigQuery, paste in a service account JSON. For DuckDB, upload a local .db or .duckdb file directly.
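A connection error in the form is often a plain network problem rather than a credentials problem. The sketch below checks whether a TCP port is reachable before you fill in the details. The helper name port_is_open is hypothetical, and 5432 is only PostgreSQL's default port, so substitute your own host and port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unresolved hostnames, and timeouts.
        return False


# Example: port_is_open("localhost", 5432)
```

This runs from the machine executing the script. Inside a Docker container, localhost refers to the container itself, so a database running on the host machine may need a different hostname in the connection form than the one that works here.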

Wren AI reads your schema automatically once connected. It detects tables, columns, data types, and foreign key relationships. You do not configure this manually.

Step 4: Build the Semantic Model

This step is what separates Wren AI from a generic text-to-SQL wrapper. In the modeling screen, you map raw tables and columns to business concepts.

Select a table, then add a Calculated Field. For example, create a field called net_revenue with a formula that subtracts discounts from gross sales. Give it a plain-English description: "Net revenue after all discounts and returns, used for quarterly reporting." This description travels into the AI prompt every time the system answers a question involving that metric.
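To make the formula concrete, here is a small sketch of what a net_revenue calculation amounts to in SQL, run against an in-memory SQLite database. The order_items table, its columns, and the sample rows are invented for illustration, and this is not how Wren AI stores or evaluates calculated fields internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (region TEXT, gross_sales REAL, discounts REAL);
    INSERT INTO order_items VALUES
        ('North', 100.0, 10.0),
        ('North', 250.0, 25.0),
        ('South',  80.0,  0.0);
""")

# The calculated field: one expression, defined once and reused in every query.
NET_REVENUE_SQL = "gross_sales - discounts"

rows = conn.execute(
    f"SELECT region, SUM({NET_REVENUE_SQL}) AS net_revenue "
    "FROM order_items GROUP BY region ORDER BY region"
).fetchall()

for region, net_revenue in rows:
    print(region, net_revenue)
```

Every question about net revenue then resolves to the same expression, rather than to whatever the model guesses from the column names.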

Add descriptions to individual columns where the name alone is ambiguous. "This column records invoice date, not the date payment was received" is the kind of context that prevents an entire class of wrong answers about timing.

Define table relationships explicitly. Specify that orders.customer_id joins to customers.id on a many-to-one basis. Without this, the AI infers joins from column name patterns alone, which breaks on any schema that does not follow standard foreign key naming conventions.
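Before declaring a relationship as many-to-one, it is worth checking that the data supports it. The sketch below uses an in-memory SQLite database with invented sample rows to test the orders.customer_id to customers.id relationship, then runs the join it implies. The helper name is hypothetical and the check is a plain SQL sanity test, not a Wren AI feature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 2, 20.0);
""")


def relationship_is_many_to_one(conn: sqlite3.Connection) -> bool:
    """Check that customers.id is unique and every order points at a customer."""
    # "One" side: a repeated or NULL customers.id would duplicate or drop rows.
    duplicate_ids = conn.execute(
        "SELECT COUNT(*) - COUNT(DISTINCT id) FROM customers"
    ).fetchone()[0]
    # "Many" side: orders whose customer_id is NULL or matches no customer.
    orphan_orders = conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customers c ON o.customer_id = c.id "
        "WHERE c.id IS NULL"
    ).fetchone()[0]
    return duplicate_ids == 0 and orphan_orders == 0


# The join that the declared relationship implies.
totals = conn.execute(
    "SELECT c.name, SUM(o.total) FROM orders o "
    "JOIN customers c ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()

print(relationship_is_many_to_one(conn), totals)
```

If the check fails, totals computed through the join will be inflated or incomplete regardless of how well the question is phrased.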

Teams that spend 15 to 20 minutes on modeling at setup report a substantial reduction in incorrect joins and aggregation errors during regular use. The modeling step is an investment, not overhead.

Step 5: Run Your First Query

With the model saved, click "Ask" and type a question in plain English:

  • "What were total net sales by region last month?"
  • "Show me customers who placed more than three orders in Q1."
  • "Which product category had the highest return rate in the past 90 days?"

Wren AI sends the question through its retrieval pipeline, generates SQL grounded in your semantic model, executes it on the Wren Engine, and returns a result table or chart. Click "Show SQL" at any point to see exactly what ran. You can edit the SQL inline, re-run it, and save corrected queries as example pairs that improve future responses.
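The idea of grounding a question in the semantic model can be illustrated with a toy example. The sketch below is not Wren AI's pipeline: it swaps the vector search for simple keyword overlap, and the prompt layout, function names, and the customers.region entry are invented for illustration. The two other descriptions are the ones used earlier in this guide.

```python
import re

# Hypothetical semantic-model entries: field name -> plain-English description.
MODEL_DESCRIPTIONS = {
    "orders.net_revenue": "Net revenue after all discounts and returns, "
                          "used for quarterly reporting.",
    "orders.invoice_date": "Invoice date, not the date payment was received.",
    "customers.region": "Sales region the customer belongs to.",
}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, descriptions: dict[str, str], top_k: int = 2):
    """Rank entries by word overlap with the question and keep the top_k."""
    question_words = tokens(question)
    ranked = sorted(
        descriptions.items(),
        key=lambda item: len(question_words & tokens(item[0] + " " + item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, descriptions: dict[str, str]) -> str:
    """Put the retrieved descriptions in front of the question."""
    context = "\n".join(
        f"- {name}: {text}" for name, text in retrieve(question, descriptions)
    )
    return f"Context:\n{context}\n\nQuestion: {question}\nSQL:"


print(build_prompt("Total net revenue by region", MODEL_DESCRIPTIONS))
```

Only the entries that match the question reach the prompt, which is why descriptions that use the same vocabulary as your users tend to pay off.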

Charts render directly in the browser. You can pin results to a shared dashboard with one click, share the dashboard with a link, or export as CSV.

Where Wren AI Works Well and Where It Struggles

Wren AI handles aggregations, filters, group-bys, and multi-table joins well when the semantic model is populated. Date range questions such as "last quarter" and "year over year" work reliably if date columns have descriptions attached. Bar charts, line charts, and pivot tables render natively.

Where quality degrades: very complex multi-step CTEs, recursive queries, and schemas with more than 50 to 60 tables tend to produce lower-quality answers because the retrieval context gets diluted. For large schemas, scope your semantic model to the tables relevant to your actual use case rather than importing the full schema. A focused model with 10 well-described tables consistently outperforms a sprawling model with 80 tables and no descriptions.

Who This Setup Is For

Wren AI makes sense for teams that already run a live transactional or analytical database and want non-technical colleagues to get answers from it without submitting data requests to an analyst. Typical users include a sales ops manager who needs weekly pipeline reports, an account manager who needs customer order history, and a finance analyst who queries monthly revenue without involving the data team. If your current bottleneck is that data lives in a database only engineers can query, Wren AI removes that barrier.

If your data primarily lives in uploaded CSV files or spreadsheet exports rather than a live database, VSLZ lets you run the same natural-language query flow from a direct file upload with no Docker setup required.

Next Steps After Your First Query

Once you have a working query, three things are worth doing before sharing the instance with your team. First, add at least five saved example queries to the Wren AI question catalog. These ground the semantic retrieval step and measurably improve answer relevance for follow-up questions. Second, review the user access controls in the settings panel before inviting colleagues. You can restrict which users can see which tables, which matters for schemas that mix sensitive and general-purpose data. Third, set up the Wren AI API if you want query results to flow automatically into downstream systems such as Slack reports or scheduled spreadsheet refreshes. The API supports webhook callbacks and scheduled queries.

Wren AI has over 12,000 GitHub stars and an active Discord community of about 1,500 members. Most issues on the public tracker receive responses within 48 hours.

FAQ

Does Wren AI require a database, or can it work with CSV files?

Wren AI is designed to connect to live databases: PostgreSQL, MySQL, BigQuery, Snowflake, ClickHouse, DuckDB, Amazon Redshift, Microsoft SQL Server, and several others. It does not natively ingest raw CSV uploads as a standalone data source. For CSV-first workflows without a running database, DuckDB (which Wren AI supports as a data source) can be loaded with CSV data via a quick CLI command, giving you a DuckDB file you can connect to Wren AI.

What LLMs does Wren AI support?

Wren AI supports OpenAI models (GPT-4o is recommended for best accuracy), Anthropic Claude, Google Gemini, and any OpenAI API-compatible endpoint including local Ollama models. The LLM provider is set in the .env file at setup time. Teams with strict data residency requirements typically use Ollama with a locally hosted model so queries and schema details never leave the network.

How long does the semantic modeling step take?

For a focused schema with 10 to 20 tables, basic modeling (adding descriptions to key columns, defining relationships, and adding two or three calculated fields) typically takes 20 to 30 minutes. The more descriptions you add, the more accurate the AI-generated queries become. Wren AI's documentation recommends spending at least this time upfront before inviting team members to use the system, as answer quality is directly correlated with how well the semantic model describes the data.

Is Wren AI free to use?

Wren AI is open source under the Apache 2.0 license and free to self-host. You incur costs for the LLM API calls (typically OpenAI tokens per query), your own infrastructure for running Docker containers, and any database costs you already have. Wren AI does not charge per query or per seat in the self-hosted version. The company also offers a cloud version with managed hosting; pricing for that tier is available on request from the Wren AI team.

How accurate is the SQL Wren AI generates?

Accuracy depends heavily on how well the semantic model is configured. With a well-described schema (column descriptions, defined relationships, calculated fields for key metrics), Wren AI performs reliably on single-table aggregations, multi-table joins, and standard filter operations. Wren AI's own published benchmarks report roughly 60 percent fewer incorrect queries compared to direct LLM-to-database generation without a semantic layer. Complex CTEs, nested subqueries, and recursive patterns still benefit from human review of the generated SQL before accepting results.
