Guides

How to Get Started with SQLMesh

Arkzero ResearchApr 23, 20268 min read

Last updated Apr 23, 2026

SQLMesh is an open-source data transformation framework built as a faster, more reliable alternative to dbt. It introduces a plan-based workflow that shows exactly what SQL models will run and what data will change before any transformation executes. To get started, install it with pip, run sqlmesh init duckdb in a new directory, then run sqlmesh plan to preview your first environment. The whole setup takes under ten minutes on any machine with Python.
How to Get Started with SQLMesh

What SQLMesh Is and Why It Exists

SQLMesh is a data transformation framework: you write SQL models, and it handles running them against your data warehouse in the right order, tracking dependencies, managing environments, and telling you what will change before anything executes.

If you have worked with dbt before, the mental model is similar. You define models as SELECT statements, SQLMesh figures out the dependency graph, and your data flows through transformations cleanly. Where SQLMesh diverges is in how it handles change. In dbt, running a plan re-executes affected models, sometimes more than you intended, and always at cost. SQLMesh introduces virtual environments and a plan command that shows you exactly what will change and what will not before a single query runs.

A Databricks-led benchmark found SQLMesh to be roughly 9 times faster than dbt Core on representative transformations, with environment rollbacks running 136 times faster. Those numbers come from how SQLMesh handles unchanged models: it references them rather than re-computing them, so a one-model change does not trigger a full pipeline refresh.

For small teams and solo analysts, this matters practically. If your dbt pipeline runs 80 models and you change one, dbt may recompute dozens of downstream models regardless of whether their inputs changed. SQLMesh tracks state and skips anything that has not changed. On Snowflake or BigQuery, where compute is billed per query, that difference adds up fast.

Prerequisites

You need Python 3.8 or higher and pip. SQLMesh bundles DuckDB as a default local engine, so you do not need a database connection to try it. If you later want to connect to Snowflake, BigQuery, Redshift, or Databricks, you can add the relevant connector, but for this guide, DuckDB is enough.

Check your Python version:

python3 --version

If your system has an older Python, consider using pyenv to manage multiple versions without affecting system packages.

Step 1: Create a Virtual Environment

Isolating SQLMesh in a virtual environment keeps your system Python clean:

mkdir sqlmesh-project
cd sqlmesh-project
python3 -m venv .venv
source .venv/bin/activate

On Windows, the activation command is .venv\Scripts\activate instead.

Step 2: Install SQLMesh

Install the core package:

pip install sqlmesh

If you want the browser UI, which provides a visual DAG explorer and model editor, add the optional extra:

pip install 'sqlmesh[web]'

The web extra pulls in a lightweight FastAPI server and a React frontend. It adds about 40 additional packages but is not required to use SQLMesh from the command line.

Step 3: Initialize a Project

SQLMesh includes a scaffold generator. Running init with the duckdb engine creates a working example project:

sqlmesh init duckdb

This creates the following structure:

sqlmesh-project/
  config.yaml          # connection settings
  models/              # your SQL transformation models
  seeds/               # static CSV data loaded as tables
  audits/              # data quality checks
  tests/               # unit tests for models
  macros/              # reusable Jinja macros

The generated example includes two seed tables and two models that join and aggregate them. It is a working pipeline out of the box, with no edits needed to run your first plan.

Step 4: Run Your First Plan

This is where SQLMesh differs from dbt most visibly:

sqlmesh plan

SQLMesh evaluates the current state of your project and the target environment, then shows you a summary of what it wants to do. It lists which models will be backfilled, which are unchanged, and what the estimated impact is before asking for confirmation. Nothing runs until you type y.

On first run against a fresh DuckDB instance, the output looks like:

New environment `prod` will be created from `empty`

Models:
Added:
  db.full_model [FULL]
  db.incremental_model [INCREMENTAL_BY_TIME_RANGE]

Backfill:
  db.full_model: 2020-01-01 - 2024-01-01
  db.incremental_model: 2020-01-01 - 2024-01-01

Apply - Backfill Tables [y/n]:

Type y and SQLMesh executes. On a laptop, this takes seconds.

Step 5: Create a Development Environment

One of SQLMesh's most practical features is its virtual environment model. Creating a dev environment does not copy your data. It creates a logical layer that references the existing prod tables:

sqlmesh plan dev

You now have a dev environment. Any model changes you make in dev affect only that environment until you explicitly promote them to prod with another plan. This makes it safe to experiment without disrupting production data or paying to re-compute tables that have not changed.

This is the feature that separates SQLMesh most clearly from dbt in day-to-day work. In dbt, creating a dev environment typically means either using a separate schema with full data duplication or running against a sample dataset. SQLMesh's virtual environments cost nothing to create and nothing to maintain if your underlying data has not changed.

Step 6: Open the Browser UI (Optional)

If you installed the web extra, start the UI:

sqlmesh ui

Open http://localhost:8000 in your browser. The UI shows your model DAG, lets you run plans visually, browse model definitions, and run ad-hoc queries against your environments. It is particularly useful when onboarding new team members who are not yet comfortable with the CLI.

Understanding Model Kinds

Model kinds control how SQLMesh materializes each model. The kind is declared in the MODEL block at the top of each SQL file:

MODEL (
  name db.my_model,
  kind FULL
);

SELECT
  id,
  name,
  created_at
FROM
  raw.users

The four most commonly used kinds are:

  • FULL drops and recreates the table on each run. Use this for small reference tables or lookups.
  • INCREMENTAL_BY_TIME_RANGE appends only new data in a specified date window. SQLMesh injects @start_date and @end_date macros automatically, so you do not need to write Jinja conditionals for time-range logic.
  • VIEW creates a database view rather than a materialized table. Useful for lightweight transformations where you want to avoid storing duplicate data.
  • SCD_TYPE_2 handles slowly changing dimensions natively, tracking when a row was valid and when it was superseded.

Picking the right kind matters both for correctness and for compute cost. An INCREMENTAL_BY_TIME_RANGE model on a 500-million-row events table costs a fraction of what a FULL refresh would cost on each run.

Connecting to a Real Warehouse

To switch from DuckDB to Snowflake, update config.yaml:

gateways:
  default:
    connection:
      type: snowflake
      account: your-account
      user: your-user
      password: your-password
      database: analytics
      warehouse: compute_wh

BigQuery, Redshift, Databricks, and PostgreSQL follow the same pattern with their own connection parameters. The connection type strings and required fields for each warehouse are documented at sqlmesh.readthedocs.io under the Connections guide.

Migrating from dbt

SQLMesh includes a dbt compatibility mode. If you run sqlmesh init --dbt inside an existing dbt project directory, it converts your dbt models to SQLMesh format, preserving your SQL logic and model structure. You can also run SQLMesh in compatibility mode without a full conversion, which lets you use SQLMesh features alongside your existing dbt setup while you evaluate the migration.

The migration path is practical for teams that want to start using virtual environments and the plan workflow without rewriting all of their SQL. Most dbt models require only the addition of a MODEL block at the top of the file and minor adjustments to macros that SQLMesh handles differently.

What to Build Next

Once you have the scaffold running, a practical next step is migrating one existing SQL query into a SQLMesh model. Pick something you run manually on a schedule, such as a weekly revenue rollup or a daily user activity summary, and add a MODEL block to the top. Run sqlmesh plan to preview it, then schedule sqlmesh run via cron or any orchestration tool.

SQLMesh integrates natively with Airflow, Dagster, Prefect, and GitHub Actions, so dropping it into an existing orchestration setup is straightforward. If you are running VSLZ for your data analysis and using SQLMesh to manage the transformations that feed your datasets, the two complement each other well: SQLMesh handles the pipeline layer and VSLZ handles the exploration and reporting layer on top of it.

The open-source community around SQLMesh has grown substantially since Tobiko released version 0.100 in late 2024. The GitHub repository has over 7,000 stars, and the Slack community is active for troubleshooting questions.

FAQ

What is SQLMesh and how is it different from dbt?

SQLMesh is an open-source data transformation framework that compiles SQL models into a dependency graph and runs them against a data warehouse. It differs from dbt in two key ways: its plan command shows exactly what will change before anything runs, and its virtual environment system lets you create dev environments without copying or recomputing unchanged tables. A Databricks benchmark found SQLMesh approximately 9 times faster on representative workloads compared to dbt Core.

Is SQLMesh free to use?

Yes. SQLMesh core is fully open-source and free under the Apache 2.0 license. Tobiko, the company behind SQLMesh, offers a cloud product called Tobiko Cloud that adds managed state storage and collaboration features, but the open-source version is feature-complete and suitable for production use. You can self-host it on any infrastructure.

Can I use SQLMesh with my existing dbt project?

Yes. SQLMesh includes a dbt compatibility mode. If you run sqlmesh init --dbt inside an existing dbt project directory, it converts your dbt models to SQLMesh format, preserving your SQL logic and model structure. You can also run SQLMesh in dbt compatibility mode without full conversion, which lets you use SQLMesh features like virtual environments alongside your existing dbt setup.

What databases does SQLMesh support?

SQLMesh supports Snowflake, BigQuery, Databricks, Redshift, PostgreSQL, MySQL, DuckDB, Spark, Trino, and several others. DuckDB is the default local engine included with the base installation, so you can run SQLMesh without any external database connection for development and testing. Production deployments typically connect to a cloud data warehouse.

How does the SQLMesh plan command work?

When you run sqlmesh plan, SQLMesh compares the current state of your models to the target environment and generates a detailed summary of what will change: which models are new, which have been modified, which are unchanged, and which downstream models need recomputation because of upstream changes. It then asks for confirmation before executing anything. This differs from dbt, where running a command immediately triggers execution without a preview step.

Related

OpenMetadata data catalog interface showing database schema discovery
Guides

How to Set Up OpenMetadata for Data Discovery

OpenMetadata is an open-source data catalog that gives teams a single place to discover, document, and govern their data assets. Setting it up takes under 30 minutes using Docker: spin up the containers, log into the UI at localhost:8585, then connect your first data source using one of 90+ pre-built connectors. Once ingestion runs, every table, column, and owner is searchable and lineage-linked across your entire stack.

Arkzero Research · Apr 29, 2026
Streamlit logo on a clean white background
Guides

How to Build a Data Dashboard with Streamlit

Streamlit is an open-source Python library that turns a script into a shareable web dashboard without any front-end code. Install it with pip, write a Python file that loads your CSV with pandas, add sidebar widgets for filtering, and render interactive charts with Plotly. Push the file to GitHub, connect it to Streamlit Community Cloud, and anyone with the URL can view live results. No server configuration required.

Arkzero Research · Apr 29, 2026
Airbyte Cloud data integration platform
Guides

How to Set Up Airbyte Cloud for Data Syncing

Airbyte Cloud is a managed data integration platform that syncs data from SaaS tools, databases, and APIs into a central warehouse without requiring Docker, infrastructure, or engineering resources. A free 30-day trial lets you connect sources like Salesforce, HubSpot, Stripe, or Google Sheets to destinations like BigQuery, Snowflake, or Postgres in minutes. This guide walks through the full setup from account creation to your first automated sync.

Arkzero Research · Apr 29, 2026