How to Get Started with Marimo for Data Analysis

Arkzero Research · Apr 29, 2026 · 6 min read

Last updated Apr 29, 2026

Marimo is a reactive Python notebook that automatically reruns dependent cells when inputs or data change, eliminating the manual execution-order problem that makes Jupyter notebooks unreliable. With a built-in SQL engine powered by DuckDB, interactive UI components, and a one-command install, marimo lets analysts build reproducible, interactive data workflows in a fraction of the time traditional notebooks require. This guide covers installation, SQL queries, interactive filters, chart rendering, and app deployment.

Marimo is a reactive Python notebook where every cell declares its inputs and outputs, and changing any value automatically reruns all dependent cells. The result is a notebook that always reflects the current state of your data, with no manual "Run All" required and no hidden state from out-of-order execution.

As of April 2026, marimo has over 10,000 GitHub stars and ships with built-in DuckDB SQL, interactive UI widgets, app deployment mode, and marimo pair, an AI agent that runs inside a live notebook session.

Why Marimo Solves a Real Jupyter Problem

The most common Jupyter failure mode is a notebook that produces a different result depending on which cells you ran and in which order. Analysts share notebooks with colleagues who rerun them top-to-bottom and get different outputs than the original author. This is not a user error. It is a structural limitation: Jupyter treats cells as independent scripts sharing a global namespace, so execution order determines results.
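The failure is easy to reproduce outside any notebook. The sketch below is a toy model, not Jupyter's implementation: each hypothetical cell is a string executed against one shared namespace, and the names cells and run are invented for illustration. Rerunning the scale cell changes the total even though no code changed.

```python
# Three hypothetical notebook cells sharing one global namespace.
cells = {
    "load": "prices = [10, 20, 30]",
    "scale": "prices = [p * 2 for p in prices]",
    "total": "total = sum(prices)",
}


def run(order):
    """Execute the named cells in the given order and return `total`."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


# Same code, different execution histories, different answers.
top_to_bottom = run(["load", "scale", "total"])
scale_twice = run(["load", "scale", "scale", "total"])
print(top_to_bottom, scale_twice)
```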

Marimo treats a notebook as a directed acyclic graph. It statically analyzes each cell to determine which variables the cell defines and which it reads. If cell B reads a variable defined in cell A, marimo knows B depends on A. Changing A automatically marks B as stale and reruns it, so the notebook stays consistent.
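The dependency tracking can be sketched with the standard library. This is a simplified illustration under stated assumptions, not marimo's actual implementation: it only looks at plain variable names, and the helper names cell_defs_and_refs and rerun_order are hypothetical.

```python
import ast
from graphlib import TopologicalSorter


def cell_defs_and_refs(source: str) -> tuple[set[str], set[str]]:
    """Return (names assigned, names read but not assigned) for one cell."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
    return defs, refs - defs


def rerun_order(cells: dict[str, str], changed: str) -> list[str]:
    """List the cells that must rerun after `changed` is edited, dependencies first."""
    info = {name: cell_defs_and_refs(src) for name, src in cells.items()}
    # deps[cell] = the cells whose definitions this cell reads
    deps = {
        name: {other for other, (defs, _) in info.items()
               if other != name and defs & refs}
        for name, (_, refs) in info.items()
    }
    stale, order = {changed}, []
    # static_order() yields every cell after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        if name != changed and deps[name] & stale:
            stale.add(name)
            order.append(name)
    return order


# Hypothetical four-cell notebook: b reads x from a, c reads y from b.
notebook = {"a": "x = 1", "b": "y = x + 1", "c": "z = y * 2", "d": "w = 10"}
print(rerun_order(notebook, "a"))
```

Cell d never appears in the result because it reads nothing that a defines, directly or indirectly.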

Marimo notebooks are stored as plain Python files rather than JSON blobs. This means readable git diffs, no merge conflict noise from output metadata, and the ability to run the same file as a command-line script or deploy it as a web app without conversion.

Installation

Requirements: Python 3.9 or later.

pip install "marimo[sql]"

The [sql] extra installs DuckDB alongside marimo. If you use the uv package manager:

uv add "marimo[sql]"

Verify the install:

marimo --version

Launch the editor with:

marimo edit my_analysis.py

This opens a browser-based notebook editor. The file my_analysis.py is created if it does not exist and stores the full notebook as pure Python.

Loading Your First Dataset

In a new cell, load a CSV:

import marimo as mo
import pandas as pd

df = pd.read_csv("sales.csv")
df

Marimo renders the dataframe as an interactive, sortable, filterable table automatically. No extra display call needed.

For large files, DuckDB avoids the memory overhead of loading everything into a pandas dataframe. In a benchmark published by the DuckDB team, reading a 10 GB CSV with DuckDB took 4.2 seconds versus 47 seconds for pandas on equivalent hardware.

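import duckdb

# In-memory DuckDB connection; the Parquet file is scanned directly from disk.
conn = duckdb.connect()
# Only the 5,000-row query result is materialized as a pandas dataframe.
result = conn.execute("SELECT * FROM 'large_file.parquet' LIMIT 5000").df()
result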

Writing SQL Cells

Marimo's SQL cell type lets you query dataframes, Parquet files, and databases without Python boilerplate.

Click the cell-type dropdown and select SQL, then write:

SELECT
  region,
  SUM(revenue) AS total_revenue,
  COUNT(*)     AS order_count
FROM df
GROUP BY region
ORDER BY total_revenue DESC

The result is automatically assigned to a Python variable. The default name is _df, and because marimo treats underscore-prefixed variables as private to their cell, rename the output (for example, to region_summary) before referencing it from downstream cells. SQL cells participate in the reactive graph the same way Python cells do: if df changes because an upstream filter was updated, the SQL cell reruns and the output updates automatically.
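To see the shape of the result on a small sample, the same query can be run outside marimo. The sketch below swaps in Python's built-in sqlite3 module for DuckDB purely so it runs without extra installs; the table contents are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows standing in for the sales dataframe.
rows = [("North", 120.0), ("South", 80.0), ("North", 30.0), ("East", 200.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO df VALUES (?, ?)", rows)

# The aggregation from the SQL cell, unchanged.
summary = conn.execute(
    """
    SELECT
      region,
      SUM(revenue) AS total_revenue,
      COUNT(*)     AS order_count
    FROM df
    GROUP BY region
    ORDER BY total_revenue DESC
    """
).fetchall()
conn.close()

for region, total_revenue, order_count in summary:
    print(region, total_revenue, order_count)
```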

By default, marimo uses an in-memory DuckDB database as the SQL engine. To query PostgreSQL, SQLite, or MySQL instead, create a SQLAlchemy engine in a Python cell:

import sqlalchemy
engine = sqlalchemy.create_engine("postgresql://user:password@localhost/mydb")

Then select that engine in the SQL cell header. The query runs against the external database and the result still flows through the reactive graph as a Python dataframe.

Adding Interactive Filters

Marimo's UI components create sliders, dropdowns, and date pickers that connect directly to analysis cells.

date_range = mo.ui.date_range(
    start="2024-01-01",
    stop="2024-12-31",
    label="Filter by date"
)
date_range

Reference the widget value in the next cell:

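# date_range.value is a (start, end) pair of dates; convert both sides
# to pandas timestamps so the comparison is well defined.
start, end = (pd.Timestamp(d) for d in date_range.value)
dates = pd.to_datetime(df["date"])
filtered = df[(dates >= start) & (dates <= end)]
filtered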

When the user adjusts the date range, marimo reruns the filter cell and every cell downstream of it automatically. No callbacks, no state management, no event handlers.

For categorical filtering:

region_filter = mo.ui.dropdown(
    options=df["region"].unique().tolist(),
    label="Select region"
)
region_filter

Chaining multiple filters works the same way. Each widget exposes a .value property that downstream cells read as a normal Python variable.
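One way to chain the two filters is to build a single boolean mask. In the sketch below, apply_filters and the sample data are hypothetical; inside a notebook you would pass date_range.value and region_filter.value instead of the literal values, and the code assumes the dropdown's value is None when nothing is selected.

```python
import datetime as dt

import pandas as pd


def apply_filters(df, date_bounds, region):
    """Keep rows inside the date bounds and, if a region is given, in that region."""
    start, end = (pd.Timestamp(d) for d in date_bounds)
    dates = pd.to_datetime(df["date"])
    mask = (dates >= start) & (dates <= end)
    if region is not None:  # assumption: None means "no region selected"
        mask &= (df["region"] == region)
    return df[mask]


# Hypothetical sample standing in for sales.csv.
sample = pd.DataFrame({
    "date": ["2024-01-15", "2024-06-01", "2024-11-30", "2025-02-01"],
    "region": ["North", "South", "North", "North"],
    "revenue": [100, 250, 75, 300],
})

bounds = (dt.date(2024, 1, 1), dt.date(2024, 12, 31))
filtered_sample = apply_filters(sample, bounds, "North")
print(filtered_sample)
```

Keeping the logic in one function means the filter cell reads both widget values in one place, so a change to either widget triggers the same rerun.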

Building Charts

Marimo renders Altair, Plotly, and Matplotlib natively.

import altair as alt

chart = alt.Chart(filtered).mark_bar().encode(
    x=alt.X("month:O", title="Month"),
    y=alt.Y("revenue:Q", title="Revenue ($)"),
    color="category:N"
).properties(width=600, height=300)

chart

Because filtered depends on date_range, the chart updates whenever the date range changes. It responds to the region dropdown as well once the filter cell also reads region_filter.value. The notebook behaves like a live dashboard rather than a static document, without requiring a separate dashboard framework.

Deploying as an App

Any marimo notebook can run as a standalone web app accessible to team members who do not write Python.

marimo run my_analysis.py

App mode shows only outputs, hiding all code cells. The interactive UI components remain fully functional. Team members can adjust filters and explore data through the interface without touching the underlying code.

For public sharing, marimo apps run on Hugging Face Spaces at no cost. For internal deployment, they run on any server with Python installed behind a standard reverse proxy.

Using marimo pair

Marimo pair, launched in early 2026, adds an AI agent directly inside a live notebook session. The agent has access to your current dataframe variables, query results, and installed packages, so you can instruct it in plain English.

To enable it, click the AI icon in the editor toolbar. Once active, you can prompt it with statements like "aggregate this by week and chart the trend" and the agent writes the code, executes the cell, and shows the result inline.

The key difference from a general coding assistant is context: marimo pair reads your actual loaded dataframes and recent outputs rather than working from a blank slate.

Sharing and Exporting

Since notebooks are plain .py files, sharing via git is straightforward. Diffs are readable, and commits capture the exact state of the analysis without binary noise.

To export a static snapshot:

marimo export html my_analysis.py -o report.html

The HTML file includes all rendered outputs and interactive Altair charts, viewable in any browser without Python.

Practical Summary

Marimo's reactive model solves the most common Jupyter reliability problem. The built-in SQL engine, interactive UI widgets, and one-command app deployment make it practical for analysts who want reproducibility without restructuring their existing Python workflow. Installation is a single command, and the learning curve for Jupyter users is shallow because cells contain ordinary Python. Start with a single CSV, write a SQL aggregation, add a date filter, and you have a self-updating dashboard in a single file.

FAQ

Is marimo compatible with Jupyter notebooks?

Marimo is not directly compatible with .ipynb files because it uses a different architecture. However, marimo includes a conversion command (marimo convert notebook.ipynb -o notebook.py) that converts Jupyter notebooks to marimo format. Some manual cleanup is usually needed for cells that rely on Jupyter-specific magic commands or out-of-order execution patterns.

What databases can marimo connect to for SQL queries?

Marimo's built-in SQL engine uses DuckDB by default, which supports in-memory queries, local CSV, Parquet, and JSON files, and remote object storage like S3. For external databases, marimo SQL cells accept any SQLAlchemy-compatible connection, covering PostgreSQL, MySQL, SQLite, BigQuery, Snowflake, and others. The connection is set in the SQL cell header.

Can non-technical users interact with marimo notebooks?

Yes, through app deployment mode. Running marimo run notebook.py launches the notebook as a web app that hides all code cells and displays only outputs and UI widgets. Users can interact with dropdowns, sliders, and date pickers to filter and explore data without seeing or touching Python code.

How does marimo handle large datasets?

Marimo's built-in DuckDB SQL engine reads large files directly from disk without loading the entire dataset into memory. DuckDB is practical for files up to several hundred gigabytes on a standard laptop. For files above that range, marimo can connect to external databases like Snowflake, BigQuery, or a PostgreSQL instance via SQLAlchemy.

Does marimo work on Windows?

Yes. Marimo supports Windows, macOS, and Linux. Install with pip install "marimo[sql]" in any Python 3.9 or later environment; the quotes keep shells such as zsh from interpreting the square brackets. The notebook editor opens in any modern browser. The only platform-specific consideration is that some shell-based install flows work more smoothly on Unix systems, but the Python installation path is identical across operating systems.
