How to Set Up Marimo for Data Analysis
Last updated Apr 5, 2026

Marimo is an open-source Python notebook that replaces Jupyter's cell-by-cell execution model with a reactive one. When you change a variable, every cell that depends on it reruns automatically. This eliminates the most common class of notebook bugs, those caused by cells run out of order, and keeps code and outputs consistent. Installation takes one command, and you can have an interactive data analysis environment running in under five minutes.
Why Analysts Switch from Jupyter to Marimo
Jupyter Notebook has been the standard for Python data analysis since 2014. Its core limitation is hidden state: cells can be run in any order, outputs can reflect code that no longer exists, and a notebook that runs cleanly top-to-bottom may fail for a colleague who runs cells in a different sequence.
Marimo solves this with a dependency graph. Each cell declares variables, and Marimo tracks which cells read those variables. Run a cell and all downstream cells rerun automatically. Delete a cell and its variables disappear from program state. The notebook is always consistent.
A March 2026 article in the Data Science Collective described Marimo as "reactive Python for the AI builder" and noted that teams at BlackRock, Shopify, and Pfizer have adopted it for reproducible research workflows. Marimo notebooks are stored as plain .py files rather than JSON, which means they work with git diff, code review, and standard testing pipelines without modification.
According to the Marimo GitHub repository, the project has been adopted by researchers at Stanford, Johns Hopkins, UC Berkeley, and Princeton. Unlike Jupyter's JSON-based .ipynb format, a Marimo notebook is a valid Python script that can be executed with python your_notebook.py directly from the command line.
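That file is ordinary Python with a small wrapper. A minimal notebook, with illustrative cell contents, looks roughly like this: each cell is a function, and marimo infers the dependency graph from the variables each function defines and receives.

```python
import marimo

app = marimo.App()

@app.cell
def _():
    # A cell that defines a variable. Marimo tracks `x` as this
    # cell's output because the function returns it.
    x = 1
    return (x,)

@app.cell
def _(x):
    # This cell receives `x` as a parameter, so it reruns
    # whenever the cell defining `x` changes.
    y = x + 1
    return (y,)

if __name__ == "__main__":
    app.run()
```

Because cells are just functions, the same file works with git diff, code review tools, and python your_notebook.py from the shell.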
Installation
Marimo requires Python 3.10 or later. Install with the recommended extras to get SQL support, interactive widgets, and the AI editor features:
pip install "marimo[recommended]"
To add it to a project managed with uv, the Rust-based package manager now common in Python workflows:
uv add marimo
Confirm the installation succeeded:
marimo --version
Run the built-in interactive tutorial to see the reactive model in action before building your own notebooks:
marimo tutorial intro
Starting Your First Data Analysis Notebook
Create a new notebook with the edit command:
marimo edit analysis.py
This opens a browser-based editor at localhost:2718. Create a cell and assign a variable:
import pandas as pd
df = pd.read_csv("sales.csv")
df.shape
Add a second cell that references df:
df.describe()
Now edit the first cell to filter the dataframe to a single region:
df = pd.read_csv("sales.csv")
df = df[df["region"] == "North"]
df.shape
Both cells rerun immediately. The describe output updates to reflect only North region records. This is the reactive model working as designed: Marimo detected that df changed and automatically ran every cell that referenced it.
Adding Interactive Controls
Marimo includes a UI library for sliders, dropdowns, date pickers, and table filters. These controls are first-class reactive elements, not callbacks attached to event listeners.
import marimo as mo
region_picker = mo.ui.dropdown(
options=["North", "South", "East", "West"],
value="North",
label="Select Region"
)
region_picker
Reference the control in the next cell:
filtered = df[df["region"] == region_picker.value]
filtered.head(10)
When the user selects a different region in the dropdown, the filtered cell reruns immediately. No boilerplate, no callbacks. The notebook behaves like a reactive web application built in pure Python.
This pattern scales well. Combine a date range slider, a metric selector dropdown, and a chart library like Plotly or Altair, and you have a fully interactive analysis panel that any colleague can use from a browser without touching the code.
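As a sketch of that pattern, assuming Altair is installed and a df with year, region, and revenue columns (all names here are illustrative; this fragment only renders inside a marimo notebook):

```python
import marimo as mo
import altair as alt

# One cell defines the controls...
metric = mo.ui.dropdown(options=["revenue", "orders"],
                        value="revenue", label="Metric")
years = mo.ui.range_slider(start=2020, stop=2026,
                           value=[2023, 2026], label="Years")

# ...and a downstream cell reads their .value attributes, so it
# reruns whenever the user moves either control:
subset = df[df["year"].between(*years.value)]
mo.ui.altair_chart(
    alt.Chart(subset).mark_bar().encode(x="region", y=metric.value)
)
```

In an actual notebook the controls and the chart would live in separate cells so that moving a slider reruns only the chart cell.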
Querying Data with SQL
Marimo has first-class SQL support via DuckDB integration. Add a SQL cell and query a Python dataframe directly using standard SQL syntax:
SELECT region, SUM(revenue) AS total_revenue, COUNT(*) AS orders
FROM df
GROUP BY region
ORDER BY total_revenue DESC
DuckDB treats the Python variable df as a queryable table. You can join multiple dataframes, apply window functions, and run aggregations without loading data into a separate database process. The result returns as a new dataframe that subsequent cells can reference.
This is particularly useful for analysts who prefer SQL for aggregation but rely on Python for visualization and statistical operations. A practical workflow pattern:
- Load raw data in Python using pandas or polars
- Aggregate and filter using SQL cells backed by DuckDB
- Visualize and model in Python using matplotlib, seaborn, or scikit-learn
DuckDB can also query CSV and Parquet files on disk directly without loading them into memory first. For files up to several gigabytes, this is often faster than pandas read_csv and requires no database setup or configuration.
import duckdb
# Query the file in place; .df() converts the result to a pandas dataframe
result = duckdb.sql("SELECT * FROM 'large_file.parquet' WHERE year = 2025 LIMIT 1000").df()
Converting Existing Jupyter Notebooks
If you have existing .ipynb notebooks, Marimo provides a built-in conversion command:
marimo convert your_notebook.ipynb > your_notebook.py
The conversion handles most standard cells automatically. Cells with side effects (printing to stdout without returning a value, or modifying global state) may need minor adjustments to work with the reactive model. The general fix is to assign outputs to variables so Marimo can track the dependency chain.
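As a hypothetical before-and-after for that fix (the dataframe here is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Before conversion: a side-effect-only Jupyter cell
#     print(df.describe())
# leaves marimo nothing to track, so no cell can depend on it.

# After: assign the output to a name marimo can follow
summary = df.describe()
```

Downstream cells can now reference summary, and marimo will rerun them whenever df changes.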
The Marimo documentation recommends converting one notebook at a time and running it to verify behavior before migrating an entire project.
Running as a Web Application
A Marimo notebook deploys as an interactive web application with a single command, without a separate web framework or infrastructure:
marimo run analysis.py
This serves the notebook as a read-only app. Viewers see the controls and outputs but cannot edit the underlying code. The interface looks like a clean dashboard: no code cells, no console output, just the UI elements and visualizations the analyst configured.
For teams that want cloud hosting without managing servers, the Marimo team operates Molab at molab.marimo.io, a hosted notebook workspace with a free tier. Self-hosting via Docker is also straightforward using the official Marimo image.
Limitations to Know Before Migrating
Marimo's reactive model has one structural constraint worth understanding before committing to a migration: a variable can be defined in only one cell across the entire notebook. Two separate cells cannot both assign a value to df. This enforces clean dependency tracking but blocks certain exploratory patterns where analysts intentionally reassign the same variable across multiple steps.
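A common workaround is to give each step its own name instead of reassigning. A minimal sketch (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"region": ["North", "South"], "revenue": [100, 50]})

# Two marimo cells cannot both assign to `df`. Instead, derive a
# new name per step so each cell gets a clean edge in the
# dependency graph:
df_north = df[df["region"] == "North"]
```

Each intermediate name stays inspectable, which is often an improvement over in-place reassignment anyway.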
For large-scale data engineering pipelines, orchestrators like dbt or Spark remain more appropriate choices. Marimo is designed for interactive analysis and sharing, not batch ETL jobs that run on a schedule without user interaction.
For non-technical teams that need to upload a file, ask a question, and get a chart back with no configuration at all, tools like VSLZ handle the end-to-end analysis from a file upload without any notebook setup.
Summary
Marimo replaces Jupyter's execution model with a reactive dependency graph that keeps notebooks consistent and reproducible. Installation is a single pip command. SQL cells run DuckDB queries against Python dataframes with no separate database required. Notebooks deploy as interactive web apps with marimo run. For analysts dealing with stale notebook outputs or wanting to share live analyses as interactive tools, Marimo is a practical upgrade from Jupyter with a low migration cost.
FAQ
Is Marimo free to use?
Yes. Marimo is fully open-source under the Apache 2.0 license. Install it locally at no cost with pip install marimo. The Marimo team also offers Molab, a hosted cloud workspace, which has a free tier for basic usage.
How is Marimo different from Jupyter Notebook?
Jupyter runs cells independently in any order, which allows hidden state to build up over time. Outputs can reflect code that no longer exists, and notebooks often fail when run top-to-bottom by a second person. Marimo uses a dependency graph so cells always run in the correct order automatically. Notebooks are also stored as plain Python files rather than JSON, making them compatible with git diff, code review, and standard testing tools.
Can Marimo notebooks run SQL queries?
Yes. Marimo integrates with DuckDB for SQL cells. You can query pandas or polars dataframes using standard SQL syntax directly in the notebook without setting up a separate database. Results return as dataframes that subsequent Python cells can use. DuckDB also supports querying CSV and Parquet files on disk without loading them into memory first.
How do I convert my existing Jupyter notebooks to Marimo?
Run: marimo convert your_notebook.ipynb > your_notebook.py. The command handles most standard cells automatically. Cells with side effects, such as those that print output without returning a value or modify global state, may need minor adjustments to fit the reactive model. The general fix is to assign outputs to named variables so Marimo can track dependencies.
How do I share a Marimo notebook with non-technical teammates?
Run marimo run your_notebook.py to serve the notebook as an interactive web app. Viewers see the controls and outputs but cannot edit the code. For cloud sharing without managing servers, Molab at molab.marimo.io provides hosted notebooks with a free tier. Self-hosting via Docker is also supported using the official Marimo image.