How to Set Up Cube for Agentic Analytics

Arkzero Research · Apr 25, 2026 · 7 min read

Last updated Apr 25, 2026

Cube is an open-source semantic layer that sits between your database and your BI tools, defining metrics once so every query and AI agent uses the same numbers. You can run Cube locally with Docker in under ten minutes, connect it to a Postgres or warehouse database, generate a data model automatically, and query it in plain English using Cube's D3 agentic analytics layer. This guide walks through the complete setup from scratch.
What is Cube and Why It Matters Now

Cube is an open-source semantic layer for data analytics. It connects to your database, lets you define measures and dimensions in YAML or JavaScript, and exposes those definitions via REST, GraphQL, and SQL APIs to any BI tool, application, or AI agent that needs them.

The core idea is consistency. Without a semantic layer, every analyst and every AI prompt re-derives the same metric from raw SQL, often getting slightly different answers depending on how the joins are written. Cube forces one canonical definition per metric. When you change how "monthly revenue" is calculated, every dashboard and every agent query inherits the update automatically.

In February 2026, Gartner named Cube a Representative Vendor in its Market Guide for Agentic Analytics. The guide's key finding was that "Semantic and policy alignment is foundational for effective agentic analytics." It also predicted that 60% of agentic analytics projects relying solely on Model Context Protocol (MCP) without a semantic layer would fail by 2028. That finding has driven a surge of interest from ops teams and analysts who want AI agents to query their data reliably instead of hallucinating metrics.

This guide covers setting up Cube Core (the open-source version) locally with Docker, connecting it to a Postgres database, generating a first data model, and running your first natural-language query through the Cube D3 agentic interface.

Prerequisites

Before starting, you need:

  • Docker Desktop installed and running (Docker Engine 24 or later)
  • A Postgres database with at least one table of real data — a local Postgres instance works fine
  • Node.js 18 or later (optional; only needed if you want to use the Cube CLI, since this guide generates the model from the Playground)
  • About 15 minutes

If you do not have Postgres running locally, you can spin one up in Docker:

docker run --name pg-demo -e POSTGRES_PASSWORD=demo -e POSTGRES_DB=analytics -p 5432:5432 -d postgres:16

Load a sample dataset like the classic orders table with a quick seed script before continuing.
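The guide does not specify the seed script, so here is a minimal sketch in Python. It assumes an orders table with id, status, amount, and created_at columns, chosen to line up with the model shown in Step 3. The status values, row count, date range, and the function name build_seed_sql are illustrative assumptions, not part of Cube or of any standard dataset. The script only prints SQL and does not connect to the database itself.

```python
import random
from datetime import datetime, timedelta

# Hypothetical status values; replace with whatever your business uses.
STATUSES = ["completed", "processing", "shipped"]


def build_seed_sql(rows: int = 100, seed: int = 42) -> str:
    """Return SQL that creates and fills a small demo orders table."""
    rng = random.Random(seed)  # fixed seed so the demo data is reproducible
    start = datetime(2026, 1, 1)
    lines = [
        "CREATE TABLE IF NOT EXISTS public.orders (",
        "  id SERIAL PRIMARY KEY,",
        "  status TEXT NOT NULL,",
        "  amount NUMERIC(10, 2) NOT NULL,",
        "  created_at TIMESTAMP NOT NULL",
        ");",
    ]
    for _ in range(rows):
        status = rng.choice(STATUSES)
        amount = round(rng.uniform(5, 500), 2)
        # Spread orders over roughly 90 days after the start date.
        created = start + timedelta(minutes=rng.randrange(0, 90 * 24 * 60))
        lines.append(
            "INSERT INTO public.orders (status, amount, created_at) "
            f"VALUES ('{status}', {amount:.2f}, '{created:%Y-%m-%d %H:%M:%S}');"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_seed_sql())
```

If you save this as seed.py (a hypothetical filename), one way to load it into the container started above is: python seed.py | docker exec -i pg-demo psql -U postgres -d analytics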

Step 1: Run Cube with Docker

Create a new empty folder for your Cube project:

mkdir cube-demo && cd cube-demo

Then start Cube with a single Docker command:

docker run -p 4000:4000 \
  -p 15432:15432 \
  -v ${PWD}:/cube/conf \
  -e CUBEJS_DEV_MODE=true \
  cubejs/cube

Port 4000 is the Cube API and the Developer Playground UI. Port 15432 is Cube's SQL API, which lets tools like Metabase or any Postgres-compatible client query Cube as if it were a database.

Open http://localhost:4000 in your browser. The Developer Playground loads.
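If the page does not load right away, the container may still be starting. The sketch below is one way to wait until both published ports accept TCP connections. The helper names port_open and wait_for_ports are made up for this example, and a successful connection only shows that something is listening, not that Cube has finished initializing.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(host, ports, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll until every port accepts connections or attempts run out."""
    for _ in range(attempts):
        if all(port_open(host, p) for p in ports):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # 4000 = API and Playground, 15432 = SQL API (the ports published above)
    ready = wait_for_ports("localhost", [4000, 15432])
    print("Cube ports reachable" if ready else "Cube ports not reachable yet")
```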

Step 2: Connect Your Database

The Playground prompts you to select a data source. Choose PostgreSQL. Enter your connection details:

  • Host: host.docker.internal (if Postgres is running on your Mac or Windows host machine) or the container name if it is in the same Docker network
  • Port: 5432
  • Database: your database name
  • Username / Password: your Postgres credentials

Click Apply and Cube writes the connection settings to a .env file in your project folder. From this point on, Cube is reading live from your Postgres instance.

For production deployments, these credentials go into environment variables (CUBEJS_DB_HOST, CUBEJS_DB_NAME, CUBEJS_DB_USER, CUBEJS_DB_PASS) in a .env file or your secrets manager. Never hardcode them.
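A small pre-flight check can catch a missing variable before the service starts. The sketch below assumes the four variables named above plus CUBEJS_DB_TYPE, which Cube uses to select the database driver. The function name missing_db_vars is hypothetical.

```python
import os

# Variables named in this guide, plus CUBEJS_DB_TYPE (e.g. "postgres").
REQUIRED_DB_VARS = (
    "CUBEJS_DB_TYPE",
    "CUBEJS_DB_HOST",
    "CUBEJS_DB_NAME",
    "CUBEJS_DB_USER",
    "CUBEJS_DB_PASS",
)


def missing_db_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_DB_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_db_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All database variables are set")
```

Reading values from the environment at startup, instead of from a file in the repository, keeps credentials out of version control.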

Step 3: Generate Your Data Model

Cube can inspect your database schema and generate a starter data model automatically. In the Developer Playground, go to the Data Model tab (labeled Schema in older releases), select the tables you want to include, and click Generate. Cube produces YAML files in the model/cubes/ folder of your project.

A generated model for an orders table might look like this:

cubes:
  - name: orders
    sql_table: public.orders

    measures:
      - name: count
        type: count
      - name: total_revenue
        sql: amount
        type: sum

    dimensions:
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time

This is a starting point, not a final definition. The important step is renaming measures to match how your business actually uses the terms and adding any calculated metrics your team cares about. A measure called total_revenue with clear SQL behind it is the kind of canonical definition that prevents the metric drift problem described earlier.

Edit the YAML directly in your project folder. Cube hot-reloads model changes without a restart.

Step 4: Test Queries in the Playground

Back in the Playground, click Build to run your first query. You can select measures and dimensions from a dropdown and Cube generates the underlying SQL, executes it, and shows results in table or chart form.
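To make the idea of generating SQL from a model concrete, here is a deliberately simplified illustration. It is not Cube's actual SQL generator, which also handles joins, filters, time granularity, and caching. It only shows how one measure definition can be turned into the same aggregate expression every time. The MODEL dictionary mirrors the orders cube from Step 3, and to_sql is a hypothetical helper.

```python
# Toy in-memory copy of the orders cube from Step 3.
MODEL = {
    "orders": {
        "sql_table": "public.orders",
        "measures": {
            "count": {"type": "count"},
            "total_revenue": {"type": "sum", "sql": "amount"},
        },
        "dimensions": {
            "status": {"sql": "status"},
            "created_at": {"sql": "created_at"},
        },
    }
}


def to_sql(cube_name, measures, dimensions, model=MODEL):
    """Build a SELECT statement from named measures and dimensions."""
    cube = model[cube_name]
    select = []
    for dim in dimensions:
        select.append(f'{cube["dimensions"][dim]["sql"]} AS {dim}')
    for name in measures:
        measure = cube["measures"][name]
        if measure["type"] == "count":
            expr = "COUNT(*)"
        else:
            expr = f'{measure["type"].upper()}({measure["sql"]})'
        select.append(f"{expr} AS {name}")
    sql = f'SELECT {", ".join(select)} FROM {cube["sql_table"]}'
    if dimensions:
        # Group by the dimension columns, which come first in the SELECT list.
        positions = ", ".join(str(i + 1) for i in range(len(dimensions)))
        sql += f" GROUP BY {positions}"
    return sql


if __name__ == "__main__":
    print(to_sql("orders", ["total_revenue"], ["status"]))
```

Asking for a measure that is not in the model raises a KeyError here, which is the toy version of the guarantee a semantic layer provides: callers pick from defined metrics instead of writing their own aggregation.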

The SQL API (port 15432) lets you connect Metabase, Tableau, or any Postgres-compatible client directly. In Metabase, add a new database connection of type PostgreSQL, point it at localhost:15432, and sign in with the SQL API user and password, which are configured through the CUBEJS_SQL_USER and CUBEJS_SQL_PASSWORD environment variables. From that point, Metabase sees all your Cube cubes as database tables.

You can also query via the REST API directly:

curl -G http://localhost:4000/cubejs-api/v1/load \
  --data-urlencode 'query={"measures":["orders.total_revenue"],"dimensions":["orders.status"]}' \
  -H 'Authorization: YOUR_API_TOKEN'

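The API token is a JSON Web Token (JWT) signed with the secret you set in CUBEJS_API_SECRET. In development mode (CUBEJS_DEV_MODE=true), Cube does not enforce the token check, so the placeholder above is enough for local testing.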
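Outside development mode, the REST API expects a JSON Web Token signed with the value of CUBEJS_API_SECRET. The sketch below builds such a token with only the Python standard library, assuming the HS256 algorithm, and constructs the same /cubejs-api/v1/load URL as the curl example. The helper names, the example secret, and the one-hour expiry are assumptions for illustration. In real code a maintained JWT library is usually the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time
from urllib.parse import urlencode


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: str) -> str:
    """Return a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{b64url(signature)}"


def load_url(base: str, query: dict) -> str:
    """Build the REST load URL with the query passed as URL-encoded JSON."""
    return f"{base}/cubejs-api/v1/load?" + urlencode({"query": json.dumps(query)})


if __name__ == "__main__":
    # "change-me" stands in for your real CUBEJS_API_SECRET value.
    token = sign_hs256({"exp": int(time.time()) + 3600}, "change-me")
    query = {"measures": ["orders.total_revenue"], "dimensions": ["orders.status"]}
    print("Authorization:", token)
    print(load_url("http://localhost:4000", query))
```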

Step 5: Enable Agentic Queries with Cube D3

Cube's D3 layer, announced in mid-2025 and now generally available, adds AI agents on top of the semantic model. The key difference from ad-hoc LLM-to-SQL approaches is that D3 agents query the semantic model, not the raw database. That means the agent cannot invent a revenue calculation — it must use the total_revenue measure you defined.

For Cube Cloud users, D3 activates from your account dashboard under Agentic Analytics. The free tier includes a limited number of agent requests per month, enough to test the feature with a real dataset.

Once enabled, the D3 interface lets you type questions like "What was total revenue by status last month?" and the agent resolves the query against your semantic model, returns results, and explains how it arrived at the answer. Because the semantic layer enforces consistent definitions, the answer from a D3 agent matches the answer from your Metabase dashboard using the same Cube connection.
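The constraint described here can be pictured as a membership check: a request is only valid if every measure and dimension it names exists in the semantic model. The sketch below is an illustration of that idea only and does not reflect how D3 is implemented. The MODEL_MEMBERS structure and both function names are hypothetical.

```python
# Members defined in the orders cube from Step 3.
MODEL_MEMBERS = {
    "orders": {
        "measures": ["count", "total_revenue"],
        "dimensions": ["status", "created_at"],
    }
}


def known_members(model):
    """Return every fully qualified member name, e.g. 'orders.status'."""
    members = set()
    for cube_name, cube in model.items():
        for kind in ("measures", "dimensions"):
            for member in cube.get(kind, []):
                members.add(f"{cube_name}.{member}")
    return members


def unknown_members(query, model=MODEL_MEMBERS):
    """Return the requested members that the model does not define."""
    requested = query.get("measures", []) + query.get("dimensions", [])
    allowed = known_members(model)
    return [m for m in requested if m not in allowed]


if __name__ == "__main__":
    good = {"measures": ["orders.total_revenue"], "dimensions": ["orders.status"]}
    bad = {"measures": ["orders.net_revenue_guess"], "dimensions": ["orders.status"]}
    print(unknown_members(good))  # nothing to reject
    print(unknown_members(bad))   # the invented measure is flagged
```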

For self-hosted Cube Core, D3 is not yet available as open source. The agentic features require Cube Cloud or an enterprise license.

What This Setup Gives You

After completing these steps, you have a working semantic layer that:

  • Serves consistent metric definitions to any BI tool via SQL, REST, or GraphQL
  • Generates SQL from your cubes automatically with caching built in
  • Provides a foundation for AI agents to query data without hallucinating metrics

The next practical step is defining more cubes for the tables your team queries most, adding role-based access control for row-level security, and connecting your primary BI tool to the SQL API.

If you want to skip the model-building phase and get to data questions immediately, VSLZ lets you upload a CSV or connect a data source and ask questions in plain English without writing YAML or configuring infrastructure — useful for ad-hoc analysis while your Cube semantic layer is still taking shape.

Practical Notes

Cube performs best as a long-running service, not a per-query container. For production, deploy it as a dedicated service backed by Cube Store for caching and queue management (CUBEJS_CACHE_AND_QUEUE_DRIVER=cubestore); recent Cube releases use Cube Store for this and no longer rely on Redis. The in-memory driver works for development but does not persist across restarts.

Keep your cube YAML files in version control. Model changes that redefine a measure should go through a review process the same way schema migrations do — a change to total_revenue SQL affects every dashboard and every agent query that uses it.
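A review step is easier when the change to each measure is visible at a glance. The sketch below compares two versions of a cube's measures, represented as plain dictionaries, and reports what was added, removed, or redefined. It assumes the YAML has already been parsed by some other tool, and diff_measures is a hypothetical helper, not part of Cube.

```python
def diff_measures(old: dict, new: dict) -> dict:
    """Map each measure name to 'added', 'removed', or 'changed'."""
    report = {}
    for name in sorted(set(old) | set(new)):
        if name not in old:
            report[name] = "added"
        elif name not in new:
            report[name] = "removed"
        elif old[name] != new[name]:
            report[name] = "changed"
    return report


if __name__ == "__main__":
    before = {
        "count": {"type": "count"},
        "total_revenue": {"type": "sum", "sql": "amount"},
    }
    after = {
        "count": {"type": "count"},
        # Hypothetical redefinition: revenue now excludes a discount column.
        "total_revenue": {"type": "sum", "sql": "amount - discount"},
        "avg_order_value": {"type": "avg", "sql": "amount"},
    }
    for name, status in diff_measures(before, after).items():
        print(f"{name}: {status}")
```

A report like this could be attached to a pull request so reviewers see that total_revenue changed meaning, not just that a YAML file was edited.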

FAQ

What is Cube used for in data analytics?

Cube is a semantic layer that sits between your database and your analytics tools. It lets you define metrics like revenue, churn rate, or order count once in YAML or JavaScript, then exposes those definitions to BI tools, applications, and AI agents via REST, GraphQL, and SQL APIs. The main benefit is consistency: every tool and every query uses the same metric definition instead of each analyst re-writing the same SQL independently.

Can I run Cube locally without a cloud account?

Yes. Cube Core is fully open source and runs locally with a single Docker command: `docker run -p 4000:4000 -p 15432:15432 -v ${PWD}:/cube/conf -e CUBEJS_DEV_MODE=true cubejs/cube`. You do not need a Cube Cloud account for the core semantic layer, REST API, GraphQL API, or SQL API. Cube Cloud adds managed infrastructure, the D3 agentic analytics layer, and enterprise access controls.

How does Cube's semantic layer differ from just writing SQL views?

SQL views are static and live inside the database. Cube's semantic layer is dynamic, version-controlled, and accessible via multiple APIs outside the database. Cube also adds caching, role-based access control, multi-tenancy, and the ability to connect the same model to multiple BI tools simultaneously. AI agents querying Cube are constrained to the defined measures and dimensions, preventing them from generating ad-hoc SQL that could return inconsistent results.

What databases does Cube support?

Cube Core works with all major SQL data sources, including PostgreSQL, MySQL, Snowflake, BigQuery, Databricks, Redshift, ClickHouse, Amazon Athena, and Presto. The connection is configured via environment variables. You can also use multiple data sources in a single Cube deployment with data source routing configured in the cube.js file.

What is Cube D3 and who can use it?

Cube D3 is Cube's agentic analytics platform, announced in 2025 and recognized in the 2026 Gartner Market Guide for Agentic Analytics. It adds AI agents that query your semantic model using natural language and return explainable, governance-compliant results. D3 is available to Cube Cloud users, including a free tier with limited agent requests. Self-hosted Cube Core does not include D3 as of early 2026.
