How to Set Up Hex for AI Data Analysis

Arkzero Research · Apr 9, 2026 · 7 min read

Last updated Apr 9, 2026

Hex is a collaborative data notebook that combines SQL, Python, and no-code tools with built-in AI agents for exploratory analysis. Setting it up requires connecting a data warehouse, endorsing trusted tables, adding descriptions, and configuring workspace context so the AI agent can generate accurate queries. The free Community plan supports unlimited public projects, and paid plans start at $36 per editor per month.

What Hex Does and Why It Matters

Hex is a data workspace that merges notebooks, SQL editors, and AI agents into one environment. Analysts write SQL or Python, build visualizations, and share interactive data apps from the same project. The Notebook Agent sits inside every project and generates code, runs queries, and builds charts from plain-English prompts.

Unlike standalone AI chatbots that require you to upload CSVs, Hex connects directly to your cloud data warehouse. That means your analysis runs against live, governed data rather than stale exports. According to Hex, teams using its platform have reduced time-to-insight by up to 70% compared to traditional notebook workflows.

This guide walks through the setup process from account creation to your first AI-assisted analysis.

Step 1: Create an Account and Choose a Plan

Go to hex.tech and sign up. The Community plan is free and gives you unlimited public projects, SQL and Python cells, and access to the Notebook Agent. You do not need a credit card.

If you need private projects and team collaboration, the Professional plan runs $36 per editor per month. The Team plan at $75 per editor per month adds role-based access, scheduled runs, and API access. Enterprise pricing is custom.

For this guide, the free Community plan is enough to follow every step.

Step 2: Connect Your Data Warehouse

Click the gear icon in the left sidebar and open "Data connections." Hex supports direct connections to Snowflake, BigQuery, Databricks, Amazon Redshift, PostgreSQL, Azure Synapse, and several others.

Enter your connection credentials: host, database name, schema, and authentication details. For Snowflake, that means your account identifier, warehouse name, and role. For BigQuery, you upload a service account JSON key.

Once connected, Hex pulls your schema metadata automatically. If your warehouse has hundreds of schemas, use the schema filter feature to restrict which ones appear in Hex. This keeps the workspace clean and helps the AI agent focus on relevant tables.

Test the connection by opening a new SQL cell and running a simple query such as SELECT 1.
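The same smoke test can be scripted outside Hex against any Python DB-API driver. The sketch below is only an illustration: sqlite3 stands in for a real warehouse driver, and the helper name smoke_test is hypothetical, not part of Hex.

```python
import sqlite3


def smoke_test(connection) -> bool:
    """Return True if the connection can execute a trivial query."""
    cursor = connection.cursor()
    cursor.execute("SELECT 1")
    row = cursor.fetchone()
    return row is not None and row[0] == 1


# Assumption: an in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
smoke_ok = smoke_test(conn)
conn.close()
print("connection ok:", smoke_ok)
```

If this check fails against your own driver, the problem is in credentials or networking, not in anything Hex-specific.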

Step 3: Endorse Trusted Data

Data endorsement tells the AI agent which tables are approved for analysis. Without endorsement, the agent treats all visible tables equally, which can lead to queries against staging tables or deprecated schemas.

Navigate to the Data browser in the left sidebar. Select a database, schema, or individual table and click the endorsement badge to mark it as "Approved" or "Trusted." Child objects inherit their parent's endorsement status, so endorsing a schema covers all tables within it.

This step takes five minutes but significantly improves agent accuracy. Endorsed data gets prioritized across all AI features in Hex.
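To make the inheritance rule concrete, here is a small conceptual model, not Hex's implementation or API. It assumes that the nearest explicit status on an object or its ancestors wins. The database, schema, and table names are made up for illustration.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[str, ...]  # (database, schema, table), or a shorter prefix


def effective_endorsement(path: Path, endorsements: Dict[Path, str]) -> Optional[str]:
    """Walk from the object up through its parents; the nearest explicit status wins."""
    for depth in range(len(path), 0, -1):
        status = endorsements.get(path[:depth])
        if status is not None:
            return status
    return None  # nothing endorsed on this object or any ancestor


# Hypothetical workspace: only the "marts" schema has been endorsed.
endorsements = {("analytics", "marts"): "Approved"}

table_status = effective_endorsement(("analytics", "marts", "fct_revenue"), endorsements)
staging_status = effective_endorsement(("analytics", "staging", "stg_events"), endorsements)
print(table_status, staging_status)
```

Endorsing one schema is enough to cover every table under it, while staging tables in a sibling schema stay unendorsed.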

Step 4: Exclude Non-Essential Data from AI

Beyond endorsement, Hex lets you hide specific tables from the agent entirely. In the Data browser, select any database, schema, or table and toggle "Include/Exclude for AI."

Excluded data remains accessible to human users who write manual queries. The agent simply cannot see or reference it. This is useful for internal audit tables, raw event logs, or any data that would confuse automated analysis.

Step 5: Add Table and Column Descriptions

Descriptions are the single most impactful configuration for agent quality. The Notebook Agent reads these descriptions when deciding which tables to join, how to aggregate measures, and what filters to apply.

For each important table, add a description that covers three things: what the table represents, what calculations it supports, and any important caveats. For example: "Monthly revenue by product line. Use revenue_usd for dollar amounts. Excludes refunds and credits. Joins to dim_product on product_id."

For columns with low cardinality (like status fields or region codes), list the possible values directly in the description: "Order status. Values: pending, shipped, delivered, returned, cancelled."

For text columns with specific patterns, include an example: "Customer ID format: CUST-XXXXX where X is alphanumeric."

You can manage descriptions in your warehouse (they sync automatically) or add them directly in Hex's Data browser.
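If you keep descriptions in the warehouse, one option is to generate the statements from a script. The sketch below assumes a warehouse that accepts PostgreSQL-style COMMENT ON statements; other warehouses use different syntax, so check yours. The table and column names are hypothetical.

```python
from typing import Dict, List


def sql_literal(text: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def comment_statements(table: str, table_desc: str, column_descs: Dict[str, str]) -> List[str]:
    """Build one COMMENT statement for the table and one per described column."""
    stmts = [f"COMMENT ON TABLE {table} IS {sql_literal(table_desc)};"]
    for column, desc in column_descs.items():
        stmts.append(f"COMMENT ON COLUMN {table}.{column} IS {sql_literal(desc)};")
    return stmts


# Hypothetical table and columns, reusing the description patterns from this step.
statements = comment_statements(
    "monthly_revenue",
    "Monthly revenue by product line. Use revenue_usd for dollar amounts. "
    "Excludes refunds and credits. Joins to dim_product on product_id.",
    {
        "order_status": "Order status. Values: pending, shipped, delivered, returned, cancelled.",
        "customer_id": "Customer ID format: CUST-XXXXX where X is alphanumeric.",
    },
)
for stmt in statements:
    print(stmt)
```

Keeping the descriptions in a script like this also puts them under version control alongside the rest of your warehouse code.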

Step 6: Build Semantic Models

Semantic models define how tables relate to each other and how metrics should be calculated. They act as guardrails that prevent the agent from writing incorrect joins or applying wrong aggregation logic.

In Hex, create a semantic model by navigating to the Models section. Define your entities (tables), relationships (joins with specified keys), and measures (calculations like SUM, COUNT DISTINCT, or weighted averages).

Once a semantic model exists, the agent uses its definitions instead of guessing from raw table structures. This is especially valuable for star schemas where the correct join path between fact and dimension tables is not obvious from column names alone.
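The idea can be sketched in plain Python. This is a conceptual illustration, not Hex's semantic model format: the class names, the fct_revenue table, and the product_line column are assumptions. The point is that the join key and the aggregation are declared once, so the generated query cannot drift from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    fact: str       # fact table name
    dimension: str  # dimension table name
    key: str        # join column shared by both tables


@dataclass(frozen=True)
class Measure:
    name: str
    expression: str  # aggregation over a fact column


def build_query(rel: Relationship, measure: Measure, group_by: str) -> str:
    """Render a grouped query that uses only the declared join and measure."""
    return (
        f"SELECT {rel.dimension}.{group_by}, {measure.expression} AS {measure.name}\n"
        f"FROM {rel.fact}\n"
        f"JOIN {rel.dimension} ON {rel.fact}.{rel.key} = {rel.dimension}.{rel.key}\n"
        f"GROUP BY {rel.dimension}.{group_by}"
    )


# Hypothetical star schema: one fact table joined to dim_product on product_id.
revenue_by_product = Relationship(fact="fct_revenue", dimension="dim_product", key="product_id")
total_revenue = Measure(name="total_revenue", expression="SUM(fct_revenue.revenue_usd)")

query = build_query(revenue_by_product, total_revenue, group_by="product_line")
print(query)
```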

Step 7: Configure Workspace Context

The workspace context file is a markdown document that provides business-specific information to every AI agent in your workspace. Only admins can edit it.

Open Settings, then AI Configuration, and write a context file that covers your company's terminology and abbreviations, preferred SQL conventions, business rules, and any known data quirks.

A practical example:

- "MRR" means Monthly Recurring Revenue, calculated as SUM(subscription_amount) for active subscriptions
- All revenue figures are in USD unless the column name ends with _local
- Fiscal year starts April 1
- Use COALESCE(column, 0) instead of raw NULLs for numeric aggregations
- The events table partitions on event_date; always include a date filter

This context travels with every agent interaction, so you write it once and it applies everywhere.
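One way to keep the context file maintainable is to draft it as a list of rules and check its length before pasting it into Hex. The sketch below renders the example rules as markdown bullets and counts words against the 500-word budget suggested in the tips later in this guide. The helper names are hypothetical, and the counting is done locally rather than by any Hex feature.

```python
from typing import List

WORD_BUDGET = 500  # suggested upper bound for the workspace context

context_rules = [
    '"MRR" means Monthly Recurring Revenue, calculated as '
    "SUM(subscription_amount) for active subscriptions",
    "All revenue figures are in USD unless the column name ends with _local",
    "Fiscal year starts April 1",
    "Use COALESCE(column, 0) instead of raw NULLs for numeric aggregations",
    "The events table partitions on event_date; always include a date filter",
]


def render_context(rules: List[str]) -> str:
    """Render the rules as a markdown bullet list, one rule per line."""
    return "\n".join(f"- {rule}" for rule in rules) + "\n"


def word_count(text: str) -> int:
    """Count whitespace-separated tokens, a rough proxy for length."""
    return len(text.split())


context_md = render_context(context_rules)
context_words = word_count(context_md)
within_budget = context_words <= WORD_BUDGET
print(context_md)
print(f"{context_words} words, within budget: {within_budget}")
```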

Step 8: Run Your First AI Analysis

Create a new project and open a fresh cell. Click the AI icon or type / to invoke the Notebook Agent. Start with a specific, well-scoped prompt.

Good prompt: "Show me the top 10 customers by total revenue in Q1 2026, broken down by product category, as a horizontal bar chart."

Weak prompt: "Analyze my data."

The agent will generate SQL, run it against your connected warehouse, and produce a visualization. You can iterate by asking follow-up questions in the same thread: "Now exclude trial accounts" or "Add a trend line for the last 12 months."

For statistical analysis, be explicit about the method: "Run a correlation analysis between marketing spend and new signups by month for 2025. Show the Pearson coefficient and a scatter plot."
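It helps to know what the requested statistic looks like in code so you can check the agent's output. Below is a minimal pure-Python Pearson coefficient. The monthly figures are invented for illustration only, and the code the agent generates may well use pandas or SciPy instead.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented monthly figures, for illustration only.
marketing_spend = [12000.0, 15000.0, 11000.0, 18000.0, 21000.0, 17000.0]
new_signups = [310.0, 365.0, 290.0, 420.0, 505.0, 410.0]

r = pearson(marketing_spend, new_signups)
print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 or -1 indicates a strong linear relationship, but it says nothing about causation or about lagged effects between spend and signups.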

Step 9: Share and Schedule

Once your analysis is ready, click "Publish" to create a shareable data app. Anyone with the link can view the results without a Hex account. On the free plan, projects must be public.

On paid plans, you can schedule notebooks to refresh on a cron schedule, keeping dashboards and reports up to date without manual intervention. The API also supports triggering runs programmatically from external tools.

Practical Tips for Better Results

Write descriptions before prompts. The quality of your table and column descriptions has more impact on agent accuracy than prompt engineering.

Use semantic models for any metric that involves joins across three or more tables. Without them, the agent may guess the wrong join path.

Keep workspace context under 500 words. The agent reads the entire file for every interaction, so bloated context dilutes the signal.

If you want to skip the warehouse setup entirely and just analyze files from a browser, tools like VSLZ handle the full pipeline from file upload to charts and statistical analysis without any configuration.

Test the agent against queries you already know the answer to. This builds confidence in the output and reveals any description gaps you need to fix.
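A known-answer check can be as simple as comparing the agent's result rows with figures you have already verified by hand. The sketch below is one way to do that. It assumes results arrive as lists of tuples, and the function names and sample values are hypothetical.

```python
import math
from typing import List, Tuple

Row = Tuple[object, ...]


def values_match(expected: object, actual: object, rel_tol: float) -> bool:
    """Compare numbers with a tolerance and everything else exactly."""
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        return math.isclose(expected, actual, rel_tol=rel_tol)
    return expected == actual


def rows_match(expected: List[Row], actual: List[Row], rel_tol: float = 1e-6) -> bool:
    """Compare two result sets, ignoring row order."""
    if len(expected) != len(actual):
        return False
    # Sorting assumes each row starts with a comparable key such as a name.
    for exp_row, act_row in zip(sorted(expected), sorted(actual)):
        if len(exp_row) != len(act_row):
            return False
        if not all(values_match(e, a, rel_tol) for e, a in zip(exp_row, act_row)):
            return False
    return True


# Hypothetical hand-verified totals versus what the agent returned.
known_answer = [("Acme", 100.0), ("Globex", 50.5)]
agent_result = [("Globex", 50.5000000001), ("Acme", 100)]

print("agent matches known answer:", rows_match(known_answer, agent_result))
```

When a check like this fails, look at the table and column descriptions first. A mismatch often points to a missing caveat, such as excluded refunds, rather than to a bad prompt.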

FAQ

Is Hex free to use for data analysis?

Yes. The Hex Community plan is free and includes unlimited public projects, SQL and Python cells, and access to the AI Notebook Agent. You do not need a credit card to sign up. Paid plans start at $36 per editor per month for private projects and additional features.

What databases does Hex connect to?

Hex supports direct connections to Snowflake, Google BigQuery, Databricks, Amazon Redshift, PostgreSQL, Azure Synapse, Trino, Starburst, ClickHouse, and several other SQL-compatible warehouses. You configure the connection once and all projects in your workspace can query it.

How do I make the Hex AI agent more accurate?

Three configurations improve agent accuracy the most: endorsing trusted tables so the agent prioritizes them, adding detailed descriptions to tables and columns so the agent understands their purpose, and creating semantic models that define correct joins and metric calculations. Workspace context files also help by providing business-specific terminology and SQL conventions.

Can I use Hex without knowing SQL or Python?

Partially. The Notebook Agent generates SQL and Python from natural language prompts, so you can ask questions in plain English and get working code and charts. However, reviewing and editing the generated code is easier with basic SQL knowledge. For fully no-code data analysis, tools like VSLZ or Julius AI may be a better fit.

How does Hex compare to Jupyter notebooks?

Hex adds several features missing from Jupyter: built-in AI agents, native SQL cells, collaborative editing, version control, scheduled runs, and one-click publishing as interactive data apps. Jupyter is open source and more flexible for custom environments, but requires more setup for sharing and collaboration.
