
How to Set Up the Hex AI Agent (2026 Guide)

Arkzero Research · Apr 24, 2026 · 8 min read

Last updated Apr 24, 2026

Hex is a collaborative analytics platform that combines SQL, Python, and AI agents in a single workspace. Getting useful results from the Hex Agent requires four setup steps before asking a question: a scoped data connection, endorsed tables, column descriptions, and a workspace context file. With those configured, the agent generates accurate SQL, surfaces charts from existing projects, and since the April 2026 update, builds a persistent user memory that improves responses over time.

What Hex Is and Why Setup Determines Agent Quality

Hex is a cloud analytics workspace where data teams write SQL and Python, build interactive visualizations, and publish shareable apps. What separates it from tools like Google Colab or Jupyter is how AI is embedded: the Hex Agent runs inside the notebook environment, can inspect your live schema, generate and execute code, produce charts, and pull from existing projects without any external tool calls.

The catch is that the agent starts with no business knowledge. It knows SQL. It knows how to build a chart. It does not know that your "accounts" table refers to B2B customers, that revenue is stored in cents, or that there is a staging schema it should never touch. Without configuration, it makes reasonable-sounding assumptions that produce technically valid queries answering the wrong question.

The following steps take roughly 30 to 60 minutes for a warehouse with a few dozen tables. The payoff is an agent that is far more likely to return a useful answer on the first attempt.

Step 1: Connect and Scope Your Data

Go to Settings > Data Connections > Add New Connection. Hex supports Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, and about 30 other integrations. Enter your warehouse credentials and run the connection test.

After connecting, narrow the scope. A data connection pointed at 400 tables gives the agent too much to reason about and increases the chance it queries the wrong one. Two ways to scope it down:

Create a dedicated read-only warehouse role with SELECT access on only the schemas your team uses. Use those credentials in the Hex connection. This enforces the boundary at the warehouse level and is the cleaner long-term solution.

If warehouse role creation is not in scope, use Hex's schema filter instead. Under the connection settings, add a regex pattern like ^(analytics|reporting) to surface only matching schemas. This takes effect on the next schema sync.
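Both scoping approaches can be sketched in Python. Treat this as a sketch under stated assumptions. The first function builds Snowflake GRANT statements for a read-only role. The role, user, warehouse, database, and schema names are hypothetical placeholders, and whoever administers the warehouse should review the statements before running them. The second function checks which schemas a filter pattern would surface. It assumes Hex applies the pattern as an ordinary regular-expression search against each schema name, which this guide does not confirm. Under that assumption, `^(analytics|reporting)` also matches a schema such as `analytics_staging`, while adding a trailing `$` restricts the filter to exact names.

```python
import re


def snowflake_readonly_grants(role, user, warehouse, database, schemas):
    """Build GRANT statements for a read-only Snowflake role limited to specific schemas."""
    stmts = [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
    ]
    for schema in schemas:
        fq = f"{database}.{schema}"
        stmts += [
            f"GRANT USAGE ON SCHEMA {fq} TO ROLE {role};",
            f"GRANT SELECT ON ALL TABLES IN SCHEMA {fq} TO ROLE {role};",
            f"GRANT SELECT ON ALL VIEWS IN SCHEMA {fq} TO ROLE {role};",
            # Future grants cover tables and views created after this script runs.
            f"GRANT SELECT ON FUTURE TABLES IN SCHEMA {fq} TO ROLE {role};",
            f"GRANT SELECT ON FUTURE VIEWS IN SCHEMA {fq} TO ROLE {role};",
        ]
    # The user whose credentials Hex connects with must hold the role.
    stmts.append(f"GRANT ROLE {role} TO USER {user};")
    return stmts


def visible_schemas(pattern, schemas):
    """Return the schemas a regex filter would surface (assumes search semantics)."""
    rx = re.compile(pattern)
    return [s for s in schemas if rx.search(s)]


if __name__ == "__main__":
    # All names below are hypothetical.
    for stmt in snowflake_readonly_grants(
        "HEX_READER", "HEX_SERVICE", "ANALYTICS_WH", "PROD", ["ANALYTICS", "REPORTING"]
    ):
        print(stmt)

    schemas = ["analytics", "analytics_staging", "reporting", "raw", "dbt_test"]
    print(visible_schemas(r"^(analytics|reporting)", schemas))   # prefix match
    print(visible_schemas(r"^(analytics|reporting)$", schemas))  # exact names only
```

The prefix-only pattern keeps `analytics_staging`, which is often exactly the schema you wanted to hide. That is why anchoring both ends is usually the safer default.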

Turn on scheduled schema refreshes under connection settings. Hex re-syncs the table list on your chosen interval so the agent always reflects current schema state without a manual trigger.

Step 2: Endorse the Tables Agents Should Query

Open the Data browser (the database icon in the left sidebar). Navigate to each schema and table that represents your source of truth and click the Endorse button. Endorsement applied at the schema level propagates to every table underneath automatically.

Endorsed objects are ranked first in all AI features. When the agent chooses which tables to query, it prioritizes endorsed data. Unendorsed staging tables, test schemas, and deprecated views remain visible for human browsing but stay deprioritized in AI context.

For tables you want completely hidden from the agent but still visible for manual queries, use the Include/Exclude for AI toggle on each object. Excluding a staging schema takes one click and prevents the agent from joining to intermediate tables instead of final marts.
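The following toy model pulls together the behavior described above: endorsement inherits downward, excluded objects disappear from AI context, and endorsed tables rank ahead of the rest. It is an illustration written for this guide, not Hex's implementation, and Hex does not expose this logic as an API. Two points are assumptions: that excluding a schema hides every table under it, and that leaf nodes are tables. The `Node` class and the object names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A database, schema, or table in the Data browser (leaf nodes are treated as tables)."""
    name: str
    endorsed: bool = False
    excluded_from_ai: bool = False
    children: list[Node] = field(default_factory=list)


def ai_candidates(node, parent_endorsed=False, prefix=""):
    """Return (qualified_name, endorsed) for every table the AI may consider."""
    if node.excluded_from_ai:
        return []  # assumption: an excluded object hides everything beneath it
    endorsed = parent_endorsed or node.endorsed  # endorsement inherits downward
    name = f"{prefix}.{node.name}" if prefix else node.name
    if not node.children:
        return [(name, endorsed)]
    tables = []
    for child in node.children:
        tables.extend(ai_candidates(child, endorsed, name))
    return tables


def ranked_for_ai(root):
    """Endorsed tables first; sorted() is stable, so order within each group is kept."""
    return sorted(ai_candidates(root), key=lambda item: not item[1])


if __name__ == "__main__":
    prod = Node("prod", children=[
        Node("sandbox", children=[Node("tmp_orders")]),
        Node("analytics", endorsed=True, children=[Node("orders"), Node("accounts")]),
        Node("staging", excluded_from_ai=True, children=[Node("stg_orders")]),
    ])
    for name, endorsed in ranked_for_ai(prod):
        print(name, "endorsed" if endorsed else "unendorsed")
```

In this model, the sandbox table stays available but ranks last. The staging table never reaches the candidate list, which mirrors the difference between leaving an object unendorsed and excluding it.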

Step 3: Write Column and Table Descriptions

Column descriptions are the highest-leverage step. The agent reads them every time it generates SQL, so description quality directly determines output quality.

If your team uses dbt, the fastest path is connecting the dbt Cloud integration under Settings > Integrations. Hex syncs all existing dbt descriptions on every schema refresh. A warehouse with complete dbt docs needs little additional manual documentation before it is ready for AI queries.

For manual documentation, click any table in the Data browser and edit the description fields directly. What to write:

For tables, describe the business purpose and intended use. A description like "Confirmed orders after payment processing. Use for revenue and order volume reporting. Do not use for cart abandonment analysis" tells the agent both what the table is for and what it is not.

For columns with low cardinality, enumerate the values. If a status column contains pending, confirmed, and shipped, write those exact values in the description. The agent uses them for WHERE clause filters instead of guessing.

For numeric columns, specify the unit. If revenue is stored in cents, write "Revenue in cents. Divide by 100 for dollar amounts." That one line prevents a common calculation error: summing cents and reporting the total as dollars, which overstates revenue by a factor of 100.

For foreign keys, note the join target. If account_id should always join to accounts.id and never to users.account_id, write that. Ambiguous join paths are one of the most common sources of incorrect agent queries in practice.
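The three description patterns above are mechanical enough to template. The sketch below shows one way to do that. The helper names and sample rows are hypothetical. The output is plain text you would paste into the Data browser or your dbt docs, not a call to any Hex API.

```python
def enum_description(purpose, values):
    """Describe a low-cardinality column and list its exact values for WHERE filters."""
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"{purpose} One of: {quoted}."


def unit_description(label, unit="cents", divisor=100, target="dollar amounts"):
    """Describe a numeric column and state how to convert it."""
    return f"{label} in {unit}. Divide by {divisor} for {target}."


def fk_description(column, target, avoid=None):
    """Describe a foreign key and name the one correct join target."""
    text = f"Foreign key. Always join {column} to {target}."
    if avoid:
        text += f" Never join to {avoid}."
    return text


if __name__ == "__main__":
    # Hypothetical sample rows; in practice the values could come from SELECT DISTINCT.
    orders = [
        {"status": "pending", "revenue": 1999},
        {"status": "shipped", "revenue": 4500},
        {"status": "confirmed", "revenue": 1250},
        {"status": "shipped", "revenue": 800},
    ]
    statuses = sorted({row["status"] for row in orders})
    print(enum_description("Order lifecycle status.", statuses))
    print(unit_description("Revenue"))
    print(fk_description("account_id", "accounts.id", avoid="users.account_id"))

    # Why the unit matters: the raw sum is in cents, not dollars.
    total_cents = sum(row["revenue"] for row in orders)  # 8549
    print(f"${total_cents / 100:.2f}")  # $85.49, not $8549
```

Pulling the enumerated values from the data, rather than typing them by hand, keeps the description aligned with what the column actually contains.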

Step 4: Configure a Workspace Context File

The workspace context file is a markdown document that applies to every agent interaction across the entire workspace. It handles business concepts that cannot be expressed in table descriptions alone.

Admins write the context file under Settings > AI and Agents > Context. The April 2026 Hex update added GitHub Actions integration: teams can now manage the context file in a repository and sync it to Hex automatically on merge, with preview links and validation warnings generated on each pull request. This lets data platform teams treat context curation like code.
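If the context file lives in a repository, a team can add its own lightweight pre-merge check on top of Hex's validation warnings. The sketch below uses hypothetical conventions: level-2 markdown headings named after the categories this guide recommends. It reports which required sections are missing. Wiring it into a GitHub Actions job is not shown and would depend on your workflow.

```python
# Hypothetical section headings a team might require in its context file.
REQUIRED_SECTIONS = (
    "Business terminology",
    "Metric definitions",
    "Preferred join patterns",
    "Exclusions",
)


def missing_sections(markdown_text, required=REQUIRED_SECTIONS):
    """Return the required '## ' headings that are absent from the markdown text."""
    present = {
        line[3:].strip().lower()
        for line in markdown_text.splitlines()
        if line.startswith("## ")
    }
    return [name for name in required if name.lower() not in present]


if __name__ == "__main__":
    draft = """# Workspace context

## Business terminology
"Accounts" are B2B customers; "members" are end users.

## Exclusions
Never query events_raw; it is an unprocessed Kafka stream.
"""
    missing = missing_sections(draft)
    if missing:
        print("Missing sections:", ", ".join(missing))
```

In CI, a non-empty result could fail the pull request before the file ever syncs to Hex.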

What to put in the context file:

Business terminology. If your company uses "accounts" for B2B customers and "members" for end users, state that directly. The agent defaults to common English meanings without this.

Metric definitions. If Monthly Active Users is defined as "any member who triggered at least one session event in the last 30 calendar days, excluding accounts flagged as internal test accounts," write exactly that. Without it, the agent invents a definition that may differ from how your finance team calculates the same number.

Preferred join patterns. Name the date spine table if you have one. Note which schemas are authoritative for each domain. If there is a dimension table that should be included in most revenue queries, name it here.

Exclusions. If events_raw is an unprocessed Kafka stream and should never appear in analysis queries, include a line that says so. The agent will route around it.
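The following Python sketch shows why the exact wording of a metric definition matters. It computes MAU literally from the example definition above, and each clause becomes one filter. The event fields (`member_id`, `account_id`, `event_type`, `event_date`) and the sample data are hypothetical. One detail is an explicit assumption: "last 30 calendar days" is read as including the as-of date. Ambiguities like that boundary are exactly what the context file should settle.

```python
from datetime import date, timedelta


def monthly_active_users(events, as_of, internal_test_accounts, window_days=30):
    """Count distinct members with at least one session event in the window ending
    on as_of (inclusive), excluding members of internal test accounts."""
    # Assumption: the window includes as_of itself, so it starts 29 days earlier.
    window_start = as_of - timedelta(days=window_days - 1)
    active_members = {
        e["member_id"]
        for e in events
        if e["event_type"] == "session"
        and window_start <= e["event_date"] <= as_of
        and e["account_id"] not in internal_test_accounts
    }
    return len(active_members)


if __name__ == "__main__":
    events = [
        {"member_id": "m1", "account_id": "a1", "event_type": "session", "event_date": date(2026, 4, 20)},
        {"member_id": "m1", "account_id": "a1", "event_type": "session", "event_date": date(2026, 4, 22)},
        {"member_id": "m2", "account_id": "a1", "event_type": "page_view", "event_date": date(2026, 4, 21)},
        {"member_id": "m3", "account_id": "test_co", "event_type": "session", "event_date": date(2026, 4, 21)},
        {"member_id": "m4", "account_id": "a2", "event_type": "session", "event_date": date(2026, 3, 25)},
        {"member_id": "m5", "account_id": "a2", "event_type": "session", "event_date": date(2026, 3, 26)},
    ]
    # m1 counts once, m2 has no session, m3 is internal, m4 falls outside the window.
    print(monthly_active_users(events, date(2026, 4, 24), {"test_co"}))  # 2
```

Shift the window by one day or drop the test-account filter and the number changes. Those differences are the kind an agent would otherwise resolve by guessing.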

Step 5: Understand User Memory

As of April 14, 2026, Hex automatically builds a user memory profile as you interact with the agent. Memory persists across sessions, so early interactions teach the agent which tables you work with, what types of questions you ask, and what you tend to focus on.

After five to ten sessions, you can ask open-ended questions like "What is worth investigating in this dataset?" and the agent will suggest directions based on your history. Analysts running recurring reports benefit the most: the agent learns the pattern and skips re-establishing context each time.

User memory is on by default. Review or clear your profile under account settings if you want a reset.

Step 6: Run a Test Query and Iterate on Descriptions

Open a new Thread from the left sidebar. Ask a plain English question. Hex shows the SQL it plans to run before executing, which makes verification fast: check the joins, scan the filters, confirm the metric calculation.

If the output is wrong, the most reliable fix is improving the relevant column or table description rather than iterating on the prompt. A better description applies to every future query and every other user in the workspace. Prompt iteration only fixes the current session.
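When you run several test questions, a quick way to check the joins is to list the tables each generated query touches and compare them with your endorsed set. The sketch below does this with a deliberately crude regular expression. The helper names, table names, and sample query are hypothetical. A real SQL parser such as sqlglot would handle CTEs, subqueries, and comma joins far more reliably.

```python
import re

# Crude pattern: the identifier that follows FROM or JOIN. It misses comma joins and
# subqueries, treats CTE names as tables, and can misfire on EXTRACT(... FROM col).
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql):
    """Return the sorted, lower-cased table names a query appears to read from."""
    return sorted({m.group(1).lower() for m in TABLE_REF.finditer(sql)})


def unexpected_tables(sql, allowed):
    """Return referenced tables that are not in the allowed (e.g. endorsed) set."""
    allowed_lower = {t.lower() for t in allowed}
    return [t for t in referenced_tables(sql) if t not in allowed_lower]


if __name__ == "__main__":
    # Hypothetical agent-generated query, copied from a Thread for review.
    agent_sql = """
    SELECT DATE_TRUNC('month', o.created_at) AS month,
           SUM(o.revenue) / 100.0 AS revenue_usd
    FROM analytics.orders o
    JOIN analytics.accounts a ON o.account_id = a.id
    LEFT JOIN raw.events_raw e ON e.account_id = a.id
    GROUP BY 1
    """
    endorsed = {"analytics.orders", "analytics.accounts"}
    print(unexpected_tables(agent_sql, endorsed))  # ['raw.events_raw']
```

A flagged table points to the description or context-file entry to fix, which is more durable than rewording the prompt.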

For chart output, ask "Show this as a bar chart by month" after the agent returns a result. It generates a visualization cell in the notebook. To pull a chart from an existing project rather than regenerating one, use the @ symbol to mention the project. As of April 2026, the agent can render individual cells from a referenced project inline in the Thread, which is useful for trusted metrics you do not want to recalculate each time.

For teams working with flat files or ad hoc uploads who want to skip warehouse connectivity entirely, VSLZ can produce charts and statistical summaries from an uploaded file with a single prompt and no connection configuration.

What Gets Better Over Time

The setup described above is a one-time investment. Once the connection is scoped, tables are endorsed, and descriptions are written, maintenance is incremental: update descriptions when columns change, add new schemas to the endorsed set as they mature, and refine the context file as business terminology evolves.

The teams getting the most consistent output from AI analytics tools in 2026 tend not to be the ones with the best prompts; they are the ones with the best-labeled data. Context curation is increasingly treated as a core data platform responsibility, alongside documentation and test coverage, in how data engineering teams measure the health of their warehouse.

The April 14, 2026 Hex changelog summarized this shift directly: "context isn't a monolith, it's a heterogenous puzzle. Every time we interact with data we're creating more context." User memory, semantic model references, and GitHub-synced context files are all mechanisms for capturing that context incrementally rather than trying to define everything upfront.

FAQ

How do I connect Hex to Snowflake?

Go to Settings > Data Connections > Add New Connection in Hex. Select Snowflake and enter your account identifier, warehouse name, database, schema, username, and password (or key-pair authentication). Run the connection test to confirm access. After connecting, use Hex's schema filter or a dedicated read-only Snowflake role to limit the agent to the schemas your team uses.

What does endorsing a table in Hex do?

Endorsing a table marks it as approved or trusted data. The Hex Agent prioritizes endorsed tables when generating SQL queries. You can endorse at the database, schema, or table level; child objects inherit the endorsement automatically. Unendorsed tables remain visible in the Data browser but are deprioritized in AI context, which reduces the chance of the agent querying staging or test data.

What should I put in the Hex workspace context file?

The workspace context file is a markdown document visible to all Hex AI agents. Include business terminology (e.g., what 'accounts' means at your company), metric definitions (how MAU is calculated), preferred join patterns, and any tables or schemas the agent should avoid. Keep it factual and specific. Vague instructions like 'be accurate' add little; concrete definitions like 'revenue is stored in cents' directly improve output quality.

How does Hex user memory work?

As of April 2026, Hex automatically builds a user memory profile as you interact with the agent. The agent learns which tables you use, what types of analyses you run, and your preferences over time. Memory persists across sessions. You can review or clear your memory under account settings. After several sessions, you can ask open-ended questions like 'what should I be looking at this week?' and the agent responds based on your history.

Can Hex AI generate charts or only SQL queries?

Hex AI can generate both SQL queries and visualizations. After producing a query result, you can ask the agent to create a chart in natural language, for example 'show this as a monthly bar chart.' The agent generates a visualization cell in the notebook. In Threads, the agent can also pull individual charts from existing projects and render them inline, rather than regenerating a visualization from scratch.
