
How to Set Up Airbyte Cloud for Data Syncing

Arkzero Research · Apr 29, 2026 · 8 min read

Last updated Apr 29, 2026

Airbyte Cloud is a managed data integration platform that syncs data from SaaS tools, databases, and APIs into a central warehouse without requiring Docker, infrastructure, or engineering resources. A free 30-day trial lets you connect sources like Salesforce, HubSpot, Stripe, or Google Sheets to destinations like BigQuery, Snowflake, or Postgres in minutes. This guide walks through the full setup from account creation to your first automated sync.

Getting your business data into one place used to require a data engineer, a Kubernetes cluster, or both. Airbyte Cloud eliminates that. Sign up, pick a source, pick a destination, and your first sync runs in under 30 minutes. The cloud version handles infrastructure, connector updates, and scaling automatically, so operations teams and analysts can move data without touching a server.

What Airbyte Does

Airbyte is an open-source data integration platform that moves data from sources to destinations using pre-built connectors. A source is any system where your data lives: a SaaS app like Salesforce or Stripe, a database like PostgreSQL or MySQL, a file store like S3, or a spreadsheet tool like Google Sheets. A destination is where you want the data to go, typically a cloud data warehouse like BigQuery, Snowflake, or Redshift.

The platform performs ELT (Extract, Load, Transform), which means it copies data as-is into your destination and lets your SQL or analytics layer handle transformation. This differs from the older ETL pattern, which required transforming data in transit. ELT keeps the raw data intact and separates concerns cleanly: Airbyte handles movement, your warehouse handles modeling.
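To make the ELT split concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in warehouse. The charge records and the raw_charges and revenue tables are hypothetical. The point is the order of operations: records are loaded unchanged, and the transformation happens afterward in SQL inside the destination.

```python
import sqlite3

# Hypothetical charge records as a payments API might return them (amounts in cents).
SOURCE_RECORDS = [
    {"id": "ch_1", "amount": 1999, "currency": "usd", "status": "succeeded"},
    {"id": "ch_2", "amount": 500, "currency": "usd", "status": "failed"},
    {"id": "ch_3", "amount": 4250, "currency": "usd", "status": "succeeded"},
]


def extract_and_load(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Extract + Load: copy records into a raw table exactly as received."""
    conn.execute(
        "CREATE TABLE raw_charges (id TEXT, amount INTEGER, currency TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_charges VALUES (:id, :amount, :currency, :status)", records
    )


def transform(conn: sqlite3.Connection) -> None:
    """Transform: model the raw data with SQL inside the destination."""
    conn.execute(
        """
        CREATE TABLE revenue AS
        SELECT id AS charge_id, amount / 100.0 AS amount_usd
        FROM raw_charges
        WHERE status = 'succeeded'
        """
    )


conn = sqlite3.connect(":memory:")
extract_and_load(conn, SOURCE_RECORDS)
transform(conn)
# The failed charge is still in raw_charges; only the modeled table filters it out.
print(conn.execute("SELECT COUNT(*) FROM raw_charges").fetchone()[0])  # 3
print(conn.execute("SELECT COUNT(*) FROM revenue").fetchone()[0])  # 2
```

In a real stack, Airbyte plays the role of the first function, and dbt or plain warehouse SQL plays the role of the second.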

As of 2026, Airbyte maintains over 600 connectors. Community contributions add new ones weekly. The GitHub repository has over 16,000 stars, and the platform is used by organizations including Cisco, Red Bull, and Typeform to power production data stacks.

Airbyte Cloud vs. Self-Hosted

Airbyte offers two deployment options: Airbyte Cloud, which is fully managed, and self-hosted, which runs on Docker Compose or Kubernetes on your own infrastructure.

Self-hosted is the right choice if you need full data residency control, have strict compliance requirements, or already run Kubernetes. It requires a machine with at least 8 GB of RAM and comfort with Docker Compose. The default self-hosted setup runs all services locally, including a Postgres instance for the internal catalog.

Airbyte Cloud requires none of that. There is no Docker to configure, no server to provision, and no updates to manage. Airbyte handles availability, connector version upgrades, and autoscaling automatically. The cloud version starts with a free 30-day trial and pricing after that is usage-based, starting at roughly $10 per month for light workloads. For most small teams and startups syncing a handful of sources daily, the cloud version is the faster and cheaper path.

This guide covers Airbyte Cloud.

Step 1: Create Your Account

Go to airbyte.com and click "Get started free." You can sign up with a GitHub or Google account. No credit card is required for the 30-day trial.

Once logged in, Airbyte places you in a default workspace. A workspace is an isolated environment for your connections. Most teams use one workspace. Larger organizations with separate staging and production environments may create additional workspaces to keep configurations distinct.

Step 2: Add a Source

A source is the system you want to pull data from. In the Airbyte Cloud UI, click "Sources" in the left sidebar, then "New source."

The connector catalog opens. Use the search bar to find your source by name. If your company uses HubSpot for CRM, search "HubSpot." For transactional data from Stripe, search "Stripe." The catalog lists connectors organized by type: SaaS applications, databases, file stores, and developer tools.

After selecting a connector, Airbyte prompts for credentials. What it needs depends on the source. For HubSpot, you authenticate via OAuth by clicking "Authenticate with HubSpot" and granting access in your browser. For Postgres databases, you enter the host, port, username, password, and database name. For Google Sheets, you either sign in with Google via OAuth or supply a service account key and share the spreadsheet with that service account's email address.

After entering credentials, click "Set up source." Airbyte tests the connection and returns a list of available streams. A stream is one logical data set within the source. For HubSpot, streams include Contacts, Companies, Deals, and Email Events. For Stripe, streams include Charges, Customers, Invoices, and Subscriptions.

This step takes under five minutes for most SaaS sources.

Step 3: Add a Destination

A destination is where Airbyte writes the synced data. Click "Destinations" in the left sidebar, then "New destination."

Common choices for teams getting started include BigQuery (Google Cloud's serverless warehouse, billed by the data each query scans and offering a free tier), Snowflake (enterprise-grade, billed for storage plus compute time while a virtual warehouse runs), and Postgres (a relational database you can run on Supabase or Neon for free). BigQuery's free tier covers 10 GB of storage and 1 TB of queried data per month, which is enough for the reporting workloads of many small teams.

To add BigQuery as a destination, you need a Google Cloud project and a service account with the BigQuery Data Editor and BigQuery Job User roles. Airbyte's documentation walks through creating the service account and downloading its JSON key file. In the Airbyte destination form, enter the project ID, the dataset name and location, and the contents of the key file, then click "Set up destination."
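Before pasting the key into Airbyte, it can help to sanity-check the file locally. The sketch below uses only the standard library. The required fields reflect the standard layout of a Google Cloud service account JSON key; the function name and the file name in the commented usage line are hypothetical. It checks structure only and cannot confirm that the account actually holds the BigQuery roles.

```python
import json

# Fields present in a standard Google Cloud service account key file.
REQUIRED_FIELDS = {"type", "project_id", "private_key_id", "private_key", "client_email"}


def check_service_account_key(raw: str) -> list[str]:
    """Return a list of structural problems in a key file (empty if none found)."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level value must be a JSON object"]

    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(key))]
    if key.get("type") != "service_account":
        problems.append("'type' should be 'service_account'")
    if not str(key.get("private_key", "")).startswith("-----BEGIN PRIVATE KEY-----"):
        problems.append("'private_key' does not look like a PEM private key")
    return problems


# Hypothetical file name; point this at the key you downloaded.
# with open("airbyte-bigquery-key.json") as f:
#     print(check_service_account_key(f.read()) or "key looks structurally valid")
```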

For teams not yet using a warehouse, Postgres on Supabase is the fastest path: create a free Supabase project, copy the connection details from the Supabase dashboard, and enter the host, port, database name, username, and password in the Airbyte Postgres destination form.
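If the dashboard hands you a single connection string, you can split it into the individual fields the form asks for. This is a minimal sketch assuming a URI of the form postgresql://user:password@host:port/dbname; the host and password in the test are made up, and the dictionary keys are descriptive labels, not Airbyte's internal configuration property names.

```python
from urllib.parse import unquote, urlsplit


def postgres_uri_to_fields(uri: str) -> dict:
    """Split a postgres:// or postgresql:// URI into separate connection fields."""
    parts = urlsplit(uri)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # Postgres default port when the URI omits it
        "database": parts.path.lstrip("/"),
        # urlsplit leaves user and password percent-encoded, so decode them.
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
    }
```

Percent-decoding matters because passwords containing characters such as @ or : must be encoded inside a URI but entered literally in a form field.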

Step 4: Configure Your Connection

With a source and destination defined, create a connection to link them. Airbyte guides you through three configuration decisions.

Sync frequency. Choose how often Airbyte runs the sync: every 24 hours, every 6 hours, hourly, or on-demand. For most operational reporting use cases, a daily sync at off-peak hours is sufficient. More frequent schedules can increase credit consumption, especially for streams that use full refresh.

Stream selection. Pick which streams to sync, and enable only the ones you need. Syncing every available table from a Salesforce instance wastes storage and slows pipelines. A typical sales reporting setup needs Accounts, Contacts, Opportunities, and activity records (the Task and Event objects).

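Sync mode. Each stream can run in one of several modes. Full refresh replaces the destination table entirely on each run, which works well for small, slow-changing data sets. Incremental append adds only new or changed records without modifying existing rows, so an updated record shows up as an additional row; this is efficient for large, append-only tables like event logs or transactions. Incremental deduped history (labeled "Incremental | Append + Deduped" in newer Airbyte versions) also reads only new or changed records but keeps one row per primary key, which makes it the right mode for entity tables like Customers or Products where attributes change over time. Incremental modes need a cursor field such as an updated-at timestamp, and the deduped mode also needs a primary key.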
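The difference between the three modes is easiest to see on a toy table. This sketch models a destination table as an in-memory list of rows and applies one batch of source changes per mode. It is a simplified model of the semantics described above, not Airbyte's implementation, which also writes metadata columns and handles deletions separately. The `id` primary key and `updated_at` cursor field are assumptions.

```python
from typing import Any

Row = dict[str, Any]


def full_refresh(destination: list[Row], snapshot: list[Row]) -> list[Row]:
    """Discard the old table and replace it with a full snapshot of the source."""
    return list(snapshot)


def incremental_append(destination: list[Row], changes: list[Row]) -> list[Row]:
    """Append new or changed records; earlier versions of a row are kept."""
    return destination + changes


def incremental_deduped(destination: list[Row], changes: list[Row],
                        key: str = "id", cursor: str = "updated_at") -> list[Row]:
    """Append changes, then keep only the latest version of each primary key."""
    latest: dict[Any, Row] = {}
    for row in destination + changes:
        current = latest.get(row[key])
        if current is None or row[cursor] >= current[cursor]:
            latest[row[key]] = row
    return list(latest.values())


table = [{"id": 1, "plan": "free", "updated_at": 1},
         {"id": 2, "plan": "pro", "updated_at": 1}]
changes = [{"id": 1, "plan": "pro", "updated_at": 2}]  # customer 1 upgraded

print(len(incremental_append(table, changes)))   # 3: customer 1 appears twice
print(len(incremental_deduped(table, changes)))  # 2: one row per customer
```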

Click "Set up connection." Airbyte saves the connection and starts the first sync, which exercises the full pipeline end to end.

Step 5: Run Your First Sync

After the connection is created, the first sync runs automatically. Completion time depends on the source and data volume. A HubSpot full refresh covering 10,000 contacts and 5,000 deals typically finishes in two to five minutes. A Stripe sync covering a full year of transaction history may take 15 to 30 minutes.

The connection dashboard shows sync status in real time: records extracted, records loaded, bytes transferred, and any errors. Airbyte logs each sync run with full details, so failed syncs are straightforward to diagnose. Common failure causes include expired OAuth tokens, changed credentials, or schema drift when an upstream source adds or removes a field.

Once the sync completes, the destination tables are populated and ready to query. In BigQuery, tables appear under the dataset you specified. In Postgres, they appear as new tables in the target schema.

What to Do With the Data

After your first sync, the data sits in your destination as raw tables. From here, most teams take one of two paths: direct analysis or a transformation layer.

For direct analysis, connect a BI tool to your warehouse. Metabase and Looker Studio both connect to BigQuery and Postgres in under five minutes and let non-technical teammates build dashboards on the synced data without writing SQL.

For teams that want to query the data in plain English without setting up a dashboard, tools like VSLZ connect directly to your data source and return charts and statistical summaries from a single prompt.

For teams that want modeled data, dbt Core sits on top of Airbyte's raw tables and transforms them into clean, typed, tested models. This is the standard modern data stack: Airbyte for ingestion, dbt for transformation, and a BI tool for presentation.

Common Mistakes to Avoid

Syncing too many streams on launch is the most common setup error. Start with three or four tables that answer a specific business question: customer acquisition, pipeline health, or revenue by segment. Add streams incrementally as the team builds confidence in the pipeline.

Ignoring schema changes is the second mistake. SaaS vendors add and remove fields without warning. Airbyte detects schema changes and can alert you or handle them automatically by adding columns and nullifying removed fields. Configure schema change handling explicitly in the connection settings to avoid silent failures.
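Airbyte performs this detection for you, so the following is only an illustration of the idea: compare the previously synced set of fields with the current one, add new columns, and fill columns the source no longer sends with nulls. Function and field names are hypothetical, and the actual propagation behavior is chosen in the connection settings, not in code.

```python
from typing import Any


def diff_fields(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) field names between two schema snapshots."""
    return current - previous, previous - current


def shape_record(record: dict[str, Any], table_columns: list[str],
                 added: set[str]) -> dict[str, Any]:
    """Map a source record onto the destination columns, adding new ones.

    Columns the source no longer sends stay in the table and are filled with None.
    """
    columns = table_columns + sorted(added)
    return {col: record.get(col) for col in columns}


# Hypothetical contact fields before and after an upstream change.
previous = {"id", "email", "fax"}
current = {"id", "email", "lifecycle_stage"}
added, removed = diff_fields(previous, current)
print(added, removed)  # {'lifecycle_stage'} {'fax'}
print(shape_record({"id": 7, "email": "a@example.com", "lifecycle_stage": "lead"},
                   ["id", "email", "fax"], added))
```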

Skipping the incremental sync mode for large tables is the third mistake. Running full refresh on a table with 500,000 rows every hour is expensive and slow. Set large transaction and event tables to incremental mode from the start.
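A back-of-the-envelope calculation shows the scale of the difference. The 500,000-row table and hourly schedule come from the paragraph above; the assumption that 1% of rows change each hour is chosen purely for illustration.

```python
def rows_per_day(table_rows: int, syncs_per_day: int,
                 changed_fraction: float) -> tuple[int, int]:
    """Rows transferred per day under full refresh vs. incremental sync.

    Assumes each incremental run picks up `changed_fraction` of the table.
    """
    full_refresh = table_rows * syncs_per_day
    incremental = round(table_rows * changed_fraction) * syncs_per_day
    return full_refresh, incremental


full, incremental = rows_per_day(500_000, syncs_per_day=24, changed_fraction=0.01)
print(f"full refresh: {full:,} rows/day; incremental: {incremental:,} rows/day")
# full refresh: 12,000,000 rows/day; incremental: 120,000 rows/day
```

Under this assumption, full refresh moves 100 times more data every day for the same result.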

Mixing environments in one workspace creates maintenance overhead later. Keep staging and production in separate workspaces, or at minimum in separate connections that write to different schemas, even when both land in the same warehouse.

FAQ

How do I connect Salesforce to BigQuery using Airbyte?

In Airbyte Cloud, add Salesforce as a source by entering your Salesforce credentials or authenticating via OAuth. Then add BigQuery as a destination by providing your Google Cloud project ID, dataset name, and a service account JSON key with BigQuery Data Editor and Job User roles. Create a connection between the two, select the Salesforce objects you want to sync (Accounts, Contacts, Opportunities, etc.), choose incremental deduped history as the sync mode for entity tables, and run the connection. For modest data volumes, the first sync typically populates the selected tables in your BigQuery dataset within minutes.

Is Airbyte Cloud free to use?

Airbyte Cloud offers a free 30-day trial with no credit card required. After the trial, pricing is usage-based starting at approximately $10 per month, which includes a small credit allocation. Usage beyond the included credits is billed by data volume: by rows synced for API sources and by gigabytes for database and file sources. For light workloads such as a few sources syncing daily, monthly costs are typically under $50. The self-hosted version of Airbyte is free to run but requires you to manage the infrastructure.

What is the difference between full refresh and incremental sync in Airbyte?

Full refresh replaces the entire destination table with fresh data on every sync run. This is simple and reliable but expensive for large tables, since all records are re-transferred every time. Incremental append adds only records created or updated since the last sync, which is faster and cheaper. Incremental deduped history also adds only new and changed records but maintains one row per primary key in the destination, so updated records replace old ones instead of accumulating duplicates. Use full refresh for small static tables, incremental append for event logs, and incremental deduped history for entity tables like customers and products.
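The rule of thumb in the last sentence can be written down as a small decision helper. This is a hypothetical function that encodes this article's guidance; it is not part of Airbyte. The 10,000-row cutoff for a "small" table is an assumption, and the returned strings mirror the mode names used here.

```python
def recommend_sync_mode(row_count: int, has_cursor: bool, has_primary_key: bool,
                        rows_change_after_insert: bool,
                        small_table_rows: int = 10_000) -> str:
    """Suggest a sync mode following this guide's rule of thumb."""
    # Incremental modes need a cursor field (e.g. updated_at) to find new or changed
    # rows; small tables are cheap enough to re-copy in full on every run.
    if not has_cursor or row_count <= small_table_rows:
        return "full refresh"
    # Immutable rows such as events or ledger entries are only ever appended.
    if not rows_change_after_insert:
        return "incremental append"
    # Mutable entities need a primary key so newer versions can replace older ones.
    if has_primary_key:
        return "incremental deduped history"
    # Without a key, keep every version and deduplicate downstream.
    return "incremental append"
```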

Can I use Airbyte without a data warehouse?

Yes. Airbyte supports Postgres as a destination, which you can run for free on Supabase or Neon without setting up a dedicated warehouse. This is a practical starting point for teams that want to centralize data but are not yet ready to pay for Snowflake or BigQuery. The trade-off is that Postgres is a transactional database, not an analytical one, so query performance on large analytical workloads is typically slower than on a purpose-built warehouse. Many teams start with Postgres on Supabase and migrate to BigQuery or Snowflake once data volumes grow.

How do I fix a failed Airbyte sync?

In Airbyte Cloud, click on the connection that failed to open the sync history. Each run shows a status log with the exact error message. The most common causes are expired OAuth tokens (fix by clicking re-authenticate on the source), changed database credentials (update the source configuration), network connectivity issues (check firewall rules for self-hosted destinations), and schema drift where a source added or removed fields (configure schema change handling to auto-propagate or pause on change). After fixing the root cause, trigger a manual sync from the connection dashboard to verify the fix before relying on the next scheduled run.
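If you prefer to script the manual re-run, Airbyte also exposes a public REST API. The sketch below assumes the `POST https://api.airbyte.com/v1/jobs` endpoint, a JSON body with `connectionId` and `jobType: "sync"`, and a bearer access token obtained separately from an Airbyte application's client credentials. Check the current Airbyte API reference before relying on it. The function only builds the request, so nothing is sent when it runs.

```python
import json
import urllib.request

# Assumed Airbyte public API endpoint for starting jobs; verify against current docs.
API_URL = "https://api.airbyte.com/v1/jobs"


def build_sync_request(connection_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Airbyte to start a sync job."""
    body = json.dumps({"connectionId": connection_id, "jobType": "sync"}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Sending it would look like this (placeholder values, not real credentials):
# req = build_sync_request("<connection-id>", "<access-token>")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```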
