Topic Hub · Data Infrastructure

Connected data. One view, no analyst required.

The modern data stack — warehouse + ELT + dbt + BI — was built for analyst teams. Operating intelligence platforms consolidate the same components into one product for operators. This hub covers when you need the full modern stack vs. when an operating intelligence platform replaces it.

§ 01 · Definition

What is data infrastructure?

Data infrastructure is the layer of pipelines, storage, and modeling that moves data from source systems (CRM, billing, ads, ecommerce) into a unified, query-ready format. The "modern data stack" combines five layers: ingestion (Fivetran, Airbyte), a warehouse (Snowflake, BigQuery, Redshift), transformation (dbt), a semantic layer (Cube, dbt metrics), and a BI or operating layer.

§ 02 · Context

Why data infrastructure matters in 2026

  • 01

    Building the modern data stack in-house typically costs $300K–$800K in year one, counting both tooling and engineering.

  • 02

    For most operators below $20M ARR or $30M GMV, an operating intelligence platform replaces 80% of the stack at 10% of the cost.

  • 03

    Data infrastructure debt — bad pipelines, broken models, undocumented metrics — is the silent tax that eats analyst productivity.

  • 04

    The semantic layer is becoming the new center of gravity: define metrics once, consume everywhere.

  • 05

    Reverse ETL closes the loop: data flows from the warehouse back into operational tools (Salesforce, HubSpot) and powers operator workflows.

§ 03 · Metrics

Core metrics & concepts

Every metric below has a definition page in the Fairview glossary — formulas, benchmarks, and worked examples.

Data Warehouse

A centralized storage system that collects, structures, and stores data from multiple business systems (CRM, E

Data Lake

Data lake = centralised raw-data repository on cheap object storage (S3, GCS), schema-on-read. Dominant 2010–2

Data Product

Data product = dataset treated as a managed product (owner, consumers, SLAs, versioning, lifecycle). Discovera

Data Catalog

Data catalog = searchable inventory of data assets with metadata, ownership, documentation, classification, li

Data Lineage

Data lineage = documented dependency graph of analytical data. Levels: table-level (most common), column-level
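
The dependency-graph idea can be sketched in a few lines. This is a minimal illustration, assuming table-level lineage stored as a mapping from each table to its direct upstream tables; every table name here (`raw_orders`, `fct_revenue`, `dash_mrr`, and so on) is hypothetical.

```python
from collections import deque

# Hypothetical table-level lineage: each table maps to its direct upstream tables.
LINEAGE = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "fct_revenue": ["stg_orders", "stg_customers"],
    "dash_mrr": ["fct_revenue"],
}

def upstream(table, lineage=LINEAGE):
    """Return every table that `table` depends on, directly or transitively."""
    seen = set()
    queue = deque(lineage.get(table, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(lineage.get(parent, []))
    return seen

def downstream(table, lineage=LINEAGE):
    """Return every table that would be affected if `table` broke."""
    return {t for t in lineage if table in upstream(t, lineage)}
```

Calling `downstream("raw_orders")` answers the usual incident question: which models and dashboards need checking when a source table breaks.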

Data Mart

Data mart = subject-area subset of analytical data (sales/finance/marketing) modelled for one team's reporting

Semantic Layer

A translation layer that sits between a data warehouse and reporting tools, defining business metrics (revenue

Metric Store

Metric store = centralised metric definitions exposed via API to any consumer. Largely synonymous with headles

Headless BI

Headless BI = decoupled metric semantic layer that any consumer (dashboards, AI tools, reverse-ETL) can query

Embedded Analytics

Analytics capabilities built directly into a software product's interface, so users access dashboards, reports

Connected Data

Data from multiple business systems — CRM, finance, e-commerce, and marketing — unified into a single normaliz

CDC (Change Data Capture)

CDC = read database transaction logs (Postgres WAL, MySQL binlog) to capture inserts/updates/deletes increment
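
As a rough sketch of what consuming captured changes looks like, the snippet below replays a stream of change events onto a replica keyed by primary key. The event shape (`op`, `id`, `row`) is an assumption made for illustration; it is not the format of Postgres WAL, MySQL binlog, or any specific CDC tool.

```python
# Hypothetical change events, in commit order. Real log formats differ;
# this shape is an assumption made for illustration only.
events = [
    {"op": "insert", "id": 1, "row": {"email": "a@example.com", "plan": "free"}},
    {"op": "insert", "id": 2, "row": {"email": "b@example.com", "plan": "pro"}},
    {"op": "update", "id": 1, "row": {"email": "a@example.com", "plan": "pro"}},
    {"op": "delete", "id": 2, "row": None},
]

def apply_changes(replica, events):
    """Apply inserts, updates, and deletes to a dict keyed by primary key."""
    for event in events:
        if event["op"] == "delete":
            replica.pop(event["id"], None)
        else:  # insert and update are both upserts on the replica
            replica[event["id"]] = event["row"]
    return replica

replica = apply_changes({}, events)
```

Because only the changed rows are replayed, the replica stays current without re-copying the whole source table.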

§ 11 · FAQ

Frequently asked

What is the modern data stack?

The modern data stack is the combination of ingestion (Fivetran, Airbyte), a cloud warehouse (Snowflake, BigQuery, Redshift), transformation (dbt), a semantic layer (Cube), and BI tools (Looker, Mode). It originated around 2018 and matured around 2022.

Do I need a data warehouse?

For most operators below $20M ARR or $30M GMV, no. An operating intelligence platform such as Fairview acts as the warehouse, transformation, and presentation layers in one product. Above that scale, a dedicated warehouse makes sense.

What is ELT vs ETL?

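ETL extracts data, transforms it, and then loads it into the warehouse. ELT extracts data, loads it raw into the warehouse, and transforms it later inside the warehouse. ELT is the modern default: loading raw data first is typically faster and cheaper to operate, and it gives downstream analysts more flexibility because transformations can be rerun against the raw tables.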
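
The ELT order of operations can be sketched with Python's built-in `sqlite3` module standing in for a cloud warehouse. This is a minimal illustration, not a production pipeline; the table names (`raw_orders`, `daily_revenue`) and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows as they might arrive from a billing system.
source_rows = [
    ("ord-1", "2026-01-03", "1200"),
    ("ord-2", "2026-01-03", "800"),
    ("ord-3", "2026-01-04", "500"),
]

warehouse = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse

# Extract + Load: land the data untouched, amounts still stored as text.
warehouse.execute(
    "CREATE TABLE raw_orders (order_id TEXT, order_date TEXT, amount_cents TEXT)"
)
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_rows)

# Transform: build a modeled table with SQL inside the warehouse.
warehouse.execute("""
    CREATE TABLE daily_revenue AS
    SELECT order_date, SUM(CAST(amount_cents AS INTEGER)) / 100.0 AS revenue
    FROM raw_orders
    GROUP BY order_date
""")

daily_revenue = warehouse.execute(
    "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()
```

Since `raw_orders` is kept as loaded, the transform step can be changed and rerun later without re-extracting from the source system. In an ETL flow the cast and aggregation would happen before the insert, and the raw rows would not be in the warehouse.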

What is reverse ETL?

Reverse ETL moves data from the warehouse back into operational tools (Salesforce, HubSpot, Marketo), using tools such as Hightouch and Census. It closes the loop so analyst-modeled segments can power campaigns and workflows.
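
A reverse ETL sync can be sketched as "query a modeled segment, then upsert it into the operational tool". In the sketch below, `sqlite3` stands in for the warehouse and `FakeCRM` is a hypothetical in-memory client; it does not represent the Salesforce, HubSpot, Hightouch, or Census APIs, and the `customer_segments` table is likewise an assumption.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse
warehouse.execute("CREATE TABLE customer_segments (email TEXT, segment TEXT)")
warehouse.executemany(
    "INSERT INTO customer_segments VALUES (?, ?)",
    [
        ("a@example.com", "churn_risk"),
        ("b@example.com", "healthy"),
        ("c@example.com", "churn_risk"),
    ],
)

class FakeCRM:
    """Hypothetical in-memory CRM client; a real sync would call the vendor's API."""

    def __init__(self):
        self.contacts = {}

    def upsert_contact(self, email, fields):
        # Create the contact if missing, then merge in the new field values.
        self.contacts.setdefault(email, {}).update(fields)

def sync_segment(warehouse, crm, segment):
    """Copy one analyst-modeled segment from the warehouse onto CRM contacts."""
    rows = warehouse.execute(
        "SELECT email FROM customer_segments WHERE segment = ? ORDER BY email",
        (segment,),
    ).fetchall()
    for (email,) in rows:
        crm.upsert_contact(email, {"segment": segment})
    return len(rows)

crm = FakeCRM()
synced = sync_segment(warehouse, crm, "churn_risk")
```

Using an upsert keeps the sync safe to rerun on a schedule: running it twice leaves the CRM in the same state.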

What is a semantic layer?

A definition layer that names every business metric (revenue, MRR, churn) once and exposes it consistently to every downstream tool. It prevents the "every dashboard shows different numbers" problem.
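
The "define once, consume everywhere" idea can be sketched as a small metric registry. This is an illustration under stated assumptions, not how Cube or dbt implement it: the `METRICS` dictionary, the `subscriptions` table, and the helper names are all hypothetical, and `sqlite3` stands in for the warehouse.

```python
import sqlite3

# Each metric is defined exactly once, as a SQL expression over a hypothetical table.
METRICS = {
    "mrr": "SUM(CASE WHEN status = 'active' THEN monthly_amount ELSE 0 END)",
    "active_customers": "COUNT(CASE WHEN status = 'active' THEN 1 END)",
}

def metric_query(name, table="subscriptions"):
    """Build the one canonical query for a metric; every consumer calls this."""
    return f"SELECT {METRICS[name]} FROM {table}"

db = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse
db.execute(
    "CREATE TABLE subscriptions (customer TEXT, status TEXT, monthly_amount INTEGER)"
)
db.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?)",
    [("acme", "active", 500), ("globex", "active", 300), ("initech", "churned", 200)],
)

def get_metric(name):
    """Evaluate a registered metric against the warehouse."""
    return db.execute(metric_query(name)).fetchone()[0]

# Two different consumers get the same number because they share one definition.
dashboard_mrr = get_metric("mrr")
alert_mrr = get_metric("mrr")
```

If the definition of MRR changes, only the entry in `METRICS` is edited, and the dashboard and the alert pick up the change together.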

Editorial standards

Sources & references

Fairview maintains a public bibliography for every topic hub. Each citation below was verified at publication. We update sources every 12 months as new benchmark studies are released. See our editorial standards.

  1. State of Analytics Engineering 2025. dbt Labs, 2025.
  2. Modern Data Stack Annual Report. a16z / Future, 2024.
  3. Snowflake Data Cloud Report. Snowflake, 2025.

Fairview cites primary sources only — government data, academic research, industry benchmarks from named publishers, and official vendor documentation. See our editorial standards.