Data & Analytics 14 min read

Fact Table vs Dimension Table: The Complete Data Modeling Guide

Fact tables store measurable events. Dimension tables describe them. Learn the differences, types, SQL structures, and how dbt implements dimensional modeling.

Siddharth Gangal

TL;DR

  • Fact tables store measurable business events (orders, payments, clicks). Every row is an occurrence; numeric columns are the measures you aggregate.
  • Dimension tables store descriptive context (customers, products, dates). They let you slice and filter the measures in your fact table.
  • Three fact table types: transaction (one row per event), periodic snapshot (one row per period), and accumulating snapshot (one row per process lifecycle).
  • Surrogate keys decouple your warehouse model from source system IDs, enabling SCD history and stable joins.
  • In dbt, fact models are prefixed fct_ and dimension models dim_. Incremental materialization handles high-volume fact tables efficiently.

Every functioning data warehouse is built on a distinction that is easy to state and takes years to apply well: some tables record what happened, and other tables describe who or what was involved. The first category is fact tables. The second is dimension tables. Together, they form the dimensional model — the organizing principle that Ralph Kimball codified in the 1990s and that remains the dominant approach to analytical data modeling in 2026.

This guide covers the concept from first principles, walks through SQL structures and worked examples, compares the three main fact table types, explains surrogate keys, degenerate dimensions, and conformed dimensions, and shows how modern tooling — specifically dbt — implements these patterns in production.

The Core Distinction

A fact table records a measurable business event. Each row is a discrete occurrence — a sale, a page view, a support ticket, a payment received. The numeric columns in a fact table are the measures you want to aggregate: revenue, quantity, duration, count. The non-numeric columns are foreign keys that point to dimension tables.

A dimension table describes the entities involved in those events. Customers, products, employees, dates, store locations — all are dimensions. Dimension tables are wider than fact tables, containing many descriptive attributes, and they change slowly compared to the high-volume row accumulation in fact tables.

The relationship is always the same: the fact table sits at the center of the model, and dimension tables surround it. Each dimension table joins to the fact table on a key column. This structure — one central fact surrounded by several dimensions — is a star schema, the most common implementation of dimensional modeling.

Side-by-Side Comparison

Characteristic Fact Table Dimension Table
What it storesMeasurable events and transactionsDescriptive attributes of entities
Typical row countVery high (millions to billions)Low to moderate (thousands to millions)
Column typesForeign keys + numeric measuresSurrogate key + descriptive text/dates
ShapeTall and narrowShort and wide
Update frequencyAppend-only (new rows added continuously)Infrequent updates (slowly changing)
Primary keySurrogate key or composite of FKsSurrogate key (integer)
dbt naming conventionfct_orders, fct_paymentsdim_customers, dim_products
Role in queryProvides the measures to aggregateProvides the GROUP BY and WHERE attributes

SQL Structure: A Worked Example

Consider an e-commerce order model. The fact table captures individual order line items. The dimension tables describe the customer, the product, and the date.

Dimension table: dim_customers

CREATE TABLE dim_customers (
  customer_key     INT           NOT NULL PRIMARY KEY,  -- surrogate key
  customer_id      VARCHAR(50)   NOT NULL,              -- natural key from source
  full_name        VARCHAR(200),
  email            VARCHAR(200),
  country          VARCHAR(100),
  city             VARCHAR(100),
  customer_segment VARCHAR(50),                         -- e.g. 'VIP', 'Standard'
  acquisition_channel VARCHAR(100),
  first_order_date DATE,
  is_active        BOOLEAN,
  row_effective_date  DATE       NOT NULL,              -- SCD2 tracking
  row_expiry_date     DATE,
  is_current_row      BOOLEAN    NOT NULL DEFAULT TRUE
);

Dimension table: dim_products

CREATE TABLE dim_products (
  product_key      INT           NOT NULL PRIMARY KEY,  -- surrogate key
  product_id       VARCHAR(50)   NOT NULL,              -- natural key
  product_name     VARCHAR(300),
  sku              VARCHAR(100),
  brand            VARCHAR(150),
  category         VARCHAR(100),
  subcategory      VARCHAR(100),
  unit_cost        NUMERIC(12,4),
  list_price       NUMERIC(12,4),
  is_active        BOOLEAN
);

Dimension table: dim_date

CREATE TABLE dim_date (
  date_key         INT           NOT NULL PRIMARY KEY,  -- surrogate key: 20240115
  full_date        DATE          NOT NULL,
  day_of_week      VARCHAR(10),
  day_number       SMALLINT,
  week_number      SMALLINT,
  month_number     SMALLINT,
  month_name       VARCHAR(15),
  quarter          SMALLINT,
  year             SMALLINT,
  is_weekend       BOOLEAN,
  is_holiday       BOOLEAN
);

Fact table: fct_order_lines

CREATE TABLE fct_order_lines (
  order_line_key   BIGINT        NOT NULL PRIMARY KEY,  -- surrogate key
  order_id         VARCHAR(50)   NOT NULL,              -- degenerate dimension
  order_date_key   INT           NOT NULL REFERENCES dim_date(date_key),
  customer_key     INT           NOT NULL REFERENCES dim_customers(customer_key),
  product_key      INT           NOT NULL REFERENCES dim_products(product_key),
  -- measures
  quantity         INT           NOT NULL,
  unit_price       NUMERIC(12,4) NOT NULL,
  discount_amount  NUMERIC(12,4) NOT NULL DEFAULT 0,
  gross_revenue    NUMERIC(12,4) NOT NULL,
  net_revenue      NUMERIC(12,4) NOT NULL,
  cogs             NUMERIC(12,4) NOT NULL,
  gross_profit     NUMERIC(12,4) NOT NULL
);

Notice what is in each table. The fact table holds only foreign keys plus numbers you can meaningfully sum, average, or count. Every descriptive attribute — customer name, product category, calendar month — lives in the dimension tables. A query that asks "what was gross revenue by product category in Q3?" becomes a join of fct_order_lines to dim_products and dim_date, filtering on quarter = 3 and grouping by category.

The Three Types of Fact Tables

Kimball identifies four fact table grain types. Three are used in most production warehouses. The choice determines how rows accumulate and what analytical questions the table can answer.

1. Transaction Fact Tables

Transaction fact tables are the most common and the highest grain. Each row represents a single discrete event — one order line, one payment, one ad click, one support ticket resolved. Rows are append-only: new events add new rows, and existing rows are never updated.

Transaction facts answer questions about individual events: "How many orders came through paid social on Tuesday?" or "What was the average order value for customers in Germany last month?" Because every event is a separate row, you can calculate counts, sums, and averages at any level of granularity by filtering and grouping.

The downside of transaction facts: they cannot answer questions about state or duration. To know how many orders are currently open, you cannot just query the transaction table — you need either a snapshot or an accumulating snapshot.

2. Periodic Snapshot Fact Tables

Periodic snapshot tables record the state of a metric at a fixed interval — typically daily, weekly, or monthly. Each row represents one entity at one point in time: one customer's account balance on January 31, one product's inventory count on the last day of each month, one subscription's MRR on the first of each month.

-- Example: monthly MRR snapshot per subscription
CREATE TABLE fct_subscription_monthly_snapshot (
  snapshot_key          BIGINT  NOT NULL PRIMARY KEY,
  snapshot_date_key     INT     NOT NULL REFERENCES dim_date(date_key),
  customer_key          INT     NOT NULL REFERENCES dim_customers(customer_key),
  subscription_id       VARCHAR(50) NOT NULL,           -- degenerate dimension
  plan_key              INT     NOT NULL REFERENCES dim_plans(plan_key),
  -- measures: state at snapshot date
  mrr                   NUMERIC(12,4),
  seat_count            INT,
  days_since_last_login INT,
  health_score          NUMERIC(5,2)
);

Periodic snapshots are ideal for tracking trends in slowly changing quantities — subscription MRR, pipeline value, account health scores. The table grows predictably: one row per entity per period.

3. Accumulating Snapshot Fact Tables

Accumulating snapshot tables model a business process with defined stages that each instance passes through — a sales opportunity moving through pipeline stages, an order progressing from placed to fulfilled to delivered, a loan application advancing from submission to approval to disbursement.

Each row represents one instance of the process. The row is created when the process starts and updated as it advances — which makes accumulating snapshots unique among fact table types. Multiple date foreign keys capture when each milestone was reached.

-- Example: order fulfillment lifecycle
CREATE TABLE fct_order_fulfillment (
  order_key             BIGINT  NOT NULL PRIMARY KEY,
  order_id              VARCHAR(50) NOT NULL,           -- degenerate dimension
  customer_key          INT     NOT NULL REFERENCES dim_customers(customer_key),
  -- milestone date keys (NULL until milestone is reached)
  order_placed_date_key   INT   REFERENCES dim_date(date_key),
  payment_confirmed_key   INT   REFERENCES dim_date(date_key),
  warehouse_picked_key    INT   REFERENCES dim_date(date_key),
  shipped_date_key        INT   REFERENCES dim_date(date_key),
  delivered_date_key      INT   REFERENCES dim_date(date_key),
  returned_date_key       INT   REFERENCES dim_date(date_key),
  -- measures: lags between milestones
  days_to_payment         INT,
  days_to_ship            INT,
  days_to_deliver         INT,
  order_total             NUMERIC(12,4)
);

Accumulating snapshots answer duration and process-efficiency questions: "What percentage of orders shipped within 24 hours of payment?" or "Where in the fulfillment process do delays accumulate?" They require an update mechanism rather than pure append, which makes them slightly more complex to maintain but uniquely suited to pipeline and workflow analysis.

Surrogate Keys vs Natural Keys

Every dimension table needs a primary key. The choice between a surrogate key and the source system's natural key has significant downstream consequences.

A natural key is the identifier that exists in the source system: customer_id from your CRM, product_sku from your inventory system, user_uuid from your application database. Using natural keys directly seems simpler — there is one less column to manage, and you can trace a dimension row back to the source without a lookup.

A surrogate key is a system-generated integer assigned by the warehouse during the ETL/ELT process, independent of any source value. It is typically a sequential integer or a hash.

Kimball recommends surrogate keys for three reasons:

Source system insulation. When a source system migrates, resets its ID sequences, or is replaced entirely, natural keys change. Surrogate keys in the warehouse do not. Your historical fact data remains joinable to the correct dimension row even after the source changes.

Slowly changing dimension (SCD) support. When a customer changes their address or a product changes its category, you may want to preserve the old version for historical accuracy. SCD Type 2 handles this by adding a new dimension row with the updated attributes and a new surrogate key, while keeping the old row intact. The fact table continues to point to the surrogate key that was current when each transaction occurred — preserving historical accuracy automatically.

Join performance. Integer joins are faster than string joins on large tables. A surrogate key of INT or BIGINT joins significantly faster than a VARCHAR natural key, especially at hundreds of millions of rows.

Best Practice

Always carry the natural key as a non-key column in the dimension table alongside the surrogate key. This allows you to trace a dimension record back to its source and to join incoming source data during incremental loads without losing the ability to audit lineage.

Degenerate Dimensions

A degenerate dimension is a dimension that lives inside the fact table rather than in a separate table. It is typically a transaction identifier from the source system — an order number, invoice number, shipment ID, or ticket number — that carries meaningful context but has no additional attributes that justify a full dimension table.

In the fct_order_lines example above, order_id is a degenerate dimension. It is not a measure — you would never sum order IDs. But it is not descriptive enough to warrant a dim_orders table, because the order itself is already the grain of the fact table. Keeping order_id in the fact row allows analysts to group by it, filter for a specific order, or audit individual transactions without an additional join.

Common degenerate dimensions:

  • Order number or invoice number in a sales fact table
  • Claim number in an insurance claim fact table
  • Ticket ID in a support event fact table
  • Session ID in a web analytics event fact table
  • Shipment tracking number in a logistics fact table

Conformed Dimensions

A conformed dimension is a dimension table that is used — without modification — across multiple fact tables. It is one of Kimball's central architectural principles, and it is what makes cross-process analysis possible in a data warehouse.

The date dimension is the most universal example. A well-built dim_date table contains every date in your analytical range, pre-computed with attributes like day of week, month name, quarter, fiscal period, and holiday flag. This table is joined to sales facts, support ticket facts, marketing event facts, and inventory snapshot facts. Because all of these fact tables use the same dim_date, you can compare daily order volume against daily support ticket volume against daily ad spend in a single query — they all share the same calendar structure.

-- Cross-process query using a conformed date dimension
SELECT
  d.year,
  d.month_number,
  d.month_name,
  SUM(o.gross_revenue)        AS monthly_revenue,
  COUNT(DISTINCT o.order_id)  AS orders,
  SUM(t.tickets_opened)       AS support_tickets
FROM fct_order_lines o
JOIN dim_date d      ON o.order_date_key = d.date_key
JOIN fct_support_daily t ON t.date_key = d.date_key
GROUP BY d.year, d.month_number, d.month_name
ORDER BY d.year, d.month_number;

Customer dimensions can also be conformed — the same dim_customers table is shared between order facts, subscription snapshot facts, and support ticket facts. This makes it possible to ask "what was the average support ticket volume for customers acquired through paid social?" in a single query, because all three fact tables share the same customer dimension with the same surrogate keys.

Non-conformed dimensions are fine for process-specific attributes (a dim_shipping_carriers table used only by logistics facts) but conformed dimensions are what enable the enterprise data bus — the integrated, multi-subject-area warehouse that Kimball envisioned.

Star Schema vs Snowflake Schema

The star schema puts the fact table at the center with denormalized dimension tables radiating outward — one join gets you from the fact to any attribute. The snowflake schema normalizes dimensions into sub-tables: instead of one dim_products table containing brand, category, and subcategory, you have dim_products, dim_categories, and dim_brands as separate tables joined in a hierarchy.

Attribute Star Schema Snowflake Schema
Dimension structureSingle denormalized tableMultiple normalized sub-tables
Join depthOne join from fact to dimensionMultiple joins through dimension hierarchy
Query performanceFaster — fewer joinsSlower — more joins required
StorageHigher — repeated attribute valuesLower — normalized, no repetition
BI tool compatibilityExcellent — tools navigate it nativelyRequires manual relationship setup
Kimball recommendationPreferred for analyticsAcceptable for storage-constrained environments

In practice, most modern analytical warehouses use star schemas for their marts. Cloud storage is inexpensive enough that denormalization is rarely a concern, and the query simplicity is a significant advantage for both analysts and BI tools.

How dbt Implements Fact and Dimension Tables

dbt (data build tool) has become the standard for in-warehouse transformation and is the most common implementation layer for dimensional models. It does not change the concepts — the same Kimball principles apply — but it provides structure, testing, and dependency management.

File structure and naming conventions

In a conventional dbt project, models are organized by layer. Dimension and fact models typically live in models/marts/, with staging models in models/staging/ and intermediate models in models/intermediate/. The fct_ and dim_ prefixes are conventions enforced by the team, not dbt itself.

models/
  staging/
    stg_shopify__orders.sql
    stg_shopify__order_lines.sql
    stg_hubspot__contacts.sql
  intermediate/
    int_orders_enriched.sql
  marts/
    core/
      dim_customers.sql
      dim_products.sql
      dim_date.sql
      fct_order_lines.sql
    finance/
      fct_subscription_monthly_snapshot.sql

A simple dimension model in dbt

-- models/marts/core/dim_customers.sql
{{ config(materialized='table') }}

WITH source AS (
  SELECT * FROM {{ ref('stg_shopify__customers') }}
),

renamed AS (
  SELECT
    {{ dbt_utils.surrogate_key(['customer_id']) }} AS customer_key,
    customer_id,
    first_name || ' ' || last_name               AS full_name,
    email,
    country,
    city,
    tags                                          AS customer_segment,
    created_at::DATE                              AS first_order_date,
    NOT accepts_marketing                         AS is_opted_out
  FROM source
)

SELECT * FROM renamed

Incremental fact tables

Large fact tables are almost always materialized as incremental models. Instead of rebuilding the entire table on every dbt run, an incremental model appends only rows that are new since the last run. This reduces compute time from hours to minutes for tables with hundreds of millions of rows.

-- models/marts/core/fct_order_lines.sql
{{
  config(
    materialized='incremental',
    unique_key='order_line_key',
    on_schema_change='sync_all_columns'
  )
}}

WITH orders AS (
  SELECT * FROM {{ ref('stg_shopify__order_lines') }}
  {% if is_incremental() %}
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
  {% endif %}
),

customers AS (
  SELECT customer_key, customer_id
  FROM {{ ref('dim_customers') }}
),

products AS (
  SELECT product_key, product_id
  FROM {{ ref('dim_products') }}
),

dates AS (
  SELECT date_key, full_date
  FROM {{ ref('dim_date') }}
)

SELECT
  {{ dbt_utils.surrogate_key(['o.order_line_id']) }} AS order_line_key,
  o.order_id,
  d.date_key                                         AS order_date_key,
  c.customer_key,
  p.product_key,
  o.quantity,
  o.unit_price,
  o.discount_amount,
  o.quantity * o.unit_price                          AS gross_revenue,
  o.quantity * o.unit_price - o.discount_amount      AS net_revenue,
  o.updated_at
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products  p ON o.product_id  = p.product_id
JOIN dates     d ON o.order_date::DATE = d.full_date

Testing dimensional models

dbt's schema.yml testing is how teams enforce dimensional modeling constraints without writing custom SQL. Primary key uniqueness and foreign key referential integrity are the two tests every mart model should have.

# models/marts/core/schema.yml
models:
  - name: fct_order_lines
    columns:
      - name: order_line_key
        tests:
          - unique
          - not_null
      - name: customer_key
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_key
      - name: product_key
        tests:
          - not_null
          - relationships:
              to: ref('dim_products')
              field: product_key
      - name: gross_revenue
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"

Common Mistakes When Building Dimensional Models

Putting measures in the dimension table. If you find yourself summing or averaging a column in a dimension table, it belongs in the fact table. Dimensions describe; facts measure.

Using natural keys as foreign keys in fact tables. If you join fct_order_lines.customer_id directly to dim_customers.customer_id (natural key), your model breaks when source customer IDs change or when SCD Type 2 creates multiple rows for the same natural key. Always join on surrogate keys.

Collapsing multiple grains into one fact table. A fact table should have exactly one grain — one row means exactly one thing. Mixing order-line-level and order-header-level measures in the same fact table produces double-counting errors that are notoriously difficult to diagnose.

Failing to populate the date dimension fully. The date dimension should contain every date in your analytical range, including future dates if you have forward-looking forecasts. Gaps in dim_date produce missing rows in time-series queries.

Treating a slowly changing dimension as append-only. If you add new rows to a customer dimension when their attributes change (SCD Type 2), but forget to set the row_expiry_date and is_current_row flags on the old row, your dimension join will produce duplicate rows for every customer who changed — inflating all aggregates.

Key Takeaways

  • Fact tables contain foreign keys and numeric measures. Dimension tables contain surrogate keys and descriptive attributes. Every analytical query in a dimensional model follows the pattern: join facts to dimensions, filter on dimension attributes, aggregate the measures.
  • Choose your fact table type by the analytical question. Transaction facts for event-level analysis. Periodic snapshots for trend and state tracking. Accumulating snapshots for pipeline and process-efficiency analysis.
  • Use surrogate keys in every dimension table. They decouple your warehouse from source system IDs, enable SCD Type 2 history, and improve join performance.
  • Conformed dimensions are what turn isolated fact tables into an integrated warehouse. The same dim_date and dim_customers shared across all subject areas make cross-process analysis possible without data reconciliation.
  • In dbt, follow the fct_ / dim_ naming convention, materialize large fact tables as incremental models, and enforce primary key and foreign key tests in schema.yml on every mart model.

FAQ

What is the main difference between a fact table and a dimension table?

A fact table stores measurable, quantitative business events — each row is a transaction or occurrence, and the numeric columns are the measures you aggregate (revenue, quantity, duration). A dimension table stores descriptive context about those events — attributes like customer name, product category, or sales region that let you slice and filter the measures. Fact tables are tall and narrow; dimension tables are shorter and wider.

What are the three types of fact tables in the Kimball methodology?

The three main fact table types in Kimball dimensional modeling are: (1) Transaction fact tables — one row per discrete event, highest grain, never updated; (2) Periodic snapshot fact tables — one row per period per entity, capturing state at a fixed point in time; and (3) Accumulating snapshot fact tables — one row per business process lifecycle, updated as the process advances through stages. A fourth type — factless fact tables — records events with no numeric measure, used for coverage and attendance tracking.

What is a surrogate key and why use one instead of a natural key?

A surrogate key is a system-generated integer or UUID assigned to each row in a dimension table, independent of any source system value. A natural key is the identifier that exists in the source system (e.g., customer_id from the CRM). Surrogate keys are used because they remain stable across source system changes and migrations, they support slowly changing dimension (SCD) history by allowing multiple rows for the same natural key entity, and they join faster than long string or composite natural keys.

What is a degenerate dimension?

A degenerate dimension is a dimension that lives in the fact table itself rather than in a separate dimension table. It is typically an identifier from a source transaction — like an order number, invoice number, or ticket ID — that has descriptive value but no additional attributes worth modeling into a full dimension table. Including it in the fact table allows analysts to group or filter by it without a join.

What is a conformed dimension?

A conformed dimension is a dimension table that is shared across multiple fact tables in a data warehouse. The classic example is a date dimension: the same dim_date table is joined to sales facts, inventory facts, and support ticket facts. Conformed dimensions enable cross-process analysis — you can compare sales revenue and support volume by the same calendar week because both facts join to the same date dimension with consistent keys and attributes.

How does dbt implement fact and dimension tables?

In dbt, fact and dimension tables are implemented as models — SQL SELECT statements in .sql files under the models/ directory. By convention, fact models are prefixed fct_ and dimension models are prefixed dim_. dbt handles materialization (table, view, or incremental), dependency resolution via ref() functions, and data quality testing via schema.yml. Incremental models are commonly used for large fact tables to append only new rows rather than rebuilding the full table on each run.

What is the difference between a star schema and a snowflake schema?

In a star schema, each dimension is a single denormalized table joined directly to the fact table — the structure looks like a star with the fact table at the center. In a snowflake schema, dimensions are normalized into multiple related tables (e.g., dim_product splits into dim_product, dim_category, and dim_brand). Star schemas are faster to query and easier for BI tools to navigate. Snowflake schemas reduce storage and enforce referential integrity but require more joins and are harder for analysts to work with directly.