TL;DR
ETL (Extract, Transform, Load) is the data-pipeline pattern where data is extracted from source systems, transformed in a staging environment, and then loaded into the target warehouse. ETL was the dominant pattern from the 1990s through the early 2010s, when storage and compute were expensive and transformations needed to happen before data hit the warehouse. Modern data stacks have largely shifted to [ELT](/glossary/elt) (Extract, Load, Transform), but ETL remains relevant for specific use cases.
What is ETL?
ETL is the original analytical-data pipeline pattern: extract data from source systems (CRMs, ERPs, transactional databases), transform it in an intermediate staging area (clean, conform, aggregate, join), then load the transformed result into the target warehouse.
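As a concrete sketch, here is the ETL ordering in miniature Python. sqlite3 stands in for both the source system and the warehouse so the example runs anywhere; the table and column names (`orders`, `orders_clean`) are illustrative, not from any particular stack.

```python
import sqlite3

# Hypothetical endpoints: a transactional source and a target warehouse.
# sqlite3 stands in for both so the sketch runs without external services.
source = sqlite3.connect("source_crm.db")
warehouse = sqlite3.connect("warehouse.db")
source.execute(
    "CREATE TABLE IF NOT EXISTS orders (id INTEGER, email TEXT, amount_cents INTEGER)"
)

# 1. Extract: pull raw rows out of the source system.
rows = source.execute("SELECT id, email, amount_cents FROM orders").fetchall()

# 2. Transform: clean and conform in a staging layer (here, plain Python)
# before anything touches the warehouse -- the defining ETL step.
transformed = [
    (order_id, email.strip().lower(), amount_cents / 100)
    for order_id, email, amount_cents in rows
    if email  # drop rows with no usable contact field
]

# 3. Load: only the cleaned, transformed result lands in the warehouse.
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS orders_clean (id INTEGER, email TEXT, amount_usd REAL)"
)
warehouse.executemany("INSERT INTO orders_clean VALUES (?, ?, ?)", transformed)
warehouse.commit()
```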
It was the dominant approach from the 1990s through the early 2010s because storage and compute in target warehouses were expensive — it was cheaper to do heavy transformation work outside the warehouse and only load the cleaned result.
ETL vs ELT
| Property | ETL | ELT |
|---|---|---|
| Order | Extract → Transform → Load | Extract → Load → Transform |
| Where transform happens | Staging environment (separate) | Inside the target warehouse |
| When dominant | 1990s–early 2010s | Mid-2010s–present |
| Source data preserved | Often discarded after transform | Typically kept in raw form |
| Reprocessing | Requires re-extracting from source | Re-runs transforms over preserved raw data |
| Cost driver | Staging compute + warehouse storage | Warehouse compute + storage |
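For contrast, here is the same pipeline in ELT order, using the same stand-in sqlite3 warehouse: raw rows are loaded untouched, and the transform runs as SQL inside the warehouse (the layer a tool like dbt would manage). Table names remain illustrative.

```python
import sqlite3

warehouse = sqlite3.connect("warehouse.db")

# Load first: raw rows land in the warehouse exactly as extracted, and stay.
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS raw_orders (id INTEGER, email TEXT, amount_cents INTEGER)"
)

# Transform inside the warehouse: a SQL model derives the clean table from the
# preserved raw one. Re-running it reprocesses everything without ever
# touching the source system -- the reprocessing advantage in the table above.
warehouse.executescript("""
    DROP TABLE IF EXISTS orders_clean;
    CREATE TABLE orders_clean AS
    SELECT id,
           lower(trim(email)) AS email,
           amount_cents / 100.0 AS amount_usd
    FROM raw_orders
    WHERE email IS NOT NULL AND email <> '';
""")
warehouse.commit()
```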
When ETL still makes sense
For most modern analytical workloads, ELT has displaced ETL. The shift was driven by the collapsing cost of warehouse storage and compute (Snowflake, BigQuery, Redshift) and by warehouse-native transformation tools (dbt) that made warehouse-side transformation easier than staging-environment transformation. ETL still earns its place in a few situations:
- Strict data-residency requirements: sensitive data needs heavy transformation (PII removal, anonymization) before crossing region boundaries (see the sketch after this list)
- Legacy environments: on-premise warehouses where pre-warehouse transformation is genuinely cheaper than warehouse-side transformation
- Highly structured pipelines with stable transforms: where the upfront ETL design pays back through pipeline efficiency
- Real-time stream processing: when data must be transformed at ingest time before any consumer sees it
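A minimal sketch of the data-residency case, assuming a policy where emails must be pseudonymized and phone numbers dropped before records leave their home region. The field names and the plain SHA-256 digest are illustrative; a real deployment would use a salted or keyed hash and a schema-driven policy.

```python
import hashlib
import sqlite3

def scrub(record):
    """Strip or pseudonymize PII before a record crosses the region boundary."""
    record = dict(record)
    # Illustrative policy: replace the email with an irreversible digest so
    # only a join key survives (a real system would salt or key this hash).
    record["email_hash"] = hashlib.sha256(record.pop("email").encode()).hexdigest()
    record.pop("phone", None)  # no analytical value; never ship it
    return record

# Transform-at-extract: only scrubbed records are ever loaded cross-region.
records = [{"id": 1, "email": "a@example.com", "phone": "+1-555-0100", "amount_usd": 42.0}]
safe = [scrub(r) for r in records]

remote = sqlite3.connect("eu_warehouse.db")  # stands in for the out-of-region target
remote.execute(
    "CREATE TABLE IF NOT EXISTS orders (id INTEGER, email_hash TEXT, amount_usd REAL)"
)
remote.executemany("INSERT INTO orders VALUES (:id, :email_hash, :amount_usd)", safe)
remote.commit()
```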
Common pitfalls
1. Defaulting to ETL by reflex. Engineers trained pre-2015 often reach for ETL when ELT would be cleaner. Question the habit; modern warehouses make ELT the better choice for most analytical workloads.
2. Discarding raw data after transform. Once the staging output is all that survives, any reprocessing (new requirements, bug fixes, schema changes) means re-extracting from source: slow, error-prone, and sometimes impossible if source systems have changed. Preserve raw data when possible; a minimal raw-zone sketch follows this list.
3. Mixing ETL and ELT inconsistently. When some pipelines transform in staging and others in the warehouse, maintenance becomes unpredictable. Pick a default and apply it consistently.
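A minimal raw-zone sketch for pitfall 2, assuming extracted batches can be archived as JSON files (any durable, append-only store works the same way). The paths and function names are illustrative.

```python
import json
import pathlib
import time

RAW_ZONE = pathlib.Path("raw_zone")  # illustrative: any durable, append-only store
RAW_ZONE.mkdir(exist_ok=True)

def extract_and_archive(batch):
    """Persist every extracted batch verbatim before any transform runs."""
    path = RAW_ZONE / f"orders_{time.time_ns()}.json"
    path.write_text(json.dumps(batch))
    return batch

def replay_raw_zone():
    """Reprocess by replaying archives: new requirements, bug fixes, and
    schema changes re-run the transform without re-hitting the source."""
    for path in sorted(RAW_ZONE.glob("orders_*.json")):
        yield from json.loads(path.read_text())
```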
Related concepts
ELT is the modern alternative pattern. Reverse ETL pushes warehouse-modeled data back into operational systems. CDC (change data capture) is the change-aware extraction pattern. Data lakes and lakehouses are the storage substrates these pipelines feed.
Frequently asked questions
ETL or ELT?
Default to ELT for new builds. Modern warehouses make warehouse-side transformation cheaper, faster, and easier to maintain than staging-environment transformation. ETL still makes sense for strict data-residency requirements, legacy on-premise environments, and real-time stream processing, but it isn't the right default for cloud analytical workloads.
Is ETL obsolete?
No — but it's no longer the default. ETL remains the right choice for specific use cases (data residency, real-time streams, legacy environments). For typical cloud analytical workloads, ELT has displaced it.
What tools are used for ETL?
Legacy: Informatica, Talend, IBM DataStage, Microsoft SSIS. Modern stream-processing ETL: Apache Flink, Kafka Streams, Spark Structured Streaming. Pure batch ETL is increasingly rare in cloud-native environments where ELT tools (Fivetran/Airbyte for extract+load, dbt for transform) have largely taken over.
Fairview is an operating intelligence platform that consumes data from both ETL and ELT pipelines — surfacing operating views regardless of whether transformations happen pre- or post-load. Start your free trial →
Siddharth Gangal is the founder of Fairview. He built its pipeline-pattern-agnostic ingestion layer after watching companies cling to rigid ETL practices for decade-old reasons (warehouse storage was expensive in 2008) long after the cost economics had reversed, leaving a discipline that created maintenance overhead without delivering its original benefit.
See it in Fairview
Track ETL (Extract, Transform, Load) automatically.
14-day free trial. No credit card. First data source connected in 5 minutes.