TL;DR
ETL (Extract, Transform, Load) is the data-pipeline pattern where data is extracted from source systems, transformed in a staging environment, and then loaded into the target warehouse. ETL was the dominant pattern from the 1990s through the early 2010s, when storage and compute were expensive and transformations needed to happen before data hit the warehouse. Modern data stacks have largely shifted to [ELT](/glossary/elt) (Extract, Load, Transform), but ETL remains relevant for specific use cases.
What is ETL?
ETL is the original analytical-data pipeline pattern: extract data from source systems (CRMs, ERPs, transactional databases), transform it in an intermediate staging area (clean, conform, aggregate, join), then load the transformed result into the target warehouse.
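As a concrete sketch, here is the ETL ordering in miniature Python. sqlite3 stands in for both the source system and the warehouse so the example runs anywhere; the table and column names (`orders`, `orders_clean`) are illustrative, not from any particular stack.

```python
import sqlite3

# Hypothetical endpoints: a transactional source and a target warehouse.
# sqlite3 stands in for both so the sketch runs without external services.
source = sqlite3.connect("source_crm.db")
warehouse = sqlite3.connect("warehouse.db")
source.execute(
    "CREATE TABLE IF NOT EXISTS orders (id INTEGER, email TEXT, amount_cents INTEGER)"
)

# 1. Extract: pull raw rows out of the source system.
rows = source.execute("SELECT id, email, amount_cents FROM orders").fetchall()

# 2. Transform: clean and conform in a staging layer (here, plain Python)
# before anything touches the warehouse -- the defining ETL step.
transformed = [
    (order_id, email.strip().lower(), amount_cents / 100)
    for order_id, email, amount_cents in rows
    if email  # drop rows with no usable contact field
]

# 3. Load: only the cleaned, transformed result lands in the warehouse.
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS orders_clean (id INTEGER, email TEXT, amount_usd REAL)"
)
warehouse.executemany("INSERT INTO orders_clean VALUES (?, ?, ?)", transformed)
warehouse.commit()
```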
It was the dominant approach from the 1990s through the early 2010s because storage and compute in target warehouses were expensive — it was cheaper to do heavy transformation work outside the warehouse and only load the cleaned result.
ETL vs ELT
| Property | ETL | ELT |
|---|---|---|
| Order | Extract → Transform → Load | Extract → Load → Transform |
| Where transform happens | Staging environment (separate) | Inside the target warehouse |
| When dominant | 1990s–early 2010s | Mid-2010s–present |
| Source data preserved | Often discarded after transform | Typically kept in raw form |
| Reprocessing | Requires re-extracting from source | Re-runs transforms over preserved raw data |
| Cost driver | Staging compute + warehouse storage | Warehouse compute + storage |
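For contrast, here is the same pipeline in ELT order, using the same stand-in sqlite3 warehouse: raw rows are loaded untouched, and the transform runs as SQL inside the warehouse (the layer a tool like dbt would manage). Table names remain illustrative.

```python
import sqlite3

warehouse = sqlite3.connect("warehouse.db")

# Load first: raw rows land in the warehouse exactly as extracted, and stay.
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS raw_orders (id INTEGER, email TEXT, amount_cents INTEGER)"
)

# Transform inside the warehouse: a SQL model derives the clean table from the
# preserved raw one. Re-running it reprocesses everything without ever
# touching the source system -- the reprocessing advantage in the table above.
warehouse.executescript("""
    DROP TABLE IF EXISTS orders_clean;
    CREATE TABLE orders_clean AS
    SELECT id,
           lower(trim(email)) AS email,
           amount_cents / 100.0 AS amount_usd
    FROM raw_orders
    WHERE email IS NOT NULL AND email <> '';
""")
warehouse.commit()
```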
When ETL still makes sense
For most modern analytical workloads, ELT has displaced ETL. The shift was driven by the collapsing cost of warehouse storage and compute (Snowflake, BigQuery, Redshift) and by warehouse-native transformation tools (dbt) that made warehouse-side transformation easier than staging-environment transformation. ETL still earns its place in a few situations:
- Strict data-residency requirements: sensitive data needs heavy transformation (PII removal, anonymization) before crossing region boundaries (see the sketch after this list)
- Legacy environments: on-premise warehouses where pre-warehouse transformation is genuinely cheaper than warehouse-side transformation
- Highly structured pipelines with stable transforms: where the upfront ETL design pays back through pipeline efficiency
- Real-time stream processing: when data must be transformed at ingest time before any consumer sees it
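A minimal sketch of the data-residency case, assuming a policy where emails must be pseudonymized and phone numbers dropped before records leave their home region. The field names and the plain SHA-256 digest are illustrative; a real deployment would use a salted or keyed hash and a schema-driven policy.

```python
import hashlib
import sqlite3

def scrub(record):
    """Strip or pseudonymize PII before a record crosses the region boundary."""
    record = dict(record)
    # Illustrative policy: replace the email with an irreversible digest so
    # only a join key survives (a real system would salt or key this hash).
    record["email_hash"] = hashlib.sha256(record.pop("email").encode()).hexdigest()
    record.pop("phone", None)  # no analytical value; never ship it
    return record

# Transform-at-extract: only scrubbed records are ever loaded cross-region.
records = [{"id": 1, "email": "a@example.com", "phone": "+1-555-0100", "amount_usd": 42.0}]
safe = [scrub(r) for r in records]

remote = sqlite3.connect("eu_warehouse.db")  # stands in for the out-of-region target
remote.execute(
    "CREATE TABLE IF NOT EXISTS orders (id INTEGER, email_hash TEXT, amount_usd REAL)"
)
remote.executemany("INSERT INTO orders VALUES (:id, :email_hash, :amount_usd)", safe)
remote.commit()
```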
Common pitfalls
1. Defaulting to ETL by reflex. Engineers trained pre-2015 often reach for ETL when ELT would be cleaner. Question the habit; modern warehouses make ELT the better choice for most analytical workloads.
2. Discarding raw data after transform. Once the staging output is all that survives, any reprocessing (new requirements, bug fixes, schema changes) means re-extracting from source: slow, error-prone, and sometimes impossible if source systems have changed. Preserve raw data when possible; a minimal raw-zone sketch follows this list.
3. Mixing ETL and ELT inconsistently. When some pipelines transform in staging and others in the warehouse, maintenance becomes unpredictable. Pick a default and apply it consistently.
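A minimal raw-zone sketch for pitfall 2, assuming extracted batches can be archived as JSON files (any durable, append-only store works the same way). The paths and function names are illustrative.

```python
import json
import pathlib
import time

RAW_ZONE = pathlib.Path("raw_zone")  # illustrative: any durable, append-only store
RAW_ZONE.mkdir(exist_ok=True)

def extract_and_archive(batch):
    """Persist every extracted batch verbatim before any transform runs."""
    path = RAW_ZONE / f"orders_{time.time_ns()}.json"
    path.write_text(json.dumps(batch))
    return batch

def replay_raw_zone():
    """Reprocess by replaying archives: new requirements, bug fixes, and
    schema changes re-run the transform without re-hitting the source."""
    for path in sorted(RAW_ZONE.glob("orders_*.json")):
        yield from json.loads(path.read_text())
```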
Related concepts
ELT is the modern alternative pattern. Reverse ETL pushes warehouse-modeled data back into operational systems. CDC (change data capture) is the change-aware extraction pattern. Data lakes and lakehouses are the storage substrates these pipelines feed.
Frequently asked questions
ETL or ELT?
Default to ELT for new builds. Modern warehouses make warehouse-side transformation cheaper, faster, and easier to maintain than staging-environment transformation. ETL still makes sense for strict data-residency requirements, legacy on-premise environments, and real-time stream processing, but it isn't the right default for cloud analytical workloads.
Is ETL obsolete?
No — but it's no longer the default. ETL remains the right choice for specific use cases (data residency, real-time streams, legacy environments). For typical cloud analytical workloads, ELT has displaced it.
What tools are used for ETL?
Legacy: Informatica, Talend, IBM DataStage, Microsoft SSIS. Modern stream-processing ETL: Apache Flink, Kafka Streams, Spark Structured Streaming. Pure batch ETL is increasingly rare in cloud-native environments where ELT tools (Fivetran/Airbyte for extract+load, dbt for transform) have largely taken over.
Fairview is an operating intelligence platform that consumes data from both ETL and ELT pipelines — surfacing operating views regardless of whether transformations happen pre- or post-load. Start your free trial →
Siddharth Gangal is the founder of Fairview. He built its pipeline-pattern-agnostic ingestion layer after watching companies cling to rigid ETL practices for decade-old reasons (warehouse storage was expensive in 2008) long after the cost economics had reversed, leaving a discipline that created maintenance overhead without delivering its original benefit.
See it in Fairview
Track ETL (Extract, Transform, Load) automatically.
14-day free trial. No credit card. First data source connected in 5 minutes.