If you've spent any time in modern data engineering circles, you've heard the word medallion. Bronze layer, silver layer, gold layer. Diagrams with gradient rectangles and arrows. It looks clean. It sounds clean.
But I've seen it implemented badly enough times that I want to write about what it's actually for.
What the Pattern Solves
The core problem is how data quality issues propagate. In a naive pipeline, you ingest raw data and transform it in one step. When something breaks (a schema change, a bad batch, an upstream outage), you have no clean recovery point.
The medallion pattern separates concerns:
| Layer | What it contains |
|---|---|
| Bronze | Raw ingested data, immutable, append-only |
| Silver | Cleaned, validated, deduplicated records |
| Gold | Business-ready aggregates and dimensional models |
The key insight is immutability at bronze. Once a batch is written, you never modify or delete it; new data is only appended. That makes bronze your source of truth for reprocessing.
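To make the append-only idea concrete, here is a minimal sketch using only the Python standard library. It assumes a local directory standing in for the lake and JSON Lines files standing in for Parquet; the function names `append_to_bronze` and `read_bronze` and the `ingestion_date=` folder layout are my own illustration, not a prescribed API.

```python
import json
import uuid
from datetime import date
from pathlib import Path


def append_to_bronze(root: Path, records: list[dict], ingestion_date: date) -> Path:
    """Write one batch as a new file under ingestion_date=YYYY-MM-DD/."""
    partition = root / f"ingestion_date={ingestion_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    # A unique file name per batch means existing files are never rewritten.
    target = partition / f"batch-{uuid.uuid4().hex}.jsonl"
    # Mode "x" raises if the file already exists, so an overwrite cannot happen.
    with target.open("x", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return target


def read_bronze(root: Path) -> list[dict]:
    """Replay every batch, ordered by partition date, e.g. to rebuild silver."""
    rows = []
    for path in sorted(root.glob("ingestion_date=*/*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            rows.extend(json.loads(line) for line in fh if line.strip())
    return rows
```

The point of the sketch is the shape, not the file format: every write creates a new file, nothing is updated in place, and a full replay is always possible.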
Where People Get It Wrong
Treating gold as a dumping ground. I've seen gold layers with 400-column flat tables. That's not a semantic layer; that's just a messy silver layer with a coat of paint.
Skipping SCD handling at silver. If you're not managing slowly changing dimensions at silver, you'll have correctness issues at gold that are almost impossible to trace back.
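For readers who haven't implemented it, this is roughly what SCD Type 2 handling looks like, sketched in plain Python. The `CustomerVersion` record, the tracked `city` attribute, and both function names are hypothetical; in practice this logic would more likely live in a merge statement against a silver table.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: int
    city: str                  # the tracked attribute
    valid_from: date
    valid_to: Optional[date]   # None means "still current"


def apply_scd2(history, customer_id, city, as_of):
    """Return a new history list with the change applied as SCD Type 2."""
    current = next(
        (v for v in history if v.customer_id == customer_id and v.valid_to is None),
        None,
    )
    if current is not None and current.city == city:
        return list(history)  # nothing changed, keep history as-is
    # Close the current version instead of overwriting it.
    updated = [replace(v, valid_to=as_of) if v is current else v for v in history]
    updated.append(CustomerVersion(customer_id, city, as_of, None))
    return updated


def city_as_of(history, customer_id, when):
    """Look up the value that was valid on a given date."""
    for v in history:
        if (v.customer_id == customer_id and v.valid_from <= when
                and (v.valid_to is None or when < v.valid_to)):
            return v.city
    return None
```

Because old versions are closed rather than overwritten, a gold aggregate can join on the value that was valid at the time of the fact, which is exactly what makes issues traceable.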
Over-engineering bronze. Bronze should be cheap and fast. Parquet, partitioned by ingestion date. That's it. Stop trying to normalize it.
The Azure Stack Reality
On Azure, you're typically looking at:

- ADLS Gen2 for storage across all layers
- Azure Data Factory or Fabric Data Pipelines for orchestration
- Delta Lake format for ACID compliance and time travel

Delta's time travel capability is underrated here: it effectively gives you a second recovery mechanism on top of bronze immutability.
Summary
The medallion pattern is a discipline, not a technology. The value is in the contracts between layers: what bronze promises to silver, what silver promises to gold. If those contracts aren't enforced (through schema validation, data quality checks, and proper SCD logic), the pretty diagram means nothing.
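As one way to picture an enforced bronze-to-silver contract, here is a small sketch in plain Python. The `SILVER_ORDERS_CONTRACT` columns, the `order_id` key, and the `updated_at` version field are all assumptions for illustration; a real pipeline would typically lean on schema enforcement in the table format or a data quality framework instead.

```python
# Hypothetical silver contract: required column name -> expected Python type.
SILVER_ORDERS_CONTRACT = {"order_id": int, "amount": float, "updated_at": str}


def violations(record: dict, contract: dict) -> list[str]:
    """Return human-readable contract violations for one record."""
    problems = []
    for column, expected in contract.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected):
            problems.append(f"{column}: expected {expected.__name__}")
    return problems


def promote_to_silver(bronze_rows, contract, key="order_id", version="updated_at"):
    """Split rows into (silver, rejected); silver keeps the latest row per key."""
    latest, rejected = {}, []
    for row in bronze_rows:
        problems = violations(row, contract)
        if problems:
            # Rejected rows are kept with their reasons, not silently dropped.
            rejected.append((row, problems))
            continue
        seen = latest.get(row[key])
        # ISO-8601 timestamps in a single format compare correctly as strings.
        if seen is None or row[version] > seen[version]:
            latest[row[key]] = row
    return list(latest.values()), rejected
```

Rows that break the contract never reach silver, and the reasons are recorded alongside them, so a failure shows up at the layer boundary rather than three joins later in gold.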