The Medallion Architecture Is Not Magic

Bronze, silver, gold: the medallion pattern is everywhere. Here's what it actually solves, and where people get it wrong.

If you've spent any time in modern data engineering circles, you've heard the word medallion. Bronze layer, silver layer, gold layer. Diagrams with gradient rectangles and arrows. It looks clean. It sounds clean.

But I've seen it implemented badly enough times that I want to write about what it's actually for.

What the Pattern Solves

The core problem is data quality propagation. In a naive pipeline, you ingest raw data and transform it in one step. When something breaks (a schema change, a bad batch, an upstream outage), you have no clean recovery point, and the bad data has already reached whatever sits downstream.

The medallion pattern separates concerns:

| Layer | What it contains |
|---|---|
| Bronze | Raw ingested data, immutable, append-only |
| Silver | Cleaned, validated, deduplicated records |
| Gold | Business-ready aggregates and dimensional models |

The key insight is immutability at bronze. New batches get appended, but once a batch is written you never update or delete it. It's your source of truth for reprocessing.
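
Here's a minimal sketch of what reprocessing from bronze can look like. It's plain Python over in-memory rows rather than a real engine, and the column names (`order_id`, `amount`, `ingested_at`) and the `rebuild_silver` function are made up for illustration. The point is that silver is derived, so you can throw it away and rebuild it from bronze whenever the cleaning rules change.

```python
def rebuild_silver(bronze_rows):
    """Replay bronze rows into a silver snapshot.

    Drops rows that fail basic validation and keeps only the most
    recently ingested version of each order_id. Bronze is only read.
    """
    latest = {}
    for row in bronze_rows:
        # Validation: skip rows without a business key.
        if not row.get("order_id"):
            continue
        # Cleaning: amounts arrive as strings; skip ones that don't parse.
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue
        cleaned = {
            "order_id": row["order_id"],
            "amount": amount,
            "ingested_at": row["ingested_at"],
        }
        # Deduplication: ISO-8601 timestamps sort correctly as strings,
        # so the latest ingestion wins.
        current = latest.get(cleaned["order_id"])
        if current is None or cleaned["ingested_at"] > current["ingested_at"]:
            latest[cleaned["order_id"]] = cleaned
    return sorted(latest.values(), key=lambda r: r["order_id"])


# Hypothetical bronze contents: a duplicate key, a bad amount, a missing key.
bronze = [
    {"order_id": "A1", "amount": "10.50", "ingested_at": "2024-01-01T00:00:00"},
    {"order_id": "A1", "amount": "12.00", "ingested_at": "2024-01-02T00:00:00"},
    {"order_id": "B2", "amount": "oops", "ingested_at": "2024-01-02T00:00:00"},
    {"order_id": None, "amount": "3.00", "ingested_at": "2024-01-02T00:00:00"},
]
silver = rebuild_silver(bronze)
```

If you later decide that `"oops"` amounts should be quarantined instead of dropped, you change the function and rerun it. Bronze still holds all four rows.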

Where People Get It Wrong

Treating gold as a dumping ground. I've seen gold layers with 400-column flat tables. That's not a semantic layer; that's just a messy silver layer with a coat of paint.

Skipping SCD handling at silver. If you're not managing slowly changing dimensions at silver, you'll have correctness issues at gold that are almost impossible to trace back.
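
To make the SCD point concrete, here's a rough sketch of Type 2 handling in plain Python. Everything in it is an assumption for illustration: the `customer_id` and `city` columns, the `apply_scd2` helper, and the 9999-12-31 "still current" sentinel. In practice you'd express the same logic as a merge in your engine of choice.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # sentinel meaning "this version is still current"


def apply_scd2(history, incoming, effective):
    """Return a new history with `incoming` applied as of `effective`.

    history:  list of dicts with customer_id, city, valid_from, valid_to, is_current
    incoming: dict with customer_id and city (the tracked attribute)
    """
    result = []
    needs_new_version = True  # true for new keys and for changed attributes
    for row in history:
        is_current_match = (
            row["customer_id"] == incoming["customer_id"] and row["is_current"]
        )
        if not is_current_match:
            result.append(row)
        elif row["city"] == incoming["city"]:
            # No change: keep the current version untouched.
            needs_new_version = False
            result.append(row)
        else:
            # Change: close the old version instead of overwriting it.
            result.append({**row, "valid_to": effective, "is_current": False})
    if needs_new_version:
        result.append({
            "customer_id": incoming["customer_id"],
            "city": incoming["city"],
            "valid_from": effective,
            "valid_to": OPEN_END,
            "is_current": True,
        })
    return result


history = apply_scd2([], {"customer_id": 1, "city": "Oslo"}, date(2024, 1, 1))
history = apply_scd2(history, {"customer_id": 1, "city": "Bergen"}, date(2024, 6, 1))
# Re-sending the same value must not create a third version.
history = apply_scd2(history, {"customer_id": 1, "city": "Bergen"}, date(2024, 7, 1))
```

With the validity ranges in place, a gold fact dated March 2024 can join to the Oslo version rather than to whatever the customer's city happens to be today, which is exactly the kind of error that's hard to trace after the fact.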

Over-engineering bronze. Bronze should be cheap and fast. Parquet, partitioned by ingestion date. That's it. Stop trying to normalize it.
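
A bronze writer really can be this small. The sketch below uses only the standard library, so it writes JSON Lines as a stand-in for Parquet; the `write_bronze_batch` name, the `ingestion_date=` directory layout, and the batch ID are my own illustrative choices, not a prescribed convention.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def write_bronze_batch(root, batch_id, records, ingestion_date):
    """Land one raw batch under an ingestion-date partition, never overwriting."""
    partition = Path(root) / f"ingestion_date={ingestion_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / f"{batch_id}.jsonl"
    # Mode "x" raises FileExistsError if the file already exists,
    # which is what keeps this layer append-only.
    with open(target, "x", encoding="utf-8") as fh:
        for record in records:
            # Stored exactly as received: no trimming, casting, or renaming.
            fh.write(json.dumps(record) + "\n")
    return target


root = tempfile.mkdtemp()
path = write_bronze_batch(root, "batch-001", [{"id": 1, "amt": " 10 "}], date(2024, 1, 1))

# A second write with the same batch ID is rejected rather than replacing data.
try:
    write_bronze_batch(root, "batch-001", [{"id": 1, "amt": "11"}], date(2024, 1, 1))
    overwritten = True
except FileExistsError:
    overwritten = False
```

Note what's missing: no type casting, no whitespace trimming, no joins. The untidy `" 10 "` lands as-is, and cleaning it up is silver's job.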

The Azure Stack Reality

On Azure, you're typically looking at:

- ADLS Gen2 for storage across all layers
- Azure Data Factory or Fabric Data Pipelines for orchestration
- Delta Lake format for ACID compliance and time travel

Delta's time travel capability is underrated here: within the table's retention window, it effectively gives you a second recovery mechanism on top of bronze immutability.

Summary

The medallion pattern is a discipline, not a technology. The value is in the contracts between layers: what bronze promises to silver, what silver promises to gold. If those contracts aren't enforced (through schema validation, data quality checks, and proper SCD logic), the pretty diagram means nothing.
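
As a closing illustration, here's one way a layer contract could be written down and checked before a batch is published. It's a deliberately small sketch: the `SILVER_ORDERS_CONTRACT` columns and the `contract_violations` helper are hypothetical, and a real pipeline would more likely lean on a dedicated data quality tool.

```python
# What silver promises to gold for a hypothetical orders table.
SILVER_ORDERS_CONTRACT = {
    "order_id": str,
    "amount": float,
    "currency": str,
}


def contract_violations(rows, contract, key):
    """Return readable violations; an empty list means the batch can be published."""
    problems = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Schema validation: every promised column is present with the right type.
        for column, expected_type in contract.items():
            if column not in row:
                problems.append(f"row {i}: missing column {column!r}")
            elif not isinstance(row[column], expected_type):
                problems.append(f"row {i}: {column!r} is not {expected_type.__name__}")
        # Data quality check: the business key must be unique within the batch.
        value = row.get(key)
        if value in seen_keys:
            problems.append(f"row {i}: duplicate {key} {value!r}")
        seen_keys.add(value)
    return problems


good = [{"order_id": "A1", "amount": 12.0, "currency": "EUR"}]
bad = [
    {"order_id": "A1", "amount": "12", "currency": "EUR"},  # amount is a string
    {"order_id": "A1", "amount": 5.0},                       # no currency, repeated key
]
good_problems = contract_violations(good, SILVER_ORDERS_CONTRACT, "order_id")
bad_problems = contract_violations(bad, SILVER_ORDERS_CONTRACT, "order_id")
```

The useful part isn't the code, it's the decision it forces: a batch with a non-empty violation list doesn't get promoted to the next layer.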