Designing a Claims Pipeline on Azure Medallion Architecture

Medical and takaful claims data is messy, late-arriving, and regulated. Here's how the bronze-silver-gold pattern maps to the reality of an insurance data platform.

Claims data is some of the messiest data you'll work with in financial services. It arrives late. It gets amended after the fact. A single claim can have five versions (submitted, assessed, queried, resubmitted, settled), sometimes spanning months. And in a regulated environment, every version matters.

The medallion architecture handles this well, but only if you design the layers with claims-specific patterns in mind. Here's how I think about it.

Bronze: Ingest Everything, Change Nothing

The first rule of the bronze layer in a claims context is absolute immutability. Every file, every API payload, every HL7 batch: land it as-is, partitioned by ingestion date.

adls://container/bronze/
  medical-claims/
    ingestion_date=2026-03-01/
      batch_20260301_001.parquet
    ingestion_date=2026-03-02/
      batch_20260302_001.parquet

This gives you two things: a complete audit trail (critical under IFSA 2013 and BNM RMiT), and a reprocessing capability if silver logic changes.

Don't schema-enforce at bronze. Claims providers send inconsistent formats. A rigid schema at ingest means pipeline failures instead of data. Handle variance at the silver boundary.
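As a rough illustration of "ingest everything, change nothing", here is a minimal sketch that lands a payload byte-for-byte under an ingestion-date partition. It uses the local filesystem as a stand-in for ADLS, and the function name land_raw_batch and the .sha256 sidecar file are my own assumptions, not part of any Azure SDK.

```python
import hashlib
from datetime import date
from pathlib import Path


def land_raw_batch(root: Path, source: str, ingestion_date: date,
                   filename: str, payload: bytes) -> Path:
    """Write a payload unchanged under <source>/ingestion_date=YYYY-MM-DD/."""
    partition = root / source / f"ingestion_date={ingestion_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / filename
    # Exclusive-create mode: an existing bronze file is never overwritten.
    with open(target, "xb") as handle:
        handle.write(payload)
    # Hypothetical sidecar checksum so a later audit can show the file is unchanged.
    digest = hashlib.sha256(payload).hexdigest()
    (partition / f"{filename}.sha256").write_text(digest)
    return target
```

Note that nothing here parses or validates the payload. A second attempt to land the same filename raises FileExistsError rather than silently replacing the original.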

Silver: Where the Real Work Happens

Silver is where claims data becomes trustworthy. The key operations:

Deduplication and version tracking. Claims get amended. You need SCD Type 2 here: one row per claim version, with valid_from / valid_to markers. Your silver layer should answer: what was the state of claim X on date Y?
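To make the "state of claim X on date Y" question concrete, here is a small sketch of an as-of lookup over SCD Type 2 rows. The ClaimVersion fields, the sample claim, and the half-open interval convention (valid_from inclusive, valid_to exclusive, None for the current version) are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ClaimVersion:
    claim_id: str
    status: str
    amount: float
    valid_from: date
    valid_to: Optional[date]  # None marks the current (open) version


def claim_as_of(history: list[ClaimVersion], claim_id: str, on: date) -> Optional[ClaimVersion]:
    """Return the version of claim_id that was valid on the given date."""
    for version in history:
        if version.claim_id != claim_id:
            continue
        # Half-open interval: valid_from is inclusive, valid_to is exclusive.
        if version.valid_from <= on and (version.valid_to is None or on < version.valid_to):
            return version
    return None


# Hypothetical history for one claim.
history = [
    ClaimVersion("C-1001", "submitted", 1200.0, date(2026, 1, 5), date(2026, 1, 20)),
    ClaimVersion("C-1001", "queried", 1200.0, date(2026, 1, 20), date(2026, 2, 14)),
    ClaimVersion("C-1001", "settled", 950.0, date(2026, 2, 14), None),
]
```

With this convention, a date that falls exactly on a version boundary resolves to the newer version, so no date ever matches two rows.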

Schema normalisation. Flatten nested structures. Standardise code lists (ICD-10 diagnosis codes, CPT procedure codes, panel vs non-panel provider flags). Map provider IDs to your internal provider master.

Data quality gates. Define your quality rules explicitly:

- Claim amount must be positive and non-null
- Diagnosis code must exist in the active ICD-10 registry
- Date of service must precede date of submission
- Member ID must resolve in the member master

Failed records go to a quarantine partition, not the bin. You want visibility into rejection rates by provider and claim type; a skew in those rates often signals upstream data quality issues worth escalating.
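The four rules and the quarantine split can be sketched in plain Python as follows. The field names, the tiny in-memory ICD-10 and member sets, and the rule labels are placeholders I've made up; in a real pipeline they would come from your registry and member master.

```python
from collections import Counter

ACTIVE_ICD10 = {"J45.9", "E11.9"}   # hypothetical registry extract
MEMBER_MASTER = {"M-001", "M-002"}  # hypothetical member master keys


def failed_rules(claim: dict) -> list[str]:
    """Return the names of the rules this claim breaks (empty list = pass)."""
    failures = []
    amount = claim.get("claim_amount")
    if amount is None or amount <= 0:
        failures.append("amount_positive")
    if claim.get("diagnosis_code") not in ACTIVE_ICD10:
        failures.append("diagnosis_in_registry")
    service, submitted = claim.get("service_date"), claim.get("submission_date")
    # Strict comparison, following the rule as written: service precedes submission.
    if service is None or submitted is None or not service < submitted:
        failures.append("service_before_submission")
    if claim.get("member_id") not in MEMBER_MASTER:
        failures.append("member_resolves")
    return failures


def apply_gates(claims: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split claims into clean rows and quarantined rows annotated with reasons."""
    clean, quarantine = [], []
    for claim in claims:
        failures = failed_rules(claim)
        if failures:
            quarantine.append({**claim, "failed_rules": failures})
        else:
            clean.append(claim)
    return clean, quarantine


def rejection_rate_by_provider(clean: list[dict], quarantine: list[dict]) -> dict:
    """Share of each provider's claims that ended up in quarantine."""
    totals, rejected = Counter(), Counter()
    for claim in clean + quarantine:
        totals[claim["provider_id"]] += 1
    for claim in quarantine:
        rejected[claim["provider_id"]] += 1
    return {provider: rejected[provider] / totals[provider] for provider in totals}
```

Keeping the failed rule names on each quarantined row is what makes the rejection-rate view possible later; a bare "rejected" flag would not tell you which rule a provider keeps tripping.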

Late-arriving records. Claims submitted 60-90 days after service are common. Your silver pipeline must handle reprocessing without duplicating records. Delta Lake's MERGE INTO is the right tool here.
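The property you want from the merge is idempotence: replaying a batch must leave silver unchanged. The sketch below is a plain-Python stand-in for that upsert logic, not Delta Lake code. The merge key (claim_id, version) and the ingested_at tie-breaker are assumptions; the comments show which MERGE clause each branch would correspond to.

```python
def merge_claims(target: dict, batch: list[dict]) -> dict:
    """Upsert batch rows into target, keyed by (claim_id, version)."""
    for row in batch:
        key = (row["claim_id"], row["version"])
        existing = target.get(key)
        if existing is None:
            # Corresponds to: WHEN NOT MATCHED THEN INSERT
            target[key] = row
        elif row["ingested_at"] > existing["ingested_at"]:
            # Corresponds to: WHEN MATCHED AND source is newer THEN UPDATE
            target[key] = row
        # Otherwise the row is a replay or older copy, so it is ignored.
    return target


# Hypothetical batch; ISO date strings compare correctly as plain strings.
silver: dict = {}
batch = [
    {"claim_id": "C-1", "version": 1, "ingested_at": "2026-03-01", "amount": 100.0},
    {"claim_id": "C-1", "version": 2, "ingested_at": "2026-03-02", "amount": 90.0},
]
merge_claims(silver, batch)
merge_claims(silver, batch)  # replaying the same batch adds nothing
```

After both calls, silver holds exactly two rows. A late-arriving row for a new version inserts cleanly, and a stale copy of an existing version is skipped.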

Gold: Business Questions, Not Raw Data

The gold layer should be designed around questions, not tables. Common gold assets for a claims platform:

| Gold Asset | Business Question |
| --- | --- |
| claims_summary | What is our claims exposure by month, plan, and diagnosis group? |
| provider_utilisation | Which providers have the highest frequency and average cost per claim? |
| member_claims_history | What is the full claims history for a given member? |
| loss_ratio | What is the loss ratio by product and benefit type? |

These are dimensional models or aggregated fact tables, not the silver layer with renamed columns.
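As one example of a question-shaped asset, here is a minimal sketch of how claims_summary might be aggregated from silver rows. The column names, and the choice to bucket by month of service rather than month of submission or settlement, are assumptions; that choice is exactly the sort of business definition that belongs in gold.

```python
from collections import defaultdict


def claims_summary(silver_rows: list[dict]) -> dict:
    """Aggregate claim count and total amount by (month, plan, diagnosis group)."""
    totals = defaultdict(lambda: {"claim_count": 0, "total_amount": 0.0})
    for row in silver_rows:
        # Assumes service_date is an ISO string, so the first 7 chars are YYYY-MM.
        key = (row["service_date"][:7], row["plan"], row["diagnosis_group"])
        totals[key]["claim_count"] += 1
        totals[key]["total_amount"] += row["claim_amount"]
    return dict(totals)
```

The output has one entry per month, plan, and diagnosis group, which is a different grain from silver's one row per claim version.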

The Regulatory Dimension

Under BNM's RMiT framework, your data platform carries specific obligations:

Data lineage: you need to trace any gold figure back to its bronze source. Microsoft Purview (formerly Azure Purview) handles this if you instrument your pipelines correctly: tag datasets at bronze with source system and ingestion timestamp, then propagate those tags through every transformation.
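The tagging idea can be sketched independently of any catalogue tool. The LineageTags class below is a hypothetical metadata shape, not a Purview API: it records the bronze origin once and appends each transformation step, so a gold dataset carries its full path back to the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageTags:
    source_system: str   # set once at bronze
    ingested_at: str     # ingestion timestamp, set once at bronze
    steps: tuple = ()    # transformation names, appended layer by layer

    def after(self, step: str) -> "LineageTags":
        """Return new tags with one more transformation step recorded."""
        return LineageTags(self.source_system, self.ingested_at, self.steps + (step,))


# Hypothetical source system and step names.
bronze_tags = LineageTags("tpa-feed", "2026-03-01T02:00:00Z")
silver_tags = bronze_tags.after("silver.normalise_claims")
gold_tags = silver_tags.after("gold.claims_summary")
```

Because the tags are immutable, each layer gets its own copy and the bronze record is never altered by downstream steps.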

Retention: raw claims data often has a 7-year retention requirement. Your bronze layer should sit on lifecycle-managed storage, for example starting in the Cool tier and moving to Archive after 90 days (Cool → Archive).
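For the tiering rule, an Azure Storage lifecycle management policy is a JSON document. The sketch below builds one as a Python dict and serialises it. The rule name and the container/bronze/ prefix are placeholders, and I've deliberately left out any delete action, since when (or whether) to delete after the retention period is a compliance decision rather than a default.

```python
import json

# Hypothetical rule: move bronze block blobs to Archive 90 days after last modification.
lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "bronze-archive-after-90-days",  # placeholder name
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    # Prefix starts with the container name; placeholder path.
                    "prefixMatch": ["container/bronze/"],
                },
                "actions": {
                    "baseBlob": {
                        "tierToArchive": {"daysAfterModificationGreaterThan": 90}
                    }
                },
            },
        }
    ]
}

policy_json = json.dumps(lifecycle_policy, indent=2)
```

Bear in mind that archived blobs must be rehydrated before they can be read, so a full bronze replay of old partitions is slower than reading from Cool.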

Access controls: in a group structure with multiple entities (family takaful, general takaful, reinsurance), bronze and silver should be ring-fenced per entity. A single ADLS account with separate containers per subsidiary, governed by Microsoft Entra ID groups, is the practical implementation.

What Goes Wrong

The most common failure I've seen in claims pipelines is conflating silver and gold. Teams build complex business logic (loss ratio calculations, IBNR estimates, plan-specific adjustment factors) directly into the silver layer. When business rules change (and they always do), you end up reprocessing silver from scratch.

Keep silver a faithful, clean representation of reality. Keep gold opinionated about business definitions. The boundary between them is where your data contracts live.