Job descriptions for data engineers are almost universally useless. They list technologies (Spark, Airflow, dbt, Kafka) as if the ability to name a tool predicts the ability to build reliable systems with it.
After a fair number of hiring cycles in Malaysian financial services, here's what I'm actually looking for.
Debugging Instinct Over Tool Knowledge
The single most predictive quality I've found is how someone responds when something breaks. Not whether they know the answer but what they do.
Do they read the error message properly? Do they form a hypothesis and test it, or do they google the error string and paste the first Stack Overflow answer? Do they know how to narrow the problem space?
In interviews, I give candidates a broken pipeline (a simple Python script with a subtle data issue) and watch how they work. The technology doesn't matter. The process does.
A strong candidate will say: "Let me check what the data looks like at this step before the transformation." A weaker one will immediately try to fix what they think is wrong without confirming the diagnosis.
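A hedged sketch of the kind of exercise this describes. The script, the bug, and the field names are my own invention, not from any actual interview; the point is the diagnostic habit of inspecting the data before "fixing" anything.

```python
# Hypothetical exercise: the daily claims total silently undercounts.
# Subtle issue: some source rows carry amounts as strings, and the
# buggy version filters them out instead of failing.

rows = [
    {"claim_id": "C1", "amount": 100.0},
    {"claim_id": "C2", "amount": "250.0"},  # string, not float
    {"claim_id": "C3", "amount": 75.0},
]

def daily_total_buggy(rows):
    # Only sums values that are already floats, so "250.0" vanishes.
    return sum(r["amount"] for r in rows if isinstance(r["amount"], float))

def daily_total_fixed(rows):
    # Coerce explicitly; a bad value raises instead of undercounting.
    return sum(float(r["amount"]) for r in rows)

# The step a strong candidate takes first: look at the data at this stage.
type_counts = {}
for r in rows:
    t = type(r["amount"]).__name__
    type_counts[t] = type_counts.get(t, 0) + 1

print(type_counts)              # {'float': 2, 'str': 1}
print(daily_total_buggy(rows))  # 175.0
print(daily_total_fixed(rows))  # 425.0
```

The type census confirms the diagnosis before any fix is attempted, which is exactly the behaviour the exercise is meant to surface.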
Understanding Data, Not Just Pipelines
Data engineering is often taught as plumbing: move data from A to B, apply transformations, load to the destination. The plumbing matters. But the best data engineers I've worked with have genuine curiosity about the data itself.
They ask: Why does this column have 40% nulls? Is that expected or a signal? They notice that the claims count for March 2025 is unusually low and wonder whether it's a submission lag or a pipeline gap. They treat anomalies as questions, not noise.
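Both questions above reduce to cheap checks you can run before trusting a feed. A minimal sketch with invented toy records (the thresholds and field names are illustrative, not a recommendation):

```python
from collections import Counter

# Toy records standing in for a claims feed; the March dip is deliberate.
claims = (
    [{"month": "2025-01", "cause": "flood"}] * 40
    + [{"month": "2025-02", "cause": None}] * 38
    + [{"month": "2025-03", "cause": "fire"}] * 5   # suspiciously low
)

# Question 1: how null-heavy is the column, and is that expected?
null_rate = sum(1 for c in claims if c["cause"] is None) / len(claims)

# Question 2: do monthly volumes look plausible, or is there a gap?
by_month = Counter(c["month"] for c in claims)
median = sorted(by_month.values())[len(by_month) // 2]
anomalies = {m: n for m, n in by_month.items() if n < 0.5 * median}

print(round(null_rate, 3))  # 0.458
print(anomalies)            # {'2025-03': 5}
```

Neither number is an answer by itself; each is a question to take back to the source-system team, which is the habit being tested.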
This is harder to teach than SQL window functions. I look for it in how candidates describe previous projects: do they talk only about the technical implementation, or do they mention what they learned about the domain?
Knowing What "Done" Means
Junior engineers tend to consider a pipeline done when it runs successfully once. Senior engineers know that "done" means:
- It runs correctly on bad data, not just the happy path
- It fails loudly and informatively when something goes wrong
- Someone else can understand what it does and why
- It can be reprocessed safely if upstream data changes
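The checklist above can be sketched in a few lines. This is a toy illustration under my own assumptions (the function names and the dict-as-store are invented, not a real framework): validation that fails loudly, and a load keyed on the batch so reprocessing replaces rather than duplicates.

```python
def validate(records):
    bad = [r for r in records if r.get("amount") is None or r["amount"] < 0]
    if bad:
        # Fail loudly and informatively, never with a silent skip.
        raise ValueError(f"{len(bad)} record(s) failed validation: {bad[:3]}")
    return records

def load(store, batch_id, records):
    # Idempotent: rerunning the same batch replaces, never appends.
    store[batch_id] = records
    return store

store = {}
batch = validate([{"amount": 100}, {"amount": 20}])
load(store, "2025-03-01", batch)
load(store, "2025-03-01", batch)  # safe to reprocess after upstream changes
print(len(store["2025-03-01"]))   # 2
```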
I ask candidates: "Tell me about a pipeline you built. What would you change if you built it again?" The answer reveals a lot. If they say "nothing," I worry. If they talk about observability, data quality checks, or better error handling, I'm interested.
Communication Across the Stack
In an insurance or financial services environment, data engineers sit between the source systems team (IT infrastructure, sometimes a vendor) and the analytics consumers (actuaries, finance, management reporting). You spend a lot of time translating.
Can this person explain a data quality issue to a finance analyst without condescension? Can they push back on a poorly specified requirement without killing the relationship? Can they write a clear incident report when a pipeline fails overnight?
I read cover letters very carefully. Not for eloquence, but for clarity. Someone who writes clearly tends to think clearly.
What I Don't Weight Heavily
Specific tool experience. Tools change. Fundamentals don't. I care more that someone can write clean, testable Python than whether they've used exactly our stack.
Certifications. A DP-203 or a Databricks badge tells me someone passed an exam. It doesn't tell me how they work under ambiguity.
GPA, for mid-career candidates. Irrelevant after the first two years.
A Note on Malaysian Financial Services Specifically
The regulatory context here (BNM, RMiT, IFSA, the audit obligations that come with licensed entities) adds a dimension that most data engineering curricula don't cover. I don't expect new graduates to know this. I do expect them to take it seriously when I explain it.
What I'm looking for in that case is intellectual humility: the ability to say "I don't know this, but I'll learn it" without defensiveness.
That's actually the quality that matters most at every level.