All engineering is becoming data engineering

2026-06-06

In my last post, while discussing how engineers need to move up the value chain, I referenced Michael Bloch’s article on the rapid transformation of what engineers will do. The engineers who will thrive are those who understand how the business works and can translate that understanding into context that AI agents can build upon. I particularly liked Bloch’s assertion that “data is the real interface”. That’s exactly right; the primary challenge in modern product engineering is making the data exchanged between systems consistent and, more importantly, useful. AI coding will end up making everything except data engineering less and less relevant. In essence, all engineering is increasingly becoming data engineering.

Bloch writes:

"The right interface between two components is a well-structured data artifact, not a function call. Clean data lets agents compose systems without being told how."

Since the integration between software components can be easily rebuilt by AI agents, what becomes of paramount importance is a shared understanding of each component's context: what it does and, most importantly, what kind of information it needs to do it and what the information it returns actually means. This is what we’ve always tried to do in building data APIs and data pipelines.
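As a rough illustration of what a "well-structured data artifact" between two components could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `OrderSummary` record, its fields, and the `describe` and `to_payload` helpers are hypothetical and do not come from Bloch's article. The point is only that the meaning of each field travels with the data instead of living in someone's head.

```python
import json
from dataclasses import asdict, dataclass, field, fields


@dataclass(frozen=True)
class OrderSummary:
    """Hypothetical artifact exchanged between two components."""

    order_id: str = field(metadata={"meaning": "Unique order identifier"})
    net_revenue_usd: float = field(
        metadata={"meaning": "Revenue after discounts and refunds, in US dollars"}
    )
    placed_on: str = field(
        metadata={"meaning": "Order date as ISO 8601 (YYYY-MM-DD), in UTC"}
    )


def describe(artifact_type) -> dict:
    """Map each field name to its documented meaning."""
    return {f.name: f.metadata["meaning"] for f in fields(artifact_type)}


def to_payload(artifact) -> str:
    """Serialize the values together with the description of what they mean."""
    return json.dumps(
        {"schema": describe(type(artifact)), "data": asdict(artifact)},
        sort_keys=True,
    )


if __name__ == "__main__":
    print(to_payload(OrderSummary("A-1", 19.5, "2026-06-06")))
```

A consumer, whether a person or an agent, can read the `schema` part of the payload to learn what `net_revenue_usd` means without having to inspect the producing component.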

AI enables less technical people to rapidly iterate application functionality on top of this data. Moving this capability closer to the business is a good thing, because the people closest to the business are the people with the most context.

We see early attempts at this with vibe coding. Most of the results I’ve seen are good prototypes or one-person tools, but they tend to break down when shared. Not because of bad tech, but because they are implicitly shaped around one person’s context. What is missing is the shared data interface.

There are a few reasons for this. I’m not going to talk right now about how data from different systems is scattered in different places in different formats and consumed in different ways. That’s the standard data engineering problem. It’s a difficult thing to manage (that’s why we exist!) but I believe the problem itself is well understood. I would rather talk about the implicit data challenges opened up by AI.

Data buried inside workflows

Think about a typical analyst at a large company. They know which reports to pull. They know how to synthesize them. They've developed an intuition about what the numbers mean in context. What they're doing, in effect, is creating a representation of data that matches their function — but that representation lives inside the process itself, not exposed anywhere a system can reach.

Before AI, extracting that buried representation was impractical. Rebuilding every workflow into a clean schema was expensive, brittle, and slow. So most companies didn't attempt it. The data stayed trapped. Agentic AI changes this. Not because it magically frees the data — but because it makes extraction tractable at a scale that wasn't possible before. The backlog of "data locked in process" is now addressable. But only if you know how to approach it.

The agent context problem

There's a second, subtler version of this problem.

Agents, left to their own devices, take on the "faces" of their users. Ask two people the same question in different contexts and you'll get two different answers, not because the facts differ, but because the implied context does. One person thinks in terms of finance. The other thinks in terms of operations. Same question, different frames of reference. All of it is implicitly absorbed by AI.

Resolving that implied context mismatch, making data mean the same thing across systems, teams, and agents, is at the heart of what data engineers actually do. It's not primarily about writing code. It's about understanding data deeply enough to expose it in a form that composes cleanly across systems.
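One way to picture making implied context explicit is a shared glossary that forces every consumer to say which frame of reference it is using. The sketch below is only an assumption-laden illustration: the `MetricDefinition` type, the `GLOSSARY` entries, and the `resolve` helper are hypothetical names, and the two definitions of "revenue" are invented examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical shared definition of a business term."""

    name: str
    description: str
    unit: str


# Hypothetical glossary: the same word, defined once per frame of reference.
GLOSSARY = {
    "finance.revenue": MetricDefinition(
        "finance.revenue", "Recognized revenue net of refunds", "USD"
    ),
    "operations.revenue": MetricDefinition(
        "operations.revenue", "Gross value of orders shipped", "USD"
    ),
}


def resolve(term: str, team: str) -> MetricDefinition:
    """Return the explicit definition of a term for a team, or fail loudly."""
    key = f"{team}.{term}"
    if key not in GLOSSARY:
        raise KeyError(f"No shared definition for {term!r} in context {team!r}")
    return GLOSSARY[key]


if __name__ == "__main__":
    print(resolve("revenue", "finance").description)
    print(resolve("revenue", "operations").description)
```

Failing loudly on an unknown context is a deliberate choice in this sketch: an agent that cannot find a shared definition should surface the gap rather than silently adopt its user's frame.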

That work has never been glamorous. It rarely makes it into conference talks. But it is, increasingly, the foundation everything else is built on.

What this means for us

At Lineate, we've been doing data engineering for over a decade. For most of that time, the pitch was straightforward: here's your pipeline, here's your warehouse, here's your reporting layer. The pitch is changing. Now it's: here is the data foundation your AI systems require to function. They need context to be explicit, not implied.

The question I wrestle with is how to explain this when the industry is moving so fast that most people haven't yet figured out how to verbalize the problem. They know something is broken. They've seen the agent demos fail in production. They've watched AI pilots stall before they ship. They just don't always know the name of what went wrong. That's what we do. We make the data ready for AI.

If you have a different take on where engineering is going, I'd be glad to compare notes.

Blog by Ben Engber, CEO of Lineate
