To Trust AI You Need to Trust Your Data

Claude Zwicker, Senior Product Manager
Published May 21, 2026

There’s a version of AI adoption that looks great in a demo and breaks badly in production. The agent answers questions quickly, synthesizes data across systems, and surfaces insights your analysts would have spent days chasing. But if you haven’t thought carefully about what data that agent can actually touch, you’ve traded one problem for a bigger one.

The question most enterprises are asking right now is: “how do we make AI trustworthy?” And the instinct is to focus on the model — its guardrails, its training data, its outputs. That’s not wrong. But it’s incomplete. Because underneath every AI-generated answer is a data access decision. And most organizations aren’t ready for what that means.

The access problem no one is talking about

When AI agents start querying enterprise systems directly, a familiar problem gets worse fast: overprovision the agent, and you’ve effectively handed every user a master key.

Think about it this way. As a product manager, you probably have access to product usage metrics. You almost certainly don’t have access to compensation data. That boundary is enforced by policy, explicitly and deliberately. Now introduce an agent that has been given broad access across your enterprise systems in order to be “helpful.” Suddenly, the question “What do my colleagues earn?” has a plausible path to an answer. Not because policy changed. Because the agent bypassed it.

This is the core failure mode: organizations provision agents the same way they’d provision a superuser, without recognizing that agents operate on behalf of users who don’t have superuser rights. The boundary that existed for people quietly disappears for AI.

Securing the model isn’t enough

There’s a natural instinct to solve this at the model layer: build better guardrails, tune the system prompt, restrict what the agent is allowed to ask. And yes, that matters. But it’s not sufficient on its own.

Guardrails at the model level can be worked around. Once users understand what’s restricted, they learn to ask indirectly. Chain enough benign-sounding prompts together, and a reasoning model can arrive at an answer it was never supposed to give. The guardrail holds the door; the data layer is the lock.

You need both. Model-level controls govern what questions can be asked. Data-level controls govern what information can be used to answer them. Even if a user finds a way through the model layer, if the underlying data is governed correctly, the response will only reflect what that user is actually authorized to see. The safety net exists at the source.

What “authorized access” actually means for an agent

Here’s where things get genuinely tricky. “The agent accessed data it was authorized to use” sounds simple. In practice, it raises a question that’s harder to answer than it looks: authorized by whom, for what purpose, on whose behalf?

Agents are their own type of identity, separate from human users. When an agent queries a dataset, two things need to be true simultaneously: the agent has permission to access that data, and the human whose task the agent is performing also has permission to see the result. That second condition is the one most organizations forget.
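To make the two-condition check concrete, here is a minimal sketch in Python. The Identity class, the can_read function, and the dataset names are hypothetical, chosen only for illustration; a real deployment would resolve permissions from its own policy store rather than from in-memory sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """A principal (agent or human) and the datasets it may read. Hypothetical model."""
    name: str
    readable_datasets: frozenset


def can_read(agent: Identity, user: Identity, dataset: str) -> bool:
    """Allow a read only if BOTH the agent and the delegating user are permitted."""
    agent_ok = dataset in agent.readable_datasets
    user_ok = dataset in user.readable_datasets
    return agent_ok and user_ok


# Illustrative identities: the agent is broadly provisioned, the user is not.
agent = Identity("analytics-agent", frozenset({"product_usage", "compensation"}))
pm = Identity("product-manager", frozenset({"product_usage"}))

print(can_read(agent, pm, "product_usage"))  # True
print(can_read(agent, pm, "compensation"))   # False: the user lacks access
```

The effective permission is the intersection of the two identities, so a broadly provisioned agent cannot widen what the requesting user is able to see.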

Context and intent have to travel with the agent all the way to the data layer. And because agents move fast, completing tasks in seconds rather than days, access should be temporary by design. Grant it for the duration of the task, then revoke it. If an agent is compromised mid-task, the blast radius stays small. The data it could reach disappears the moment the task ends.
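One way to picture task-scoped access is a grant that carries its own expiry and can be revoked the moment the task completes. The sketch below is an assumption-laden illustration: TaskGrant, grant_for_task, and the 30-second lifetime are invented names and values, and the clock is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class TaskGrant:
    """A hypothetical short-lived grant tied to a single task."""
    agent: str
    dataset: str
    expires_at: float  # seconds, on the same clock passed to is_active
    revoked: bool = False

    def is_active(self, now: float) -> bool:
        # Usable only while it is neither revoked nor past its expiry.
        return not self.revoked and now < self.expires_at

    def revoke(self) -> None:
        # Called when the task ends, whether or not the TTL has elapsed.
        self.revoked = True


def grant_for_task(agent: str, dataset: str, now: float, ttl_seconds: float) -> TaskGrant:
    """Issue a grant that expires ttl_seconds after it is created."""
    return TaskGrant(agent=agent, dataset=dataset, expires_at=now + ttl_seconds)


grant = grant_for_task("analytics-agent", "product_usage", now=0.0, ttl_seconds=30.0)
print(grant.is_active(now=10.0))  # True: inside the task window
print(grant.is_active(now=45.0))  # False: the window has passed
grant.revoke()
print(grant.is_active(now=10.0))  # False: revoked when the task ended
```

In a running system the `now` value would typically come from something like time.monotonic(); the point is that a compromised agent holds nothing once the grant lapses or is revoked.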

Policy as the operating system for AI access

For human users, a data access request might take days: submit a ticket, wait for approval, get access, use it. That workflow, frustrating as it is for analysts waiting on an access decision, is more or less tolerable in a human-driven world.

However, nobody wants to wait three days for an agent to finish a task. The approval window has to collapse to milliseconds. That’s not a slight improvement to the existing process; it’s a fundamentally different architecture.

Policy-driven access provisioning is how organizations make that work. Instead of routing every agent request through a human approval chain, you define the rules in advance: if an agent is performing task type X, on behalf of a user with role Y, under purpose Z, here’s what it can access. The decision happens automatically, at machine speed, against a deterministic set of rules. The human governance happens up front, in policy authoring, not as a bottleneck in every individual transaction.
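The rule shape described above (task type X, role Y, purpose Z) can be sketched as a small deterministic lookup. Everything named here is hypothetical: the Rule class, the RULES list, and the task, role, purpose, and dataset values are placeholders, not any product's policy language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One pre-authored policy rule. All field values below are illustrative."""
    task_type: str
    user_role: str
    purpose: str
    datasets: frozenset


# Authored up front by humans; evaluated automatically at request time.
RULES = [
    Rule("churn_analysis", "product_manager", "product_improvement",
         frozenset({"product_usage"})),
    Rule("payroll_audit", "hr_partner", "compliance",
         frozenset({"compensation"})),
]


def allowed_datasets(task_type: str, user_role: str, purpose: str, rules=RULES) -> frozenset:
    """Return the datasets granted by matching rules; no match means no access."""
    granted = set()
    for rule in rules:
        if (rule.task_type, rule.user_role, rule.purpose) == (task_type, user_role, purpose):
            granted |= rule.datasets
    return frozenset(granted)


print(sorted(allowed_datasets("churn_analysis", "product_manager", "product_improvement")))
# ['product_usage']
print(sorted(allowed_datasets("payroll_audit", "product_manager", "compliance")))
# []
```

Because the same inputs always yield the same decision, the evaluation needs no human in the loop, and the default when no rule matches is to deny.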

The audit question: who did what, and why?

When something goes wrong — and eventually, something will — the question isn’t just what the agent did. It’s who set it in motion, what task it was performing, and why it had access to the data it touched.

An audit trail that says “agent ran a query” is nearly useless. A complete audit trail says: which agent, what task, when, on behalf of which user, under what context and intent. Without that chain, you can’t trace a bad outcome back to its source. You can’t identify whether a human actor was attempting to exploit the agent to reach data they shouldn’t have. You can’t demonstrate compliance.

The dual-identity requirement, logging both the agent identity and the delegating human identity, is what closes that gap. It’s not a nice-to-have. As regulatory scrutiny around AI systems increases, it’s likely to become table stakes.
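As a rough illustration of what such a record could look like, the sketch below serializes one audit event that carries both identities alongside task, time, and purpose. The AuditEvent class and its field names are assumptions made for this example, not a standard schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AuditEvent:
    """A hypothetical audit record capturing both identities and the intent."""
    agent_id: str       # which agent acted
    task: str           # what task it was performing
    timestamp: str      # when, as an ISO 8601 string
    on_behalf_of: str   # the delegating human identity
    purpose: str        # the stated context and intent
    dataset: str        # what data was touched
    decision: str       # "allow" or "deny"


def to_log_line(event: AuditEvent) -> str:
    """Serialize the event as one JSON line with a stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


event = AuditEvent(
    agent_id="analytics-agent",
    task="churn_analysis",
    timestamp="2026-05-21T09:30:00Z",
    on_behalf_of="product-manager",
    purpose="product_improvement",
    dataset="product_usage",
    decision="allow",
)
print(to_log_line(event))
```

Recording denied requests in the same shape is what would let a reviewer spot a user repeatedly probing for data through the agent.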

What becomes possible when you get this right

AI agents are genuinely different from human data consumers. They don’t take sick days. They don’t work at human speed. They can perform, in seconds, analysis that would take a skilled analyst days. That potential productivity is real, and it’s why everyone is so excited about where this is going.

But the calculus only works if risk stays manageable. A 50% productivity increase sounds compelling until you pair it with a material increase in the likelihood of a significant compliance failure. At that point, most organizations will rationally choose not to deploy.

The organizations that will actually capture the value of AI aren’t the ones that move fastest. They’re the ones that move fast with a credible answer to the trust question. That means having the governance infrastructure in place before agents proliferate, not scrambling to retrofit it after an incident.

We’re likely approaching a point where there will be more agent identities querying enterprise data than human ones. The organizations building the right access infrastructure today are the ones who will be able to scale AI confidently tomorrow. The ones who skip this step will find themselves choosing between slowing down and accepting risk. Neither is a good option.

Trust in AI doesn’t come from the model alone. It comes from knowing that the data behind every answer was the right data, accessed by the right identity, for the right purpose, at the right time. That’s not a technical detail. It’s the foundation.

Frequently Asked Questions

Why does data access governance matter for AI trustworthiness?

AI agents generate answers by querying data — and those answers are only as trustworthy as the data they’re allowed to access. If an agent can reach data that the requesting user isn’t authorized to see, the output carries information that should never have surfaced. Governing access at the data layer is what ensures the right data reaches the right consumer, regardless of how the request was made.

What is the difference between an AI agent acting as a user versus acting on behalf of a user?

An agent is its own identity — separate from the human who invoked it. When an agent acts “on behalf of” a user, it carries that user’s authorization context with it through every system it touches. This means the agent’s data access is bounded by what the originating user is permitted to see, not by whatever broad permissions the agent itself has been granted. The distinction matters enormously for compliance: without it, agents can inadvertently expose data to users who were never authorized to access it.

What should an AI audit trail include?

A complete audit trail for AI agent activity should answer five questions: which agent acted, what task it performed, when it ran, on behalf of which user, and under what stated purpose. Logging only the agent’s query — without the human context behind it — leaves a gap that makes it impossible to trace accountability, detect misuse, or demonstrate compliance to regulators.

What is ephemeral or just-in-time data access, and why does it matter for AI agents?

Ephemeral access means granting an agent permission to reach specific data only for the duration of a given task — then automatically revoking it when the task ends. Because agents operate at machine speed, there’s no practical reason for persistent, standing access. Limiting access to the task window reduces the potential impact if an agent is compromised and keeps the overall data exposure footprint as small as possible.

How does policy-driven provisioning support AI at enterprise scale?

Human data access workflows — submitting tickets, waiting for approvals — can take days. Agents complete tasks in seconds. Policy-driven provisioning bridges that gap by encoding access rules in advance: if an agent is performing a defined task type, on behalf of a user with a given role, it is automatically granted the appropriate access in real time. Governance happens at policy authoring, not as a bottleneck in every individual transaction. This is how organizations scale AI without scaling risk.

