Zero Trust for AI Agents: How to Enforce Anthropic's Framework

Anthropic's whitepaper opens with a statement that frames the past and present of AI and data security: "Perimeter-based cybersecurity defenses can't keep up with modern threats, and the threats themselves are accelerating."

The first half has been true for years. Social engineering has replaced malware as the go-to attack method. Stolen credentials are a factor in 86% of breaches, bypassing perimeter-based cybersecurity defenses entirely.

The second half is becoming true now. AI is accelerating threats — giving attackers more tools to scale social engineering and exposing the full extent of the blast radius, the total volume of data a single compromised identity can reach. Agents bypass the application controls that once stood between identities and data, connecting directly to databases, APIs, and data stores and accessing data at machine speed.

Anthropic's answer is to apply Zero Trust to agents.

The six pillars of Anthropic’s Zero Trust framework

The Zero Trust philosophy — trust nothing, verify everything, assume breach has already occurred — has been a security philosophy since the early 1990s. It's a proven foundation. Anthropic argues that the principle needs a new shape for agentic systems: "identities that are cryptographically rooted, permissions scoped per task, memory protected against poisoning, and defensive operations that run at the speed of autonomous attackers."

The whitepaper lays out a practical framework organized around six core pillars:

Agent identity and authentication: Move from human/user identity to cryptographically-rooted agent identity. Every agent must carry verifiable proof of what it is, who deployed it, and what it's authorized to do.
Access control and privilege management: Replace role-based access with permissions scoped per individual task. An agent authorized to read a database for one query should not retain that access for the next.
Observability and auditing: Comprehensive logging and monitoring of agent behavior, tool calls, and data access.
Behavioral monitoring and response: Continuous analysis of agent actions to detect anomalous, malicious, or noncompliant patterns – at machine speed, not human speed.
Input validation and output controls: Defenses against prompt injection, tool poisoning, and data leakage at every agent boundary.
Integrity and recovery: Protecting agent memory against poisoning and ensuring systems can recover from compromise.

Anthropic also identifies the specific threats that make agents different from traditional IT: prompt injection, tool poisoning, identity and privilege abuse, memory poisoning, and supply chain attacks.

These aren't theoretical. Frontier AI models can already chain multiple weaknesses and produce working exploits in hours, compressing a timeline that used to take months.

Get started with a free AI Data Risk Assessment.

Get your assessment

Attack flows like "Reprompt" are already being used to turn AI systems against the organizations that deploy them. Varonis AI attack specialist Abdiel Santos recently ran an AI attack lab demonstrating how chatbot and agent behavior can be redirected to perform unauthorized actions.

Anthropic’s framework maps these six core pillars into three maturity tiers — Foundation, Advanced, and Optimized — and outlines an eight-phase implementation workflow covering identity, access scoping, sandboxing, input/output controls, and memory safeguards. It also introduces the concept of Agentic SOAR: security orchestration, automation, and response running fast enough to contend with AI-accelerated attackers.

It's a well-organized and useful starting point for any organization deploying agents. We encourage you to read it.

The framework is sound. Enforcement is what matters.

Anthropic's Zero Trust for AI Agents framework maps the what. The next question for every organization should be, “How?” How do you actually enforce Zero Trust for AI Agents across a sprawling, heterogeneous AI environment?

We share Anthropic's conviction that AI security requires a fundamentally different approach. As David Gibson, our SVP of strategic programs, has written: AI doesn't create new data risks — it amplifies existing ones. Excessive permissions that sat dormant for years become critical when an agent inherits them. Sensitive data that was theoretically accessible becomes practically exposed when an AI agent can find it, reason over it, and act on it in seconds.

The security industry's initial response has been to bolt AI-specific controls onto existing stacks: prompt filters, model scanners, and standalone inventories. These address the AI layer. They miss the data layer. And the data layer is where the damage happens.

This is where Varonis Atlas comes in.

How Varonis Atlas enforces Zero Trust for AI agents

Varonis Atlas is a complete AI Security Platform. With Atlas, organizations have the capabilities they need to enforce Zero Trust for AI agents across the entire security lifecycle.

Here's how Atlas maps to the framework Anthropic outlines — and where it goes further.

Discover: AI inventory and shadow AI

You can’t enforce least privilege on agents you don’t know exist. Atlas continuously discovers AI systems across cloud, SaaS, code repositories, and AI platforms, including shadow AI, to build a complete, living inventory of agents, models, and their data access.

Discovery is foundational. Posture can’t assess what isn’t known. Monitoring can’t watch what isn’t visible. Governance can’t control what isn’t documented.

Assess: AI Security Posture Management (AI-SPM)

Anthropic calls for continuous assessment of agent configurations, permissions, and dependencies. Atlas AI-SPM does this across agents, chatbots, and models — identifying vulnerabilities, misconfigurations, and risky data exposure.

The difference is data context. Knowing an agent can access SharePoint is one thing. Knowing it can access millions of sensitive records is another. That context turns posture into a real risk assessment.

Enforce: AI runtime guardrails

Visibility alone isn’t zero trust. Atlas enforces real-time guardrails through an AI Gateway in the request path, inspecting prompts, responses, and agent actions before they reach models or downstream systems. These controls block sensitive data exposure and unsafe behavior—without requiring changes to underlying applications.

Because Atlas understands execution flow and tool chains, it goes beyond keyword filtering to stop indirect leakage and tool-chaining attacks, like those outlined in Anthropic’s framework.

Govern: AI compliance and third-party risk

Anthropic emphasizes compliance alignment. Atlas operationalizes it. Atlas maps AI systems to frameworks like the EU AI Act and NIST AI RMF with audit-ready evidence from live activity, posture findings, and runtime logs.

Zero Trust also extends beyond internal systems. Atlas continuously assesses third-party AI vendors, combining inventories, questionnaires, and AI Bills of Materials to identify and manage external risk.

Monitor: AI activity monitoring and detection and response

Anthropic highlights observability as foundational. Atlas provides full visibility into AI behavior in production, capturing prompts, responses, agent actions, and data access.

AI Detection & Response identifies unsafe or malicious behavior in real time and takes action: alerting, blocking, and integrating with SIEM and SOAR workflows to enable machine-speed response.

Test: AI pen testing

Agents are dynamic. Once an agent is in the wild, gaps emerge even with well-thought out controls.

Atlas continuously tests AI systems with adversarial prompts and real-world attack simulations, including prompt injection and jailbreaks. Results feed directly into guardrails and policies, closing the loop from testing to protection.

Zero Trust for AI agents requires data context

One thing Anthropic's framework necessarily leaves to implementers: the data layer. The framework addresses agent behavior, identity, and access control, but AI security without data security leaves the biggest risk vector unaddressed.

An agent can pass every Zero Trust control — authenticated, authorized, scoped, monitored — and still quietly access four million customer records because the data underneath is overexposed.

Because Atlas is built on the Varonis Data Security Platform, it brings data context that standalone AI security tools can’t match. Posture assessment with real data context. Guardrails informed by classification. Monitoring enriched with identity and sensitivity. Compliance evidence that includes data lineage, not just AI system metadata.

Zero Trust for AI agents is a strong framework. Enforcing it requires securing both AI and the data that powers it.

What should I do now?

Below are three ways you can continue your journey to reduce data risk at your company:

Schedule a demo with us to see Varonis in action. We'll personalize the session to your org's data security needs and answer any questions.

See a sample of our Data Risk Assessment and learn the risks that could be lingering in your environment. Varonis' DRA is completely free and offers a clear path to automated remediation.

Follow us on LinkedIn, YouTube, and X (Twitter) for bite-sized insights on all things data security, including DSPM, threat detection, AI security, and more.

Nolan Necoechea Nolan Necoechea is a product marketing strategist at Varonis. He has spent more than a decade working with data and AI innovators.