Anthropic publishes Zero Trust guide for protecting corporate AI agents
6/8/2026, 07:49 AM • Евгения Слив

Anthropic has published "Zero Trust for AI agents," a practical framework for securely deploying autonomous AI agents in enterprise environments. Drawing on NIST SP 800-207 and 2026 NSA guidance, the document adopts the "never trust, always verify" principle: every agent action must be validated, and the architecture should assume that compromise is possible. Key threats it identifies include direct and indirect prompt injection, tool substitution, privilege abuse, context poisoning, and supply chain attacks.
The framework proposes a three-tier maturity model. At the baseline tier, each agent should receive a unique cryptographic identity and short-lived tokens, and should operate under "default deny" with least-privilege access. Sandboxed execution is deemed essential for agents that process untrusted content. Higher tiers add mutual TLS (mTLS), hardware-bound identity via HSM/TPM, and remote attestation. Static API keys and shared service-account passwords are considered unacceptable even at the entry level.
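The guide's baseline combination of short-lived tokens, default deny, and least privilege can be illustrated with a minimal sketch. It is not taken from the document: the `POLICY` allow-list, the `AgentToken`, `issue_token` and `authorize` names, the scope strings, and the 5-minute lifetime are all hypothetical. A real deployment would also sign tokens and bind them to the agent's cryptographic identity, which this sketch omits.

```python
from __future__ import annotations

import time
from dataclasses import dataclass

# Hypothetical allow-list: any agent or scope not listed here is denied.
POLICY: dict[str, set[str]] = {
    "invoice-agent": {"read:invoices", "call:pdf_parser"},
}

TOKEN_TTL_SECONDS = 300  # assumption: 5-minute token lifetime


@dataclass(frozen=True)
class AgentToken:
    agent_id: str
    scopes: frozenset[str]
    expires_at: float


def issue_token(agent_id: str, requested: set[str], now: float | None = None) -> AgentToken:
    """Grant only the intersection of requested and allowed scopes (least privilege)."""
    now = time.time() if now is None else now
    allowed = POLICY.get(agent_id, set())  # unknown agent -> empty set (default deny)
    return AgentToken(agent_id, frozenset(requested & allowed), now + TOKEN_TTL_SECONDS)


def authorize(token: AgentToken, action: str, now: float | None = None) -> bool:
    """Deny expired tokens and any action not explicitly granted."""
    now = time.time() if now is None else now
    if now >= token.expires_at:
        return False
    return action in token.scopes
```

For example, an agent that requests both `read:invoices` and `delete:invoices` only receives the first, and even that stops working once the token expires, so a stolen token has a short useful life.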
Observability receives special emphasis: Anthropic advises logging every agent action (tool calls, data access, external communications) with real-time SIEM correlation. For critical systems, anomalies should be detected within one hour. The company also recommends building a "traceability matrix" that links each agent action to the request that originated it. For incident response, automation should handle artifact collection and draft reporting, but key decisions such as containment, disclosure, and customer communication should remain with humans. As Anthropic notes, the organizations best positioned will be those with the strongest foundational security architecture, not necessarily the most advanced AI.
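The idea of a traceability matrix can be sketched as a structured audit log in which every recorded action carries the ID of its originating request. This is an illustrative assumption, not the format described in the guide: the `AuditLog` class, its field names, and the JSON Lines export (a common input format for SIEM ingestion) are all hypothetical.

```python
import json
import time
import uuid
from collections import defaultdict


class AuditLog:
    """Hypothetical in-memory audit log; every entry should name its originating request."""

    def __init__(self):
        self.records = []

    def record(self, request_id, agent_id, kind, target):
        # kind is e.g. "tool_call", "data_access" or "external_comm"
        entry = {
            "event_id": str(uuid.uuid4()),
            "request_id": request_id,
            "agent_id": agent_id,
            "kind": kind,
            "target": target,
            "ts": time.time(),
        }
        self.records.append(entry)
        return entry

    def to_jsonl(self):
        """One JSON object per line, suitable for shipping to a log pipeline."""
        return "\n".join(json.dumps(r, sort_keys=True) for r in self.records)

    def traceability_matrix(self):
        """Map each originating request to the ordered list of actions it caused."""
        matrix = defaultdict(list)
        for r in self.records:
            if r["request_id"] is not None:
                matrix[r["request_id"]].append((r["agent_id"], r["kind"], r["target"]))
        return dict(matrix)

    def untraced(self):
        """Actions with no originating request: candidates for investigation."""
        return [r for r in self.records if r["request_id"] is None]
```

In this design, an action that cannot be tied to any request shows up in `untraced()` rather than silently disappearing from the matrix, which is the kind of gap the guide's traceability recommendation is meant to expose.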
