Research · Agent Attack Surface

Prompt injection is only the first failure mode.

Modern AI agents read context, call tools, update memory, use MCP (Model Context Protocol) servers, browse the web, write code, and trigger workflows. The missing layer is a Context Firewall between what they read and what they do.

Get the Context Boundary Checklist · Book Firewall Review

Then vs now

The model you secured is not the model you shipped.

Context is now an attack surface. Every retrieved document, memory item, tool response, and MCP description can become instruction.
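One way to make this concrete is to label every context item with its source, so that only designated sources are ever treated as instructions and everything else is handled as evidence. The sketch below is illustrative only: `Source`, `ContextItem`, and `split_context` are hypothetical names, and the rule that only direct user input carries instructions is an assumption, not a description of any particular product.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where a piece of context came from (hypothetical taxonomy)."""
    USER = "user"
    RETRIEVED_DOC = "retrieved_doc"
    MEMORY = "memory"
    TOOL_RESPONSE = "tool_response"
    MCP_DESCRIPTION = "mcp_description"


@dataclass(frozen=True)
class ContextItem:
    source: Source
    text: str

    @property
    def trusted(self) -> bool:
        # Assumption: only direct user input may carry instructions.
        return self.source is Source.USER


def split_context(items):
    """Separate instruction-bearing text from untrusted evidence."""
    instructions = [item.text for item in items if item.trusted]
    evidence = [item for item in items if not item.trusted]
    return instructions, evidence
```

Keeping the provenance attached to each item lets later stages decide how much weight, if any, a piece of text should have over what the agent does.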

The old model

User prompt → LLM → response

One input, one output. Guardrails on the prompt could catch most of the obvious abuse.

The agentic model

Goal → plan → retrieve context → read memory → call tools → observe result → update state → act again

Every step in the loop reads untrusted content and can take a real action. The attack surface is the whole workflow, not the prompt.
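A minimal sketch of what a check between reading and acting could look like inside that loop. Everything here is an assumption for illustration: the `Action` and `Firewall` types, the tool names, and the policy itself (block side-effecting tools whenever the step was influenced by untrusted content) are hypothetical, and a real policy would likely be more granular.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    tool: str
    args: dict
    # True if untrusted context (docs, memory, tool output) shaped this step.
    influenced_by_untrusted: bool


@dataclass
class Firewall:
    # Hypothetical set of tools that change state outside the agent.
    side_effect_tools: frozenset = frozenset({"send_email", "run_shell", "write_file"})
    log: list = field(default_factory=list)

    def allow(self, action: Action) -> bool:
        blocked = (
            action.tool in self.side_effect_tools
            and action.influenced_by_untrusted
        )
        self.log.append((action.tool, "blocked" if blocked else "allowed"))
        return not blocked


def run_loop(planned_actions, firewall, execute):
    """Run each planned action only if the firewall allows it."""
    results = []
    for action in planned_actions:
        if firewall.allow(action):
            results.append(execute(action))
        else:
            results.append(None)  # blocked actions yield no result
    return results
```

The point of the design is that the decision is made per action, at the moment of acting, rather than once on the initial prompt.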

Attack classes

The failure modes that live past the prompt.

Prompt injection is the entry point, not the whole threat model. These are the classes we replay against real agent workflows.

Attack class: Example

Indirect prompt injection: Malicious instructions hidden in docs, tickets, web pages, emails, or MCP responses.

Tool hijacking: The agent is manipulated into calling a tool with dangerous arguments.

Memory poisoning: Bad context persists and steers future sessions.

RAG poisoning: Retrieved content becomes instruction instead of evidence.

Confused deputy: The agent uses a privileged identity for an attacker-controlled goal.

Output handling abuse: Model output becomes shell, SQL, code, markdown, or workflow config.

OAST exfiltration: The agent sends sensitive data to an external callback.

Human approval spoofing: The agent misrepresents a dangerous action as safe.
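Several of these classes surface as a concrete tool call that can be inspected before it runs. The sketch below checks two of them, under stated assumptions: the tool names `http_request` and `run_sql`, the `parameterized` flag, and the allowlisted hosts are all hypothetical, and a real deployment would need far broader coverage.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the agent may contact.
ALLOWED_HOSTS = {"api.internal.example", "docs.example.com"}


def egress_allowed(url: str) -> bool:
    """Allow only HTTPS requests to allowlisted hosts."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


def check_tool_call(tool: str, args: dict) -> list[str]:
    """Return a list of policy problems; an empty list means no objection."""
    problems = []
    # OAST exfiltration / tool hijacking: outbound call to an unknown host.
    if tool == "http_request" and not egress_allowed(args.get("url", "")):
        problems.append("egress to non-allowlisted host")
    # Output handling abuse: model text executed as raw SQL.
    if tool == "run_sql" and not args.get("parameterized", False):
        problems.append("model output used as raw SQL")
    return problems
```

For example, `check_tool_call("http_request", {"url": "https://attacker.example.net/cb"})` reports an egress problem, while a request to `https://docs.example.com/page` passes.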

Built for teams searching: prompt injection protection · AI agent security · MCP security · agentic AI red teaming · context firewall · LLM firewall.

Stop treating agent security as a prompt problem.

Guardrails catch the obvious. Ultra13 controls the context-to-action boundary.

Book Firewall Review · See a Sample Report