Ultra13
Solutions · Security Teams

Prove your AI agents stay inside policy under attack.

Ultra13 combines adversarial agent testing, runtime context enforcement, and post-control validation so security teams can govern AI agents without blocking product teams.

Attack surface

The attack surface is the workflow.

Agent risk does not live in the prompt. It lives in the reasoning, memory, tools, identity, RAG, MCP, and inter-agent communication that make the agent useful in the first place.

Reasoning

the plan-act loop itself can be steered

Memory

stored context becomes standing instruction

Tools

function calls with real side effects

Identity

privileged credentials the agent can borrow

RAG

retrieved chunks that read as instruction

MCP

external servers and their tool schemas

Inter-agent

messages passed between cooperating agents

Governance

Four questions Ultra13 answers.

A security review of an agent is not a checklist of model settings. It is a clear answer to what the agent can reach, what can steer it, what it can do, and whether the controls hold.

01

What can this agent access?

Every identity, credential, tool, data store, and retrieval source the agent can reach — mapped, not assumed.

02

What can manipulate it?

Every untrusted input path: users, documents, web pages, email, RAG, memory, MCP responses, and other agents.

03

What can it do?

The full blast radius of its tools and actions — exports, writes, payments, code execution, and downstream calls.

04

What happens when hostile context enters the loop?

We inject it. Poisoned memory, malicious tools, and indirect injection are replayed against the live agent.

05

Can we prove the controls work?

Before/after exploit replay and blocked-action logs that show each path is closed — evidence an auditor can read.

Attack coverage

Attack coverage.

We run a transparent catalogue of agentic attack families against the live agent loop — hostile users, poisoned context, malicious tools, and real exploit chains.

attack catalogue
Direct prompt injectionIndirect injection (tool / document)JailbreaksSystem-prompt extractionRAG poisoningMemory poisoningInsecure output handlingTool hijackCommand executionSSRFOAST exfiltrationConfused deputyConsent / approval spoofingCross-agent contaminationCascading multi-agent failureMCP rug-pull / driftTool shadowingMultimodal injectionModel extractionCross-tenant data bleedUnbounded consumption / DoSObfuscation evasion
Runtime controls

Enforceable runtime controls.

Findings are only half the job. Each one maps to a control that holds at runtime — enforced at context boundaries, not asked for in a system prompt.

01

Context provenance

Every span tagged with where it came from — user, document, web, RAG, memory, MCP, tool output, or another agent.

02

Source/sink enforcement

Decide which context classes may influence which actions. Web content can answer; it cannot authorize an export.

03

Tool authorization

Inspect tool name, arguments, target resource, identity, and tenant before anything executes.

04

Memory quarantine

Isolate untrusted memory so poisoned context never becomes standing instruction or implicit authorization.

05

Sensitive-data redaction

Strip secrets and regulated data before they reach a model, a log, or an outbound call.

06

Approval gates

Require human approval for high-blast-radius actions, with the real action shown — not a paraphrase.

07

Action logs

Every decision — allow, block, redact, quarantine, approve — captured as a replayable record.

08

Tenant isolation

Enforce boundaries so one tenant's context or memory can never reach another's reasoning or tools.

Evidence

Audit-ready evidence.

Every engagement produces an evidence pack a security team can hand to a board, an auditor, or a customer — no translation required.

Before/after exploit replay

The same attack run against the agent before and after controls, side by side.

Severity map

Each finding rated by impact and likelihood, mapped to the surface it abuses.

Policy diff

The exact source-to-sink and tool-authorization changes that closed each path.

Blocked-action logs

Machine records of every attempt the runtime stopped, with full call context.

Residual-risk register

What remains open, why, and the compensating control or accepted-risk decision.

Validation status

Pass/fail per attack family after controls, so re-tests are unambiguous.

Executive summary

A one-page account a board, auditor, or customer can read without translation.

exploit replay · sample
exploit replayconfused-deputy
before firewall

agent uses privileged identity for attacker goal

exploit succeeded
after firewall

cross-identity action denied

BLOCK

Each finding ships with the same artifact: the exploit succeeding before the control, the action contained after it, and the runtime decision that closed the path — replayable on demand.

note · OWASP

Prompt injection can lead to sensitive information disclosure, unauthorized function access, arbitrary command execution in connected systems, and manipulation of critical decisions. RAG and fine-tuning reduce the surface but do not fully mitigate it — which is why enforcement belongs at runtime, not in the model.

Turn agentic AI risk into evidence your board, auditors, and customers can understand.

We attack one high-risk agent workflow, enforce the controls at runtime, and prove the exploit path is closed.