New since the original diagram: A2A peer clients (inbound), A2A peer agents (outbound), the A2A Proxy in Gateway Core, and the Redis + Qdrant semantic cache layer.
Request flow
A single prompt, end to end
Every call (chat completion, MCP tool invocation, skill execution or A2A agent delegation) takes the same governed path through the Rust proxy. Each stage is policy-enforced, observable and tenant-scoped.
1. Client: portal, agent, backend or A2A peer
2. Identify: JWT & tenant resolved
3. Authorize: policy & guardrails
4. Cache: semantic lookup
5. Route: LLM · MCP · Skill · A2A
6. Provider: stream & record
Step 1
OpenAI, MCP or A2A request
Bearer token or API key
X-Tenant-ID or A2A path
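A minimal Rust sketch of step 1, pulling the caller's credential and tenant out of the request. It assumes lower-cased header names in a plain map, a hypothetical `x-api-key` header for API keys, and a hypothetical `/a2a/<tenant>/...` path layout for A2A calls; none of these names are confirmed by the source.

```rust
use std::collections::HashMap;

/// Credential presented by the caller (hypothetical representation).
#[derive(Debug, PartialEq)]
pub enum Credential {
    Bearer(String),
    ApiKey(String),
}

/// A `Bearer` Authorization header wins; otherwise fall back to a
/// hypothetical `x-api-key` header. Header names are assumed lower-cased.
pub fn extract_credential(headers: &HashMap<String, String>) -> Option<Credential> {
    if let Some(auth) = headers.get("authorization") {
        if let Some(token) = auth.strip_prefix("Bearer ") {
            let token = token.trim();
            if !token.is_empty() {
                return Some(Credential::Bearer(token.to_string()));
            }
        }
    }
    headers
        .get("x-api-key")
        .map(|k| k.trim())
        .filter(|k| !k.is_empty())
        .map(|k| Credential::ApiKey(k.to_string()))
}

/// Tenant from `X-Tenant-ID`, or from the assumed A2A path layout
/// `/a2a/<tenant>/...` when the header is absent.
pub fn extract_tenant(headers: &HashMap<String, String>, path: &str) -> Option<String> {
    if let Some(t) = headers
        .get("x-tenant-id")
        .map(|t| t.trim())
        .filter(|t| !t.is_empty())
    {
        return Some(t.to_string());
    }
    let mut segments = path.trim_start_matches('/').split('/');
    match (segments.next(), segments.next()) {
        (Some("a2a"), Some(tenant)) if !tenant.is_empty() => Some(tenant.to_string()),
        _ => None,
    }
}
```

An explicit header takes precedence over the path, so a misrouted A2A URL cannot silently override a tenant the client stated outright.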
Step 2
Verify JWT signature
Resolve tenant + agent card
Hydrate user / peer context
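Step 2 can be sketched as a claims check that runs after a JWT library has already verified the signature (signature verification itself is not shown). The `tenant` claim name, the `RequestContext` shape and the subject-to-Agent-Card map are illustrative assumptions.

```rust
use std::collections::HashMap;

/// Claims assumed to be available once a JWT library has verified the signature.
#[derive(Debug, Clone)]
pub struct Claims {
    pub sub: String,
    pub tenant: String, // hypothetical custom claim
    pub exp: u64,       // expiry, seconds since the Unix epoch
}

/// Per-request context handed to later stages (illustrative shape).
#[derive(Debug, PartialEq)]
pub struct RequestContext {
    pub user: String,
    pub tenant: String,
    pub agent_card: Option<String>, // e.g. an Agent Card URL for an A2A peer
}

#[derive(Debug, PartialEq)]
pub enum IdentifyError {
    Expired,
    TenantMismatch,
}

/// Reject expired tokens and tokens issued for a different tenant than the
/// one resolved in step 1, then hydrate the context. `cards` maps a subject
/// to its registered Agent Card, if any.
pub fn hydrate(
    claims: &Claims,
    resolved_tenant: &str,
    now: u64,
    cards: &HashMap<String, String>,
) -> Result<RequestContext, IdentifyError> {
    if claims.exp <= now {
        return Err(IdentifyError::Expired);
    }
    if claims.tenant != resolved_tenant {
        return Err(IdentifyError::TenantMismatch);
    }
    Ok(RequestContext {
        user: claims.sub.clone(),
        tenant: claims.tenant.clone(),
        agent_card: cards.get(&claims.sub).cloned(),
    })
}
```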
Step 3
Model entitlement check
Rate limit & quota
Prompt & PII guardrails
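A sketch of the first two step 3 checks, assuming per-tenant model allow-lists and a token-bucket limiter. The source does not say which quota algorithm the proxy uses, so the token bucket is a stand-in; prompt and PII guardrails are omitted.

```rust
use std::collections::{HashMap, HashSet};

/// Token bucket: bursts of up to `capacity` requests, refilled continuously.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: f64, // time of the last refill, in seconds
}

impl TokenBucket {
    pub fn new(capacity: f64, refill_per_sec: f64, now: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last: now }
    }

    /// Refill for the elapsed time, then try to take one token.
    pub fn try_acquire(&mut self, now: f64) -> bool {
        let elapsed = (now - self.last).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum Denied {
    ModelNotEntitled,
    RateLimited,
}

/// Entitlement first (cheap, no state change), then the tenant's rate limit.
pub fn authorize(
    entitlements: &HashMap<String, HashSet<String>>,
    buckets: &mut HashMap<String, TokenBucket>,
    tenant: &str,
    model: &str,
    now: f64,
) -> Result<(), Denied> {
    let allowed = entitlements.get(tenant).map_or(false, |m| m.contains(model));
    if !allowed {
        return Err(Denied::ModelNotEntitled);
    }
    match buckets.get_mut(tenant) {
        Some(bucket) => {
            if bucket.try_acquire(now) {
                Ok(())
            } else {
                Err(Denied::RateLimited)
            }
        }
        None => Err(Denied::RateLimited), // no quota configured: deny by default
    }
}
```

Checking entitlement before the bucket means a request for a forbidden model never consumes the tenant's quota.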
Step 4
Embed prompt & search cache (Redis + Qdrant)
Skip when tools attached
HIT → return early
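The semantic lookup in step 4 can be illustrated with cosine similarity over prompt embeddings. This sketch replaces the Redis + Qdrant vector search with a linear scan, assumes embeddings are already computed, and takes the similarity threshold as a parameter because the real value is not stated.

```rust
/// Cosine similarity between two equal-length embedding vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

pub struct CacheEntry {
    pub tenant: String,
    pub embedding: Vec<f32>,
    pub response: String,
}

/// Best same-tenant match whose similarity clears `threshold`.
/// Requests that carry tools never consult the cache.
pub fn lookup<'a>(
    entries: &'a [CacheEntry],
    tenant: &str,
    query: &[f32],
    has_tools: bool,
    threshold: f32,
) -> Option<&'a str> {
    if has_tools {
        return None;
    }
    entries
        .iter()
        .filter(|e| e.tenant == tenant) // never serve another tenant's answer
        .map(|e| (cosine(&e.embedding, query), e))
        .filter(|(score, _)| *score >= threshold)
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .map(|(_, e)| e.response.as_str())
}
```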
Step 5
Pick provider & deployment
Inject creds from vault
Apply request transforms
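A sketch of step 5 for the LLM branch: pick the first healthy deployment, resolve its credential from a secret-store stand-in, and apply one example transform (rewriting a public model alias to an upstream model ID). The `Deployment` fields, the alias table and the bearer-style header are assumptions; real providers differ in how they expect credentials.

```rust
use std::collections::HashMap;

/// One deployment of a model at a provider (illustrative fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Deployment {
    pub provider: String,
    pub endpoint: String,
    pub vault_key: String, // reference into the secret store, never the secret
    pub healthy: bool,
}

/// The request as it will be sent upstream.
#[derive(Debug, PartialEq)]
pub struct Outbound {
    pub endpoint: String,
    pub headers: Vec<(String, String)>,
    pub model: String,
}

/// Returns None if the model is unknown, no deployment is healthy, or the
/// credential is missing from the vault stand-in.
pub fn route(
    model: &str,
    deployments: &HashMap<String, Vec<Deployment>>,
    vault: &HashMap<String, String>,
    aliases: &HashMap<String, String>,
) -> Option<Outbound> {
    let dep = deployments.get(model)?.iter().find(|d| d.healthy)?;
    let secret = vault.get(&dep.vault_key)?;
    // Example transform: map the tenant-facing alias to the upstream model ID.
    let upstream_model = aliases
        .get(model)
        .cloned()
        .unwrap_or_else(|| model.to_string());
    Some(Outbound {
        endpoint: dep.endpoint.clone(),
        headers: vec![("authorization".to_string(), format!("Bearer {secret}"))],
        model: upstream_model,
    })
}
```

Keeping only a `vault_key` in routing config means the secret is looked up per request and never stored alongside deployment metadata.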
Step 6
Stream tokens or tool result
Store cache entry (skipped when tools attached)
Emit traces & metrics
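One way to implement step 6 so that recording finishes before the client receives the final chunk, as the return-path description states, is to forward chunks as they arrive while holding back the newest one. In this sketch the cache vector and the `Metrics` map stand in for Redis and Prometheus, the metric names are hypothetical, and tracing is omitted.

```rust
use std::collections::HashMap;

/// Stand-in for a metrics registry (a real proxy would export to Prometheus).
#[derive(Debug, Default)]
pub struct Metrics {
    pub counters: HashMap<String, u64>,
}

impl Metrics {
    pub fn inc(&mut self, name: &str, by: u64) {
        *self.counters.entry(name.to_string()).or_insert(0) += by;
    }
}

/// Relay upstream chunks to the client while accumulating the full response.
/// The newest chunk is held back by one step, so the cache entry and metrics
/// are recorded before the client receives the final chunk.
pub fn relay<I, F>(
    chunks: I,
    mut send: F,
    cacheable: bool,
    cache: &mut Vec<String>,
    metrics: &mut Metrics,
) where
    I: IntoIterator<Item = String>,
    F: FnMut(&str),
{
    let mut full = String::new();
    let mut held: Option<String> = None;
    let mut count = 0u64;
    for chunk in chunks {
        full.push_str(&chunk);
        count += 1;
        // Send the previously held chunk; keep the newest one back.
        if let Some(prev) = held.replace(chunk) {
            send(&prev);
        }
    }
    if cacheable {
        cache.push(full); // requests carrying tools pass cacheable = false
    }
    metrics.inc("proxy_stream_chunks_total", count);
    metrics.inc("proxy_requests_total", 1);
    // Only now does the final chunk reach the client.
    if let Some(last) = held {
        send(&last);
    }
}
```

The trade-off is one chunk of added delay at the tail of each stream in exchange for a guarantee that nothing is delivered unrecorded.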
Cache hit: when the semantic cache resolves a request at step 4, the proxy answers immediately and skips steps 5 and 6, while still emitting an audit record. Requests carrying tools bypass the cache entirely (both lookup and store), which preserves correctness for tool-using flows.
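The cache-hit rule can be sketched as control flow around steps 4 to 6. `lookup` and `call_provider` are hypothetical hooks and `AuditRecord` is an illustrative shape. The sketch shows two things: tool-carrying requests skip both lookup and store, and a hit returns before routing while still producing an audit record.

```rust
#[derive(Debug, PartialEq)]
pub enum Served {
    FromCache(String),
    FromProvider(String),
}

#[derive(Debug, PartialEq)]
pub struct AuditRecord {
    pub tenant: String,
    pub cache_hit: bool,
}

pub fn serve(
    tenant: &str,
    has_tools: bool,
    lookup: impl Fn() -> Option<String>,
    call_provider: impl Fn() -> String,
    store: &mut Vec<String>,
    audit: &mut Vec<AuditRecord>,
) -> Served {
    if !has_tools {
        if let Some(hit) = lookup() {
            // Short-circuit: routing and the provider call never run,
            // but the request is still audited.
            audit.push(AuditRecord { tenant: tenant.to_string(), cache_hit: true });
            return Served::FromCache(hit);
        }
    }
    let answer = call_provider();
    if !has_tools {
        store.push(answer.clone()); // tool results are never cached
    }
    audit.push(AuditRecord { tenant: tenant.to_string(), cache_hit: false });
    Served::FromProvider(answer)
}
```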
Step 5 routes to one of four targets
Each branch reuses the same auth, policy, cache and audit layers.
LLM completion (OpenAI-compatible)
Routes to OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI, Mistral or self-hosted models. Streaming preserved end-to-end.
MCP tool call (JSON-RPC)
Resolves the configured MCP server for the tenant, attaches user-delegated OAuth tokens and streams the result back to the agent.
Skill execution (composed)
Runs a versioned skill: combines prompts, RAG over the knowledge base, and tool calls under a single policy boundary.
A2A delegation (A2A v1.0)
Calls an agent described by a Brutor-hosted Agent Card, or a peer A2A agent. Adds signed root, parent and depth headers and enforces the policy-defined maximum delegation depth.
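A sketch of the lineage bookkeeping behind A2A delegation, assuming depth counts delegated hops (the originating request is depth 0 and its first delegation is depth 1). The header names are hypothetical, and `sign` is a placeholder where a real MAC such as HMAC-SHA256 from a crypto crate would go.

```rust
/// Delegation lineage carried on every hop.
#[derive(Debug, Clone, PartialEq)]
pub struct Lineage {
    pub root_task: String,
    pub parent_task: String,
    pub depth: u32,
}

#[derive(Debug, PartialEq)]
pub enum DelegationError {
    MaxDepthExceeded { depth: u32, max: u32 },
}

/// Lineage for an outbound delegation made by `current_task`. With no
/// incoming lineage, the chain is rooted at the current task.
pub fn next_hop(
    incoming: Option<&Lineage>,
    current_task: &str,
    max_depth: u32,
) -> Result<Lineage, DelegationError> {
    let (root, depth) = match incoming {
        Some(l) => (l.root_task.clone(), l.depth + 1),
        None => (current_task.to_string(), 1),
    };
    if depth > max_depth {
        return Err(DelegationError::MaxDepthExceeded { depth, max: max_depth });
    }
    Ok(Lineage { root_task: root, parent_task: current_task.to_string(), depth })
}

/// Serialize lineage to headers plus a signature over all three values, so
/// a downstream agent cannot lower its own depth without detection.
pub fn to_headers(l: &Lineage, sign: impl Fn(&str) -> String) -> Vec<(String, String)> {
    let payload = format!("{}|{}|{}", l.root_task, l.parent_task, l.depth);
    vec![
        ("x-a2a-root-task".to_string(), l.root_task.clone()),
        ("x-a2a-parent-task".to_string(), l.parent_task.clone()),
        ("x-a2a-depth".to_string(), l.depth.to_string()),
        ("x-a2a-signature".to_string(), sign(&payload)),
    ]
}
```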
Return path
The response streams back through the same chain. The proxy writes audit and usage records to PostgreSQL, correlates every A2A hop against the same root task ID, exports OpenTelemetry traces and updates Prometheus metrics, all before the client sees the final token.
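Correlating hops against a root task ID amounts to grouping audit rows by that ID. This sketch uses an in-memory slice in place of the PostgreSQL audit table; the `HopAudit` columns and the summary fields are assumptions.

```rust
use std::collections::BTreeMap;

/// One audit row per hop (stand-in for a row in the audit table).
#[derive(Debug, Clone)]
pub struct HopAudit {
    pub root_task: String,
    pub task: String,
    pub depth: u32,
    pub tokens: u64,
}

/// Roll-up of one delegation chain.
#[derive(Debug, PartialEq)]
pub struct ChainSummary {
    pub hops: usize,
    pub max_depth: u32,
    pub total_tokens: u64,
}

/// Group rows by root task ID so every hop of a chain, however many agents
/// it crossed, lands under one key.
pub fn correlate(rows: &[HopAudit]) -> BTreeMap<String, ChainSummary> {
    let mut out: BTreeMap<String, ChainSummary> = BTreeMap::new();
    for r in rows {
        let s = out
            .entry(r.root_task.clone())
            .or_insert(ChainSummary { hops: 0, max_depth: 0, total_tokens: 0 });
        s.hops += 1;
        s.max_depth = s.max_depth.max(r.depth);
        s.total_tokens += r.tokens;
    }
    out
}
```

In SQL terms this is a `GROUP BY root_task`; the point of propagating the root ID on every hop is that no join across agents is needed.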
Legend: pass-through stage; decision stage (can short-circuit)
All stages run inside the Rust AI Proxy — one process, one trace.
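The pass-through versus decision distinction maps onto a chain where each stage either continues or answers. A minimal sketch with a hypothetical `Stage` trait and two toy stages; each stage appends a span under one trace ID to mirror "one process, one trace", without using the OpenTelemetry API.

```rust
/// Outcome of one stage: continue down the chain or answer now.
#[derive(Debug, PartialEq)]
pub enum Flow {
    Continue,
    Respond(String),
}

/// Request state shared by all stages (illustrative fields).
#[derive(Debug, Default)]
pub struct Ctx {
    pub trace_id: String,
    pub tenant: Option<String>,
    pub spans: Vec<String>, // one entry per stage that ran, same trace_id
}

pub trait Stage {
    fn name(&self) -> &'static str;
    fn run(&self, ctx: &mut Ctx) -> Flow;
}

/// Run stages in order; the first `Respond` short-circuits the rest.
pub fn run_chain(stages: &[Box<dyn Stage>], ctx: &mut Ctx) -> Option<String> {
    for stage in stages {
        ctx.spans.push(format!("{}/{}", ctx.trace_id, stage.name()));
        if let Flow::Respond(body) = stage.run(ctx) {
            return Some(body);
        }
    }
    None
}

/// Pass-through example: annotates the context, never answers.
pub struct Identify;
impl Stage for Identify {
    fn name(&self) -> &'static str { "identify" }
    fn run(&self, ctx: &mut Ctx) -> Flow {
        ctx.tenant = Some("acme".to_string()); // placeholder tenant resolution
        Flow::Continue
    }
}

/// Decision example: answers when a (hypothetical) cached value is present.
pub struct Cache {
    pub hit: Option<String>,
}
impl Stage for Cache {
    fn name(&self) -> &'static str { "cache" }
    fn run(&self, _ctx: &mut Ctx) -> Flow {
        match &self.hit {
            Some(v) => Flow::Respond(v.clone()),
            None => Flow::Continue,
        }
    }
}
```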