Brutor AI Control Plane Architecture

Two complementary views — the components that make up the platform, and how a single request flows through them end to end.

/components/

What’s in the platform

Clients

User Portal

React · TypeScript

ChatMCP DiscoveryModel Discovery Batch ProcessingUsage AlertsGuardrails Alerts

AI Clients

External chat & dev tools

GooseClaude Code

A2A Peers

External agents calling in

Signed Agent CardJWT + tenant path

Admin Console

React · TypeScript

Configure resourcesAllocate to groupsGuardrails setupView logs

Platform Core

Gateway Core

Rust · Axum · Tokio · Tower

Auth & IdentityLLM ProxyLLM Routing MCP ProxyOAuth ProxyA2A Proxy Resource GroupsGuardrailsUsage Tracking Usage Limit CheckProxy LoggingMetrics Collection Batch ProcessingCache LookupCache Store

Skills (System MCP Server)

SKILL.md · L1/L2/L3 progressive disclosure · versioning & publishing · scripts execute in the sandboxed Skill Runner — an isolated container, never in-process

Gateway Control Plane

Python · FastAPI

Tenant ConfigProvider MgmtRBAC + Users Guardrail PolicyResource GroupsMCP Registry

AI Services

AI Models

300 across providers

OpenAIAnthropicGoogleAzureMistralOllama

MCP Servers

JSON-RPC · SSE · Streamable HTTP

GitHubHubSpotSlack+ any MCP server

A2A Peer Agents

External agents this gateway calls

HMAC delegationStreaming + tasks

External stores · customer-deployed

PostgreSQL

Shared data store — configuration + audit + usage

tenants · api_keys · llm_models · mcp_servers · guardrail_policies · resource_groups · agent_skills · oauth_clients · oauth_tokens · proxy_logs · usage

Redis + Qdrant

Semantic response cache — embedding lookup + early return on hit

embedding store · cache index · bypass when tools attached · HIT short-circuits steps 5 & 6

Observability Stack

Prometheus · Grafana · Loki · OpenTelemetry Collector

:9090 Prometheus · :3000 Grafana · :3100 Loki · rate(proxy_requests_total[5m]) · proxy_active_streams · proxy_guardrail_blocks_total · proxy_llm_estimated_cost_usd

/request flow/

A single prompt, end to end

Every call — chat completion, MCP tool invocation, skill execution, or A2A agent delegation — takes the same governed path through the Rust proxy. Each stage is policy-enforced, observable and tenant-scoped.

Client

Portal, agent, backend or A2A peer

Identify

JWT & tenant resolved

Authorize

Policy & guardrails

Cache

Semantic lookup

Route

LLM · MCP · Skill · A2A

Provider

Stream & record

Step 1

OpenAI, MCP or A2A request
Bearer token or API key
X-Tenant-ID or A2A path

Step 2

Verify JWT signature
Resolve tenant + agent card
Hydrate user / peer context

Step 3

Model entitlement check
Rate limit & quota
Prompt & PII guardrails

Step 4

Embed & search Qdrant
Skip when tools attached
HIT → return early

Step 5

Pick provider & deployment
Inject creds from vault
Apply request transforms

Step 6

Stream tokens or tool result
Store cache entry
Emit traces & metrics

Cache hit When the semantic cache resolves at step 4, the proxy answers immediately — skipping steps 5 and 6 — while still emitting an audit record. Requests carrying tools bypass cache entirely (lookup and store), preserving correctness for tool-using flows.

Step 5 routes to one of four targets

Each branch reuses the same auth, policy, cache and audit layers.

LLM completion OpenAI-compatible

Routes to OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI, Mistral or self-hosted models. Streaming preserved end-to-end.

MCP tool call JSON-RPC

Resolves the configured MCP server for the tenant, attaches user-delegated OAuth tokens and streams the result back to the agent.

Skill execution Composed

Progressive disclosure over the system MCP server: list → load → act. Scripts run in the external sandboxed Skill Runner (read-only rootfs, JWT-authed); every layer is independently governed and audited.

A2A delegation A2A v1.0

Calls a Brutor-hosted Agent Card or a peer A2A agent. Adds signed root, parent and depth headers and enforces the policy-defined max delegation depth.

Return path Response streams back through the same chain. The proxy writes audit + usage to PostgreSQL, correlates every A2A hop against the same root task ID, exports OpenTelemetry traces, and updates Prometheus metrics — before the client ever sees the final token.

Pass-through stage

Decision stage (can short-circuit)

All stages run inside the Rust AI Proxy — one process, one trace.