
Brutor AI Gateway — Architecture

Brutor AI Governance Platform Architecture

Two complementary views — the components that make up the platform, and how a single request flows through them end to end.

/components/

What’s in the platform

Brutor AI Governance Platform
AI Gateway · MCP Gateway · Skills · A2A · Observability · Security
Clients
User Portal
React · TypeScript
Chat · MCP Discovery · Model Discovery · Batch Processing · Usage Alerts · Guardrails Alerts
AI Clients
External chat & dev tools
Goose · Claude Code
A2A Peers
External agents calling in
Signed Agent Card · JWT + tenant path
Admin Console
React · TypeScript
Configure resources · Allocate to groups · Guardrails setup · View logs
Platform Core
Gateway Core
Rust · Axum · Tokio · Tower
Auth & Identity · LLM Proxy · LLM Routing · MCP Proxy · OAuth Proxy · A2A Proxy · Resource Groups · Guardrails · Usage Tracking · Usage Limit Check · Proxy Logging · Metrics Collection · Batch Processing · Cache Lookup · Cache Store
Skills (System MCP Server)
SKILLS.md · scripts · assets · resources · workflow support · versioning & publishing
Gateway Control Plane
Python · FastAPI
Tenant Config · Provider Mgmt · RBAC + Users · Guardrail Policy · Resource Groups · MCP Registry
AI Services
AI Models
300 across providers
OpenAI · Anthropic · Google · Azure · Mistral · Ollama
MCP Servers
JSON-RPC · SSE · Streamable HTTP
GitHub · HubSpot · Slack · + any MCP server
A2A Peer Agents
External agents this gateway calls
HMAC delegation · Streaming + tasks
External stores · customer-deployed
PostgreSQL
Shared data store — configuration + audit + usage
tenants · api_keys · llm_models · mcp_servers · guardrail_policies · resource_groups · agent_skills · oauth_clients · oauth_tokens · proxy_logs · usage
Redis + Qdrant Cache
Semantic response cache — embedding lookup + early return on hit
embedding store · cache index · bypass when tools attached · HIT short-circuits steps 5 & 6
Observability Stack
Prometheus · Grafana · Loki · OpenTelemetry Collector
:9090 Prometheus · :3000 Grafana · :3100 Loki · rate(proxy_requests_total[5m]) · proxy_active_streams · proxy_guardrail_blocks_total · proxy_llm_estimated_cost_usd

New since the original diagram: A2A peer clients (inbound), A2A Peer Agents (outbound), the A2A Proxy in Gateway Core, and the Redis + Qdrant semantic cache layer.


/request flow/

A single prompt, end to end

Every call — chat completion, MCP tool invocation, skill execution, or A2A agent delegation — takes the same governed path through the Rust proxy. Each stage is policy-enforced, observable and tenant-scoped.

1. Client: Portal, agent, backend or A2A peer
2. Identify: JWT & tenant resolved
3. Authorize: Policy & guardrails
4. Cache: Semantic lookup
5. Route: LLM · MCP · Skill · A2A
6. Provider: Stream & record

Step 1
  • OpenAI, MCP or A2A request
  • Bearer token or API key
  • X-Tenant-ID or A2A path
Step 2
  • Verify JWT signature
  • Resolve tenant + agent card
  • Hydrate user / peer context
Step 3
  • Model entitlement check
  • Rate limit & quota
  • Prompt & PII guardrails
Step 4
  • Embed & search Redis
  • Skip when tools attached
  • HIT → return early
Step 5
  • Pick provider & deployment
  • Inject creds from vault
  • Apply request transforms
Step 6
  • Stream tokens or tool result
  • Store cache entry
  • Emit traces & metrics
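
The six steps above can be read as a chain of stages in which any decision stage may answer early. The following is a minimal Python sketch of that idea, not the gateway's Rust implementation: the `Request` fields, the stage functions and the hard-coded guardrail marker are all hypothetical, and steps 5 and 6 are collapsed into one stand-in function.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Request:
    tenant: str
    prompt: str
    tools: list = field(default_factory=list)
    trace: list = field(default_factory=list)  # names of the stages that ran


# A stage returns None to pass the request on, or a response string to
# short-circuit the rest of the chain.
Stage = Callable[[Request], Optional[str]]


def identify(req: Request) -> Optional[str]:
    # Step 2 stand-in: reject requests with no resolvable tenant.
    return None if req.tenant else "401 unknown tenant"


def authorize(req: Request) -> Optional[str]:
    # Step 3 stand-in: a toy guardrail that blocks one marker string.
    return "403 blocked by guardrail" if "SECRET" in req.prompt else None


def make_cache_stage(cache: dict) -> Stage:
    def cache_stage(req: Request) -> Optional[str]:
        # Step 4 stand-in: exact-match lookup; tool-carrying requests skip it.
        if req.tools:
            return None
        return cache.get(req.prompt)
    return cache_stage


def route_and_call(req: Request) -> Optional[str]:
    # Steps 5 and 6 stand-in: pretend a provider answered.
    return f"provider answer for {req.prompt!r}"


def run(req: Request, stages: List[Stage]) -> str:
    for stage in stages:
        req.trace.append(stage.__name__)
        result = stage(req)
        if result is not None:
            return result  # short-circuit: later stages never run
    raise RuntimeError("no stage produced a response")
```

With a cache holding `{"hello": "cached answer"}`, a tool-free request for `"hello"` stops at `cache_stage`, while the same prompt with tools attached continues to `route_and_call`.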
Cache hit: When the semantic cache resolves at step 4, the proxy answers immediately, skipping steps 5 and 6, while still emitting an audit record. Requests carrying tools bypass the cache entirely (both lookup and store), preserving correctness for tool-using flows.
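
To make the cache behaviour concrete, here is a small in-memory sketch of a semantic cache that compares embeddings by cosine similarity and skips both lookup and store when tools are attached. It is only an illustration under stated assumptions: the real platform uses Redis and Qdrant, and the `SemanticCache` class, its `threshold` value of 0.95 and the toy two-dimensional embeddings are invented for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.entries = []           # list of (embedding, response) pairs

    def lookup(self, embedding, has_tools):
        # Requests carrying tools bypass the lookup entirely.
        if has_tools or not self.entries:
            return None
        best = max(self.entries, key=lambda e: cosine(e[0], embedding))
        if cosine(best[0], embedding) >= self.threshold:
            return best[1]  # HIT: caller can return early
        return None         # MISS: caller continues to routing

    def store(self, embedding, response, has_tools):
        # Tool-using responses are never written to the cache.
        if not has_tools:
            self.entries.append((embedding, response))
```

A linear scan is used here only to keep the sketch short; a vector index would replace it at any realistic scale.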

Step 5 routes to one of four targets

Each branch reuses the same auth, policy, cache and audit layers.
LLM completion (OpenAI-compatible)

Routes to OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI, Mistral or self-hosted models. Streaming preserved end-to-end.
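
As a rough illustration of the step 5 work for this branch (pick provider and deployment, fetch credentials, apply a request transform), the sketch below uses a plain dictionary as the routing table. Everything in it is assumed for the example: the `ROUTES` and `VAULT` contents, the `local-llama` alias, the `RoutedRequest` type and the choice of "default streaming to on" as the transform.

```python
from dataclasses import dataclass

# Hypothetical routing table: model name -> (provider, deployment).
ROUTES = {
    "gpt-4o": ("openai", "default"),
    "local-llama": ("ollama", "on-prem"),
}

# Stand-in for a secrets vault; real credentials would never live in code.
VAULT = {"openai": "placeholder-openai-key", "ollama": ""}


@dataclass
class RoutedRequest:
    provider: str
    deployment: str
    credential: str
    body: dict


def route(model: str, body: dict) -> RoutedRequest:
    if model not in ROUTES:
        raise LookupError(f"model {model!r} is not routable")
    provider, deployment = ROUTES[model]
    # Request transform: keep streaming on unless the caller turned it off.
    upstream = {**body, "model": model, "stream": body.get("stream", True)}
    return RoutedRequest(provider, deployment, VAULT[provider], upstream)
```

The credential is returned as an opaque value because each provider expects it in a different header or field.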

MCP tool call (JSON-RPC)

Resolves the configured MCP server for the tenant, attaches user-delegated OAuth tokens and streams the result back to the agent.
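
For readers unfamiliar with the wire format, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message and the headers a proxy might attach when forwarding it over Streamable HTTP. The tool name `create_issue`, its arguments and the helper names are hypothetical, and how the gateway actually stores and attaches user-delegated tokens is not described in this page.

```python
import json
from itertools import count

_ids = count(1)  # simple per-process JSON-RPC request id generator


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def build_headers(user_oauth_token: str) -> dict:
    """Headers for a Streamable HTTP POST carrying the user's delegated token."""
    return {
        "Content-Type": "application/json",
        # The server may answer with plain JSON or with an SSE stream.
        "Accept": "application/json, text/event-stream",
        "Authorization": f"Bearer {user_oauth_token}",
    }


if __name__ == "__main__":
    message = build_tool_call("create_issue", {"title": "Bug"})
    print(json.dumps(message, indent=2))
```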

Skill execution (Composed)

Runs a versioned skill: combines prompts, RAG over the KB, and tool calls under a single policy boundary.

A2A delegation (A2A v1.0)

Calls a Brutor-hosted Agent Card or a peer A2A agent. Adds signed root, parent and depth headers and enforces the policy-defined max delegation depth.
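
One way to picture the signed root, parent and depth headers is an HMAC over the three values, checked by the receiving hop, with the depth incremented and capped on every outbound call. The sketch below assumes exactly that; the header names (`X-A2A-Root-Task` and so on), the `|`-joined signing string, the SHA-256 choice and `MAX_DEPTH = 3` are illustrative guesses rather than the platform's actual scheme.

```python
import hashlib
import hmac

MAX_DEPTH = 3  # hypothetical policy-defined max delegation depth


class DepthExceeded(Exception):
    """Raised when a delegation would go deeper than policy allows."""


def sign(secret: bytes, root: str, parent: str, depth) -> str:
    message = f"{root}|{parent}|{depth}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def outbound_headers(secret: bytes, root: str, parent: str,
                     incoming_depth: int, max_depth: int = MAX_DEPTH) -> dict:
    depth = incoming_depth + 1  # each hop adds one level
    if depth > max_depth:
        raise DepthExceeded(f"depth {depth} exceeds limit {max_depth}")
    return {
        "X-A2A-Root-Task": root,
        "X-A2A-Parent-Task": parent,
        "X-A2A-Depth": str(depth),
        "X-A2A-Signature": sign(secret, root, parent, depth),
    }


def verify(secret: bytes, headers: dict) -> bool:
    expected = sign(secret, headers["X-A2A-Root-Task"],
                    headers["X-A2A-Parent-Task"], headers["X-A2A-Depth"])
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(expected, headers["X-A2A-Signature"])
```

Because the depth is part of the signed message, a peer cannot lower it to sidestep the limit without invalidating the signature.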

Return path: The response streams back through the same chain. Before the client sees the final token, the proxy writes audit and usage records to PostgreSQL, correlates every A2A hop against the same root task ID, exports OpenTelemetry traces, and updates Prometheus metrics.
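
The correlation described here can be sketched as audit records that all carry the root task ID of the chain they belong to. The example below groups records by that ID and sums token usage per tenant. The `AuditRecord` fields are assumptions made for illustration and are not the schema of the `proxy_logs` or `usage` tables.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    tenant: str
    root_task_id: str  # shared by every hop in one delegation chain
    task_id: str       # this hop's own identifier
    target: str        # e.g. "llm", "mcp", "skill" or "a2a"
    tokens: int


def group_by_root(records):
    """Collect the hops of each delegation chain under its root task ID."""
    chains = defaultdict(list)
    for rec in records:
        chains[rec.root_task_id].append(rec)
    return dict(chains)


def usage_by_tenant(records):
    """Sum token usage per tenant across all recorded hops."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec.tenant] += rec.tokens
    return dict(totals)
```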
Legend: Pass-through stage · Decision stage (can short-circuit)
All stages run inside the Rust AI Proxy — one process, one trace.