Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.xenovia.io/llms.txt

Use this file to discover all available pages before exploring further.

Plugin pipeline

Xenovia Runtime is a high performance Go-based LLM proxy. Every request passes through six plugins in sequence. Each plugin implements a PreLLMHook (before the upstream call) and a PostLLMHook (after the response).
Request

  ├─ 1. Auth           verify xe_... key, resolve proxy identity from Redis
  ├─ 2. ProviderRoute  rewrite provider field based on proxy config
  ├─ 3. Session        resolve or create session UUID, increment turn counter
  ├─ 4. Trace          open trace record, stamp session/trace headers
  ├─ 5. Policy         evaluate Rego against request context (OPA, 200ms timeout)
  └─ 6. Intent         score against proxy intent definition if trigger fires


    Upstream LLM (OpenAI / Anthropic / Gemini / Azure / Bedrock / Groq / vLLM)

  ├─ 6. Intent         PostLLMHook (no-op for intent)
  ├─ 5. Policy         response-stage Rego evaluation
  ├─ 4. Trace          persist completed trace asynchronously
  ├─ 3. Session        store chain fingerprint, response chain mapping
  ├─ 2. ProviderRoute  (no-op)
  └─ 1. Auth           (no-op)


    Response to agent
A session Middleware wraps the entire router at the fasthttp level and stamps X-Xenovia-Session-Id and X-Xenovia-Trace-Id into the response headers before the first body byte. This is required for streaming responses — the headers must be sent before the stream opens.

Plugin details

1. Auth

  • Accepts Authorization: Bearer xe_... or X-Xenovia-Key: xe_....
  • Resolves the key against Redis (apikey:{key}, 5-minute TTL). On cache miss, calls the control plane POST /api/v1/internal/auth/verify.
  • The resolved identity is an HMAC-signed blob containing proxy_id and org_id.
  • X-Xenovia-Agent-Path header validation prevents cross-proxy key use: the resolved proxy ID must match the path segment.
  • Keys are never logged; only an 8-character SHA-256 hex prefix appears in logs.

2. Provider routing

  • Rewrites the request’s provider field to match the proxy’s upstream configuration.
  • Two rewrite paths:
    • vLLM: OpenAI-format request + base_url → vLLM provider with SSRF-safe endpoint validation.
    • Cloud: OpenAI-format request → Anthropic, Gemini, Azure, or Bedrock with automatic format translation.
  • SSRF protection: link-local IPs and cloud metadata endpoints are blocked in vLLM base_url values.
  • Provider credentials are held in the proxy configuration and resolved from the control plane. Your application never needs provider API keys.

3. Session

Five-strategy resolution chain (evaluated in priority order):
PrioritySourceMechanism
1X-Xenovia-Session-Id headerMust be a valid UUID; validated before use
2previous_response_id (Responses API)Looked up via respchain:{keyHash}:{resp_id} in Redis
3user field (Chat Completions)Looked up via usersession:{keyHash}:{userHash} in Redis
4Message fingerprintSHA-256 of messages[:-1]; looked up via chain:{keyHash}:{hash}
5New sessionGenerates a fresh UUID
Turn count is incremented atomically (INCR sessionturn:{session_id}) with a 30-minute sliding TTL. Sessions are proxy-scoped; cross-proxy session hijacking is prevented by ownership verification.

4. Trace

  • Opens a trace record with session context at PreLLMHook.
  • Emits child trace steps: request_received, llm, policy, intent, escalation, tool (call + result), request_finished.
  • Each step has its own trace ID linked to the parent via parent_trace_id.
  • Custom properties from X-Xenovia-Property-* headers are attached to the trace (max 20 properties; keys ≤ 64 chars; values ≤ 512 chars; policy_ prefix is reserved).
  • Async persistence via a bounded goroutine pool (256 concurrent). A sync.Once dedup guard prevents duplicate rows under concurrent streaming hooks.
  • Captures: request/response bodies (truncated), tokens, latency, TTFT, tool calls/results, session turn, policy decision, intent score.
  • Optional direct ClickHouse write in addition to control plane persistence.
Trace-related headers returned in every response:
HeaderValue
X-Xenovia-Session-IdResolved session UUID
X-Xenovia-Trace-IdPer-request trace UUID
Optional request headers for trace enrichment:
HeaderDescription
X-Xenovia-Property-{key}Custom trace property (key ≤ 64, value ≤ 512, no policy_ prefix)
X-Xenovia-Session-PathHierarchical path tag for trace grouping
X-Xenovia-Parent-Trace-IdParent trace UUID for cross-request linkage
X-Xenovia-Trace-Flow-IdFlow-level grouping UUID

5. Policy

  • Rego policies are fetched per proxy from the control plane (GET /api/v1/internal/proxies/{id}/policies) and cached in Redis for 5 minutes.
  • Two independent policies per proxy: request-stage and response-stage.
  • OPA evaluates policies with a 2-second compile timeout and a 200-millisecond eval timeout. Compile results are cached by (agentID, policyHash) using singleflight to prevent duplicate compiles under load.
  • Optional HMAC-signed policies: stored as v1:<hmac_hex>:<rego> in Redis; signature verified before use.
  • Response-stage policy failures fail open (logged, counted with an atomic counter).
See Policies and Approvals for the full input schema and Rego examples.

6. Intent

  • Intent configuration fetched per proxy: intent text, capability list, and a semantic trigger.
  • Trigger axes:
    • turn_scope: first_only, first_n, all (default)
    • on_tools: always (default), require, ignore
    • min_content_chars: minimum message length to score
    • sample_rate: 0.0–1.0 for probabilistic scoring
  • Scoring request sent to guardrail service (POST {GUARDRAIL_URL}/score) or control plane fallback, with a 15-second timeout. Payloads are truncated to 4096 characters per field.
  • Actions: allow (pass through), block (403), escalate (403 + async operator notification).
  • The block/escalate reason is never forwarded to the agent — it is logged server-side only.
  • Fail mode: XENOVIA_INTENT_FAIL_MODE=open (default, fail open) or closed (503 on scoring errors).

Supported providers

ProviderFormat
OpenAINative
AnthropicAuto-translated from OpenAI format
Google GeminiAuto-translated from OpenAI format
Azure OpenAIAuto-translated from OpenAI format
Amazon BedrockAuto-translated from OpenAI format
GroqOpenAI-compatible
vLLM (self-hosted)OpenAI-compatible

Supported endpoints

EndpointUse case
POST /v1/chat/completionsChat, agents, tool calling
POST /v1/responsesOpenAI Agents SDK (Responses API)
POST /v1/embeddingsRAG, vector search
POST /v1/completionsLegacy text completions
Audio, image, video, file, batch, and container endpoints are blocked at the routing layer. Only text inference and health endpoints are permitted.

Runtime environment variables

VariableRequiredDefaultDescription
CONTROL_PLANE_URLYesControl plane base URL
RUNTIME_SHARED_SECRETYesSent as X-Runtime-Secret on internal CP calls
PORTNo8080HTTP listen port
REDIS_URLNoredis://localhost:6379Redis connection URL
CLICKHOUSE_URLNounsetClickHouse for direct trace writes
GUARDRAIL_URLNounsetGuardrail scoring service URL
GUARDRAIL_SECRETNounsetGuardrail service auth token
XENOVIA_INTENT_FAIL_MODENoopenopen or closed
XENOVIA_POLICY_SIGNING_KEYNounsetHMAC key for signed Rego policies
XENOVIA_IDENTITY_SIGNING_KEYNoRUNTIME_SHARED_SECRETHMAC key for identity blobs