## Documentation Index

Fetch the complete documentation index at: https://docs.xenovia.io/llms.txt
Use this file to discover all available pages before exploring further.
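If you are scripting discovery, fetching the index is a one-liner. A minimal sketch using only the standard library (the index URL is the one above; nothing else is assumed):

```python
import urllib.request

# Fetch the plain-text index of all documentation pages.
with urllib.request.urlopen("https://docs.xenovia.io/llms.txt") as resp:
    print(resp.read().decode("utf-8"))
```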
## Setup

```bash
pip install llama-index-llms-openai llama-index-embeddings-openai
```
```python
import os

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
)

Settings.llm = llm
```
Setting `Settings.llm` applies globally: every LlamaIndex component that uses an LLM (query engines, chat engines, agents) picks it up without further configuration.
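A quick way to confirm the global default is wired up (a minimal sketch; the prompt and printed output are illustrative, not part of the Xenovia API):

```python
from llama_index.core import Settings

# Components resolve their LLM from this global default.
print(Settings.llm.metadata.model_name)  # -> "gpt-4o-mini"

# One governed completion through the Xenovia proxy.
print(Settings.llm.complete("Reply with the single word OK").text)
```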
## Embeddings

```python
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
)

Settings.embed_model = embed_model
```
Every embedding call routes through Xenovia independently. Policies and traces apply to both LLM and embedding calls, so the full RAG pipeline is governed, not just the generation step.
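To see a single governed embedding call in isolation (a sketch; the sample text is arbitrary):

```python
# This call is proxied, policy-checked, and traced like any LLM call.
vector = embed_model.get_text_embedding("governance test sentence")
print(len(vector))  # dimensionality of the returned embedding
```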
## RAG pipeline

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Indexing embeds every chunk through the Xenovia proxy.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What are the main topics covered?")
print(response)
```
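The response object also carries the retrieved chunks, which is useful when cross-referencing an answer against its trace. A sketch (the attributes are LlamaIndex's, not Xenovia's; the slicing is just for readability):

```python
# Each source node is a retrieved chunk with a similarity score.
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:80])
```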
## Agentic query engine

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="docs_search",
    description="Search internal documentation",
)

agent = ReActAgent.from_tools([tool], llm=llm, verbose=True)
response = agent.chat("Find information about governance policies")
```
Each reasoning step (thought, action, observation) is a separate LLM call. All of them route through Xenovia and produce individual traces; use a consistent `X-Xenovia-Session-Id` header to group the full agent run in Traces (see Session tracking below).
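To reconcile an agent run against its per-step traces, the chat response exposes the tool calls the agent made. A sketch (these are LlamaIndex `AgentChatResponse` fields; the slicing is illustrative):

```python
print(response.response)  # the agent's final answer

# Tool invocations made during the run; each maps to traced LLM calls.
for tool_output in response.sources:
    print(tool_output.tool_name, tool_output.content[:80])
```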
## Session tracking

Pass the same session header into both the LLM and embedding clients so the full workflow stays grouped in Traces:
```python
import uuid

session_id = str(uuid.uuid4())

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
    default_headers={"X-Xenovia-Session-Id": session_id},
)

embed_model = OpenAIEmbedding(
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
    default_headers={"X-Xenovia-Session-Id": session_id},
)
```
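In practice you will want a fresh session id per workflow run. A minimal sketch that packages the pattern above, assuming the same environment variables (`make_session_clients` is a hypothetical helper, not part of any SDK):

```python
import os
import uuid

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI


def make_session_clients(session_id: str | None = None):
    """Build LLM and embedding clients that share one Xenovia session id."""
    # Hypothetical helper: generate a fresh id per run unless one is given.
    session_id = session_id or str(uuid.uuid4())
    base = f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1"
    headers = {"X-Xenovia-Session-Id": session_id}
    llm = OpenAI(
        model="gpt-4o-mini",
        api_key=os.environ["XENOVIA_API_KEY"],
        api_base=base,
        default_headers=headers,
    )
    embed_model = OpenAIEmbedding(
        api_key=os.environ["XENOVIA_API_KEY"],
        api_base=base,
        default_headers=headers,
    )
    return llm, embed_model, session_id
```

Assign the returned clients to `Settings` (or pass them explicitly) before building indexes, so every call in the run carries the same id.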
## Handling policy blocks

When a request is blocked, LlamaIndex propagates the upstream 403 as an `openai.PermissionDeniedError`.
```python
from openai import PermissionDeniedError

try:
    response = query_engine.query("Drop the database")
except PermissionDeniedError as e:
    print(f"Blocked by policy: {e.message}")
```