## Documentation Index

Fetch the complete documentation index at: https://docs.xenovia.io/llms.txt
Use this file to discover all available pages before exploring further.
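If you are scripting discovery, fetching the index is a one-liner. A minimal sketch using only the standard library (the index URL is the one above; nothing else is assumed):

```python
import urllib.request

# Fetch the plain-text index of all documentation pages.
with urllib.request.urlopen("https://docs.xenovia.io/llms.txt") as resp:
    print(resp.read().decode("utf-8"))
```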
## Setup

```bash
pip install llama-index-llms-openai llama-index-embeddings-openai
```
```python
import os

from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
)

Settings.llm = llm
```
Setting `Settings.llm` applies globally: every LlamaIndex component that uses an LLM (query engines, chat engines, agents) picks it up without further configuration.
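A quick way to confirm the global default is wired up (a minimal sketch; the prompt and printed output are illustrative, not part of the Xenovia API):

```python
from llama_index.core import Settings

# Components resolve their LLM from this global default.
print(Settings.llm.metadata.model_name)  # -> "gpt-4o-mini"

# One governed completion through the Xenovia proxy.
print(Settings.llm.complete("Reply with the single word OK").text)
```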
## Embeddings

```python
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
)

Settings.embed_model = embed_model
```
Every embedding call routes through Xenovia independently. Policies and traces apply to both LLM and embedding calls, so the full RAG pipeline is governed, not just the generation step.
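To see a single governed embedding call in isolation (a sketch; the sample text is arbitrary):

```python
# This call is proxied, policy-checked, and traced like any LLM call.
vector = embed_model.get_text_embedding("governance test sentence")
print(len(vector))  # dimensionality of the returned embedding
```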
## RAG pipeline

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Indexing embeds every chunk through the Xenovia proxy.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What are the main topics covered?")
print(response)
```
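The response object also carries the retrieved chunks, which is useful when cross-referencing an answer against its trace. A sketch (the attributes are LlamaIndex's, not Xenovia's; the slicing is just for readability):

```python
# Each source node is a retrieved chunk with a similarity score.
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:80])
```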
## Agentic query engine

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="docs_search",
    description="Search internal documentation",
)

agent = ReActAgent.from_tools([tool], llm=llm, verbose=True)
response = agent.chat("Find information about governance policies")
```
Each reasoning step (thought, action, observation) is a separate LLM call. All of them route through Xenovia and produce individual traces; use a consistent `X-Xenovia-Session-Id` header to group the full agent run in Traces (see Session tracking below).
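To reconcile an agent run against its per-step traces, the chat response exposes the tool calls the agent made. A sketch (these are LlamaIndex `AgentChatResponse` fields; the slicing is illustrative):

```python
print(response.response)  # the agent's final answer

# Tool invocations made during the run; each maps to traced LLM calls.
for tool_output in response.sources:
    print(tool_output.tool_name, tool_output.content[:80])
```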
## Session tracking

Pass the same session header into both the LLM and embedding clients so the full workflow stays grouped in Traces:
```python
import uuid

session_id = str(uuid.uuid4())

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
    default_headers={"X-Xenovia-Session-Id": session_id},
)

embed_model = OpenAIEmbedding(
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
    default_headers={"X-Xenovia-Session-Id": session_id},
)
```
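In practice you will want a fresh session id per workflow run. A minimal sketch that packages the pattern above, assuming the same environment variables (`make_session_clients` is a hypothetical helper, not part of any SDK):

```python
import os
import uuid

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI


def make_session_clients(session_id: str | None = None):
    """Build LLM and embedding clients that share one Xenovia session id."""
    # Hypothetical helper: generate a fresh id per run unless one is given.
    session_id = session_id or str(uuid.uuid4())
    base = f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1"
    headers = {"X-Xenovia-Session-Id": session_id}
    llm = OpenAI(
        model="gpt-4o-mini",
        api_key=os.environ["XENOVIA_API_KEY"],
        api_base=base,
        default_headers=headers,
    )
    embed_model = OpenAIEmbedding(
        api_key=os.environ["XENOVIA_API_KEY"],
        api_base=base,
        default_headers=headers,
    )
    return llm, embed_model, session_id
```

Assign the returned clients to `Settings` (or pass them explicitly) before building indexes, so every call in the run carries the same id.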
## Handling policy blocks

When a request is blocked, LlamaIndex propagates the upstream 403 as an `openai.PermissionDeniedError`.
```python
from openai import PermissionDeniedError

try:
    response = query_engine.query("Drop the database")
except PermissionDeniedError as e:
    print(f"Blocked by policy: {e.message}")
```