Documentation Index

Fetch the complete documentation index at: https://docs.xenovia.io/llms.txt

Use this file to discover all available pages before exploring further.

Setup

pip install llama-index-llms-openai llama-index-embeddings-openai
import os
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1"
)

Settings.llm = llm
Assigning Settings.llm makes this client the global default for every LlamaIndex component that uses an LLM (query engines, chat engines, agents), with no further configuration.

Embeddings

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1"
)

Settings.embed_model = embed_model
Each embedding call routes through Xenovia as its own request. Policies and traces apply to both LLM and embedding calls, so the full RAG pipeline is governed, not just the generation step.
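The LLM and embedding clients above repeat the same proxy base URL. A small helper keeps them in sync (a sketch; xenovia_base_url is a hypothetical name, not part of any SDK):

```python
import os

def xenovia_base_url() -> str:
    # Build the Xenovia proxy base URL once and share it between the
    # LLM and embedding clients so both route through the same proxy.
    proxy_id = os.environ["XENOVIA_PROXY_ID"]
    return f"https://runtime.xenovia.io/a/{proxy_id}/openai/v1"
```

Pass api_base=xenovia_base_url() to both OpenAI(...) and OpenAIEmbedding(...).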

RAG pipeline

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What are the main topics covered?")
print(response)
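Under the hood, the query engine embeds the question, retrieves the most similar chunks, and hands them to the LLM for synthesis. A pure-Python sketch of the retrieval step (toy vectors and hypothetical names, not LlamaIndex internals):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, chunks, k=2):
    # chunks: {chunk_id: embedding_vector}; return the k closest chunk ids
    ranked = sorted(chunks, key=lambda cid: cosine(query_vec, chunks[cid]),
                    reverse=True)
    return ranked[:k]
```

In the governed pipeline, every embedding that feeds this similarity search was produced through the Xenovia proxy.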

Agentic query engine

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="docs_search",
    description="Search internal documentation"
)

agent = ReActAgent.from_tools([tool], llm=llm, verbose=True)
response = agent.chat("Find information about governance policies")
print(response)
Each reasoning step (think, act, observe) is a separate LLM call. All calls route through Xenovia and produce individual traces. Use a consistent X-Xenovia-Session-Id header to group the full agent run in Traces.
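The loop itself is simple. A minimal ReAct-style sketch in pure Python (hypothetical names, not LlamaIndex's implementation) shows why each step is a separate, individually traced LLM call:

```python
def react_loop(llm_call, tools, question, max_steps=5):
    # llm_call: fn(prompt) -> str; each invocation is one proxied LLM call.
    # The model either answers ("Answer: ...") or picks a tool ("tool: arg").
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm_call("\n".join(history))            # think / act
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip()
        tool_name, arg = step.split(":", 1)
        observation = tools[tool_name.strip()](arg.strip())  # observe
        history += [step, f"Observation: {observation}"]
    return None
```

A five-step agent run therefore produces at least five traces, which is why grouping them under one session id matters.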

Session tracking

Pass the same session header into both the LLM and embedding clients so the full workflow stays grouped in Traces:
import uuid

session_id = str(uuid.uuid4())

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
    default_headers={"X-Xenovia-Session-Id": session_id}
)

embed_model = OpenAIEmbedding(
    api_key=os.environ["XENOVIA_API_KEY"],
    api_base=f"https://runtime.xenovia.io/a/{os.environ['XENOVIA_PROXY_ID']}/openai/v1",
    default_headers={"X-Xenovia-Session-Id": session_id}
)
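To avoid repeating the header dict, you can mint it in one place (a sketch; xenovia_session_headers is a hypothetical helper, not part of any SDK):

```python
import uuid

def xenovia_session_headers(session_id=None):
    # One session id shared by every client keeps the whole workflow
    # grouped under a single session in Traces.
    return {"X-Xenovia-Session-Id": session_id or str(uuid.uuid4())}
```

Call it once, then pass the same returned dict as default_headers to both OpenAI(...) and OpenAIEmbedding(...).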

Handling policy blocks

When a request is blocked, LlamaIndex propagates the upstream 403 as an openai.PermissionDeniedError.
from openai import PermissionDeniedError

try:
    response = query_engine.query("Drop the database")
except PermissionDeniedError as e:
    print(f"Blocked by policy: {e.message}")