Configuration & self-hosting

AgentData reads non-secret settings from config.yaml. Environment variables override those settings and hold all secrets. This page is the reference for self-hosted deployments.
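The precedence described above can be sketched in a few lines of Python. This is an illustration only, not AgentData's loader: the function name `resolve_setting`, the lowercase config.yaml key names, and the `SECRET_NAMES` subset are all assumptions made for the example.

```python
import os

# Assumption: a few of the secret names listed in the Secrets table on this page.
SECRET_NAMES = {"DATABASE_URL", "SECRET_KEY", "FERNET_KEY", "ANTHROPIC_API_KEY"}


def resolve_setting(name, file_config, environ=None):
    """Resolve one setting: environment first, then config.yaml (non-secrets only).

    `file_config` stands in for the parsed config.yaml; the lowercase key
    convention used here is hypothetical.
    """
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]  # an environment variable always wins
    if name in SECRET_NAMES:
        return None  # secrets are only ever read from the environment
    return file_config.get(name.lower())


# Example: the environment overrides the file value for LLM_PROVIDER.
file_config = {"llm_provider": "anthropic", "federation_enabled": "false"}
environ = {"LLM_PROVIDER": "openai", "SECRET_KEY": "change-me"}
provider = resolve_setting("LLM_PROVIDER", file_config, environ)
```

Passing `environ` explicitly keeps the sketch testable; leaving it out makes the function read from the real process environment.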

Which key does what

The short version:

  • ANTHROPIC_API_KEY powers every text-LLM call — classification, query planning, learn/teach, translation.
  • Embeddings use a separate key. Anthropic has no embeddings API, so semantic recall needs an OpenAI or Voyage key (or a local embeddings server). The LLM key and the embeddings key are unrelated, and embeddings are optional: without them, recall falls back to word-overlap matching.

To keep data inside your network, swap the cloud LLM for an on-prem or OpenAI-compatible one; see Security and Self-hosting.

Secrets (set via environment variables)

Env var | Used for | Required?
DATABASE_URL | Postgres connection | Yes
SECRET_KEY | JWT signing | Yes (prod)
FERNET_KEY | Encrypt source credentials/tokens at rest | Yes (prod)
ANTHROPIC_API_KEY | All LLM text calls | Yes, for any AI feature
EMBEDDING_OPENAI_API_KEY | Embeddings only (semantic recall) | Optional
GOOGLE_CLIENT_ID | Google sign-in | Optional
CUBEJS_API_SECRET | Sign Cube API tokens (federation) | If using Cube + Trino
MCP_API_KEY | Legacy shared MCP key | Optional (per-user keys / OAuth preferred)
MAIL_SMTP_*, MAIL_FROM | Invite / welcome email | Optional
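A deployment script might check the required secrets before starting the service and generate fresh key values. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the "Yes (prod)" rows are treated as required only when `production=True`, and `new_fernet_key` assumes FERNET_KEY uses the standard Fernet key format (32 random bytes, URL-safe base64 encoded), which this page does not state explicitly.

```python
import base64
import os
import secrets

ALWAYS_REQUIRED = ("DATABASE_URL",)
PROD_REQUIRED = ("SECRET_KEY", "FERNET_KEY")  # marked "Yes (prod)" in the table


def missing_secrets(environ=None, production=True):
    """Return the names of required secrets that are unset or empty."""
    environ = os.environ if environ is None else environ
    required = ALWAYS_REQUIRED + (PROD_REQUIRED if production else ())
    return [name for name in required if not environ.get(name)]


def new_secret_key():
    """Generate a random URL-safe string suitable as a signing secret."""
    return secrets.token_urlsafe(48)


def new_fernet_key():
    """Generate a key in the standard Fernet format (assumed to apply here)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit("Missing required secrets: " + ", ".join(missing))
```

Generated values belong in the environment or a secrets manager, not in config.yaml, since the file is reserved for non-secret settings.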

Behaviour & wiring

Env var | Meaning
LLM_PROVIDER / LLM_BASE_URL / LLM_API_KEY | Swap Anthropic for an OpenAI-compatible or on-prem LLM. Default: anthropic.
EMBEDDINGS_PROVIDER / EMBEDDINGS_MODEL / EMBEDDINGS_BASE_URL | openai, voyage, or "" (off); a base URL points at a local embeddings server (no key).
EMBEDDING_VOYAGEAI_KEY | Key when EMBEDDINGS_PROVIDER=voyage.
CUBE_API_URL / TRINO_URL | Federation engine URLs (private hosts).
FEDERATION_ENABLED | true/false: Cube + Trino vs the in-process shim.
APP_URL / MCP_PUBLIC_URL | Public UI URL (email links) / public backend URL (OAuth callbacks).
CORS_ORIGINS / REQUIRE_AUTH | CORS allowlist; authentication gate (set REQUIRE_AUTH=true in production).
AWS_REGION | Bedrock / Athena region, if used.
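To make the embeddings variables concrete, here is one possible reading of how they combine, written as a small Python function. The function name, the returned labels, and the order of the checks are assumptions for illustration; AgentData's actual resolution logic may differ.

```python
import os

# Which key variable goes with which provider, per the tables on this page.
KEY_VARS = {
    "openai": "EMBEDDING_OPENAI_API_KEY",
    "voyage": "EMBEDDING_VOYAGEAI_KEY",
}


def embeddings_mode(environ=None):
    """Return a label describing how semantic recall would be served."""
    environ = os.environ if environ is None else environ
    provider = environ.get("EMBEDDINGS_PROVIDER", "").strip().lower()
    if not provider:
        return "word-overlap"  # "" means embeddings are off
    if environ.get("EMBEDDINGS_BASE_URL"):
        return "local-server"  # a local embeddings server needs no key
    key_var = KEY_VARS.get(provider)
    if key_var and environ.get(key_var):
        return provider  # hosted provider with its key present
    return "word-overlap"  # provider named but key missing: degrade


mode = embeddings_mode({
    "EMBEDDINGS_PROVIDER": "voyage",
    "EMBEDDING_VOYAGEAI_KEY": "example-key",
})
```

The Anthropic key plays no part in this decision, which mirrors the point that the LLM key and the embeddings key are independent.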

Graceful degradation

If a feature's key is missing, AgentData degrades rather than crashing: the classifier falls back to heuristics, query_nl returns 503, and embeddings fall back to word-overlap matching. Every LLM call is cost-tracked per tenant.
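For a sense of what word-overlap matching can look like, the sketch below ranks documents by Jaccard similarity over their word sets. This page does not specify the scoring AgentData uses, so Jaccard is a stand-in chosen for the example, and the names `words`, `overlap_score`, and `recall` are hypothetical.

```python
import re


def words(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, doc):
    """Jaccard similarity: shared words divided by all distinct words."""
    q, d = words(query), words(doc)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def recall(query, docs, top_k=3):
    """Return up to top_k documents that share at least one word with the query."""
    scored = [(overlap_score(query, doc), doc) for doc in docs]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


docs = ["monthly revenue by region", "user sign-in errors", "revenue forecast"]
hits = recall("revenue by region", docs)
# hits == ["monthly revenue by region", "revenue forecast"]
```

Unlike embeddings, this approach only matches literal words, so a query for "income" would not find a document about "revenue".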