Features
The runtime is the part
you don't want to build twice.
Three pillars. Each is something demos hand-wave and production can't. We built the production half so your team can ship the agentic experience instead.
Knows your data
Answers from public docs.
Reads tenant data through your APIs, scoped to the customer's org. Never crosses boundaries.
Every call carries a tenant ID. The runtime enforces isolation at the API layer — even if the agent goes off-script.
BYO model keys. Customers can bring their own OpenAI / Anthropic / Bedrock account; we never see the keys.
The agent reads your customer's schema at runtime — column names, types, foreign keys — to write queries that actually work.
Row-level filters applied before the agent sees data. Even an "all rows" query returns the tenant's rows only.
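In miniature, row-level scoping looks like this (names and the in-memory table are hypothetical, not the real runtime): the tenant filter is applied before any agent-supplied predicate, so an "all rows" query can't escape the org.

```python
from dataclasses import dataclass

@dataclass
class Row:
    tenant_id: str
    amount: int

# Hypothetical in-memory table standing in for a customer database.
TABLE = [
    Row("acme", 100),
    Row("acme", 250),
    Row("globex", 999),
]

def scoped_query(tenant_id: str, predicate=lambda r: True):
    """Apply the tenant filter *before* the agent's predicate,
    so even an 'all rows' query returns only the tenant's rows."""
    return [r for r in TABLE if r.tenant_id == tenant_id and predicate(r)]

# An "all rows" request from the agent still comes back scoped.
rows = scoped_query("acme")
assert all(r.tenant_id == "acme" for r in rows)
```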
Acts in your product
Returns a string.
Issues refunds, updates records, opens tickets — through your APIs, with approval flows for high-stakes actions.
The agent wants to issue an $847 refund to customer_4f2a.
reason: order #38291 never shipped
policy: §4.2 (over $500 → human review)
tenant: acme · invoked by: agent.refund-flow
Declare your APIs as tools. Tavora handles the calling loop, retries, and structured-output parsing.
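The calling loop, sketched (decorator, registry, and error class are illustrative, not the real SDK): tools are plain functions, the runtime parses the model's tool-call JSON, dispatches it, and retries transient failures.

```python
import json

# Hypothetical tool registry: declare your APIs as plain functions.
TOOLS = {}

def tool(fn):
    TOOLS[fn.__name__] = fn
    return fn

@tool
def open_ticket(subject: str) -> dict:
    return {"id": "T-1", "subject": subject}

def run_tool_call(call_json: str, max_retries: int = 2) -> dict:
    """Parse a model-emitted tool call and dispatch it, retrying
    transient errors — the runtime's calling loop in miniature."""
    call = json.loads(call_json)
    last_err = None
    for _ in range(max_retries + 1):
        try:
            return TOOLS[call["name"]](**call["arguments"])
        except ConnectionError as e:  # transient; retry
            last_err = e
    raise last_err

result = run_tool_call('{"name": "open_ticket", "arguments": {"subject": "refund"}}')
```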
Any tool can require human approval over a threshold. The agent pauses; an operator confirms; the run continues.
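The approval gate in miniature (threshold and function names hypothetical): under the threshold the call runs; over it, the run pauses until an operator confirms, then continues with the same arguments.

```python
# Hypothetical approval gate: tool calls over a threshold pause for a human.
PENDING = []

def issue_refund(amount: int, approved: bool = False) -> dict:
    if amount > 500 and not approved:
        PENDING.append({"tool": "issue_refund", "amount": amount})
        return {"status": "pending_approval"}
    return {"status": "refunded", "amount": amount}

assert issue_refund(120)["status"] == "refunded"            # under threshold: runs
assert issue_refund(847)["status"] == "pending_approval"    # over: pauses
# An operator confirms; the run continues with the same arguments.
assert issue_refund(847, approved=True)["status"] == "refunded"
```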
Every tool call carries a key. Retried calls don't double-charge, double-refund, or double-create.
When the model writes JavaScript instead of calling tools, it runs in a hardened isolate with no network and a memory cap.
Judgable, not vibes
Looks right.
Shows the JS it ran, the tools it called, the data it touched. Replay any turn. Eval-gated before deploy.
Click a production turn; see the exact plan, tool calls, model outputs, and data the agent saw. Step through it.
Define eval cases in Studio. Run them in CI. Block deploys when accuracy drops below the bar you set.
Sample production traffic, run it through your evals, alert when the live distribution drifts from the eval set.
Every plan, tool call, and data access is logged per-tenant. Searchable. Exportable. SOC 2 ready.
Also in the box
Everything else you'd build
before launch.
The boring infrastructure underneath. None of it is interesting on its own. All of it is required to put an agent in front of customers.
Token-by-token streaming with proper cancellation. Stop a 30-second run mid-flight without leaking tool calls.
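Cancellation in miniature (a plain generator plus an event flag, not the real streaming API): the stream checks the flag between tokens and stops promptly, so nothing downstream fires after Stop.

```python
import threading

def stream_tokens(tokens, cancel: threading.Event):
    """Yield tokens one by one; stop promptly when cancel is set,
    so a mid-flight run halts without further work leaking out."""
    for t in tokens:
        if cancel.is_set():
            return
        yield t

cancel = threading.Event()
out = []
for i, tok in enumerate(stream_tokens(["a", "b", "c", "d"], cancel)):
    out.append(tok)
    if i == 1:
        cancel.set()      # user hits Stop after the second token
assert out == ["a", "b"]
```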
Cap usage per customer org so a single tenant can't run up your bill or starve another.
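The cap in miniature (budget numbers hypothetical): spend is tracked per org, and a request that would cross the cap is rejected before it runs.

```python
# Hypothetical per-tenant token budget: requests that would exceed
# the org's cap are rejected before they run.
BUDGETS = {"acme": 1000}
SPENT: dict = {}

def try_spend(tenant: str, tokens: int) -> bool:
    used = SPENT.get(tenant, 0)
    if used + tokens > BUDGETS.get(tenant, 0):
        return False                 # over cap: reject, don't starve others
    SPENT[tenant] = used + tokens
    return True

assert try_spend("acme", 800)        # within budget
assert not try_spend("acme", 300)    # would exceed the cap: rejected
```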
Route fast paths to small models, hard cases to big ones. Or pin a customer to their preferred provider.
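A routing sketch (model names, the pin table, and the length heuristic are all illustrative): cheap model for easy turns, big model for hard ones, unless the tenant has pinned a provider.

```python
# Hypothetical router: a tenant pin wins; otherwise a simple
# difficulty heuristic picks the model tier.
PINNED = {"globex": "bedrock:claude"}

def route(tenant: str, prompt: str) -> str:
    if tenant in PINNED:
        return PINNED[tenant]            # customer's preferred provider
    return "small-fast-model" if len(prompt) < 200 else "large-model"

assert route("acme", "reset my password") == "small-fast-model"
assert route("globex", "anything at all") == "bedrock:claude"
```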
Schema-validated JSON responses with retries on parse failure. The model gets the diff and tries again.
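The retry loop in miniature (the "schema" here is a single required field; in the sketch, successive attempts stand in for the model retrying after seeing the error):

```python
import json

def parse_with_retries(attempts, max_retries=2):
    """Validate model output against a minimal schema (JSON with a
    'status' field); on failure, collect the error a real runtime
    would feed back to the model before the next attempt."""
    errors = []
    for raw in attempts[: max_retries + 1]:
        try:
            obj = json.loads(raw)
            if "status" not in obj:
                raise ValueError("missing required field: status")
            return obj, errors
        except ValueError as e:          # JSONDecodeError is a ValueError
            errors.append(str(e))        # the diff the model sees on retry
    raise RuntimeError(f"gave up after {len(errors)} attempts")

# First attempt is truncated JSON; the retry succeeds.
obj, errors = parse_with_retries(['{"status": ', '{"status": "ok"}'])
assert obj == {"status": "ok"} and len(errors) == 1
```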
Per-user, per-tenant, per-conversation memory. Pluggable backends (Postgres, Redis, your own).
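The backend contract, sketched (class and method names hypothetical): anything that can get and put values keyed by (tenant, user, conversation) can be dropped in.

```python
# Hypothetical pluggable memory backend: a dict-backed reference
# implementation of the (tenant, user, conversation) keyed store.
class DictMemory:
    def __init__(self):
        self._store = {}

    def put(self, tenant, user, convo, value):
        self._store[(tenant, user, convo)] = value

    def get(self, tenant, user, convo):
        return self._store.get((tenant, user, convo))

mem = DictMemory()
mem.put("acme", "u1", "c9", {"last_order": 38291})
assert mem.get("acme", "u1", "c9")["last_order"] == 38291
assert mem.get("globex", "u1", "c9") is None   # scoped: no cross-tenant reads
```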
First-class clients for both. Same surface area, same primitives, same docs.
Run Tavora in our cloud or yours. Same binary. Same Studio. Bring-your-own-VPC available.
Every plan, tool call, and model invocation emits OTel spans. Drop them into your existing observability stack.
Define agents as code. Version them. Diff them. Roll back. The way you ship the rest of your infra.
Read the docs, or just open Studio.
The fastest way to understand the runtime is to define an agent in Studio and click Run.