Skip to content

CI integration

The tavora CLI has two flavours of CI gate:

GateWhat it checksWhen to run
tavora deploy --dry-runThe local tavora/ folder validates against the platform (schema parse, model + index + secret refs resolve, agent-count tier limits).Every PR that touches tavora/**.
tavora rag-eval formats --gateThe RAG pipeline still accepts and indexes every supported document format.After chunker / extractor / migration changes.
tavora rag-eval judge --gateLLM-judges RAG answers against a fixture invoice corpus.On a schedule + after retrieval-side changes.

Per-agent eval suites are advisory — they report a pass/fail score but never block deploy. The Phase 12 promotion gate was removed in the 2026-05-11 slim-down. Run them as a quality signal either pre-deploy (tavora deploy --run-evals) or on demand (tavora evals run <agent>); the gating job is the validate step, not the eval step.

A two-job workflow: validate on every PR, deploy on merge to main.

.github/workflows/agent-deploy.yml
name: agent-deploy
on:
pull_request:
paths:
- 'tavora/**'
push:
branches: [main]
paths:
- 'tavora/**'
jobs:
validate:
if: github.event_name == 'pull_request'
runs-on: ubuntu-latest
env:
TAVORA_URL: https://api.tavora.ai
TAVORA_API_KEY: ${{ secrets.TAVORA_STAGING_API_KEY }}
steps:
- uses: actions/checkout@v4
- name: Install tavora CLI
run: |
curl -fsSL https://github.com/tavora-ai/tavora-cli/releases/latest/download/tavora-linux-amd64.tar.gz \
| tar -xz -C /usr/local/bin tavora
- name: Validate
run: tavora deploy --dry-run
deploy:
if: github.event_name == 'push'
runs-on: ubuntu-latest
env:
TAVORA_URL: https://api.tavora.ai
TAVORA_API_KEY: ${{ secrets.TAVORA_PRODUCTION_API_KEY }}
steps:
- uses: actions/checkout@v4
- name: Install tavora CLI
run: |
curl -fsSL https://github.com/tavora-ai/tavora-cli/releases/latest/download/tavora-linux-amd64.tar.gz \
| tar -xz -C /usr/local/bin tavora
- name: Deploy
run: tavora deploy --run-evals

--dry-run validates locally + against the server, but persists nothing — perfect for PR feedback. --run-evals on the deploy job fires the per-agent pinned suite after publishing; eval failures surface in the CI log but don’t roll back the deploy.

.gitlab-ci.yml
stages: [validate, deploy]
validate:
stage: validate
image: golang:1.22
variables:
TAVORA_URL: https://api.tavora.ai
TAVORA_API_KEY: $TAVORA_STAGING_API_KEY
script:
- go install github.com/tavora-ai/tavora-cli/cmd/tavora@latest
- tavora deploy --dry-run
only:
refs: [merge_requests]
changes: ['tavora/**/*']
deploy:
stage: deploy
image: golang:1.22
variables:
TAVORA_URL: https://api.tavora.ai
TAVORA_API_KEY: $TAVORA_PRODUCTION_API_KEY
script:
- go install github.com/tavora-ai/tavora-cli/cmd/tavora@latest
- tavora deploy --run-evals
only:
refs: [main]
changes: ['tavora/**/*']

The two rag-eval gates ship pre-built and exercise the runtime end-to-end. Useful on a nightly schedule even when you haven’t changed anything — they catch silent regressions in the underlying provider’s behaviour.

.github/workflows/rag-nightly.yml
on:
schedule:
- cron: '0 6 * * *'
jobs:
rag:
runs-on: ubuntu-latest
env:
TAVORA_URL: https://api.tavora.ai
TAVORA_API_KEY: ${{ secrets.TAVORA_API_KEY }}
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
steps:
- run: |
curl -fsSL https://github.com/tavora-ai/tavora-cli/releases/latest/download/tavora-linux-amd64.tar.gz \
| tar -xz -C /usr/local/bin tavora
- run: tavora rag-eval formats --gate
- run: tavora rag-eval judge --gate --pass-threshold 7

rag-eval judge provisions a temp store, uploads fixture invoices, asks structured questions, scores via Gemini, then (by default) cleans up. --cleanup=false preserves the store for forensic inspection later.

CodeMeaning
0Validation passed / deploy succeeded / all eval cases passed the threshold.
1Validation failed (schema error, broken ref, fatal --dry-run issue) or at least one eval case failed.
2Run did not complete (timeout, server error, missing env var).

CI runners treat anything non-zero as failure; the 1 vs 2 distinction lets your post-step hooks differentiate “agent quality regression” from “infra broke” if you care.

  • Use separate projects for staging and production. Mint two API keys, point validate at staging and deploy at production. Eval runs against production will bill against production usage.
  • Pin a judge model. --judge gemini-2.5-flash (or whichever) so the judge’s score-rubric drift is observable, not silent.
  • Path-filter your jobs. PRs that don’t touch tavora/** paying a deploy-validate tax is friction without signal.
  • Capture the run ID on failure. tavora evals runs get <run-id> prints the per-case judge reasoning. Drop the URL into the failed-CI message so reviewers click straight through.