CI integration

The tavora CLI has two flavours of CI gate:

Gate	What it checks	When to run
`tavora deploy --dry-run`	The local `tavora/` folder validates against the platform (schema parse, model + index + secret refs resolve, agent-count tier limits).	Every PR that touches `tavora/**`.
`tavora rag-eval formats --gate`	The RAG pipeline still accepts and indexes every supported document format.	After chunker / extractor / migration changes.
`tavora rag-eval judge --gate`	LLM-judges RAG answers against a fixture invoice corpus.	On a schedule + after retrieval-side changes.

Per-agent eval suites are advisory — they report a pass/fail score but never block deploy. The Phase 12 promotion gate was removed in the 2026-05-11 slim-down. Run them as a quality signal either pre-deploy (tavora deploy --run-evals) or on demand (tavora evals run <agent>); the gating job is the validate step, not the eval step.

Validate-and-deploy: GitHub Actions

A two-job workflow: validate on every PR, deploy on merge to main.

name: agent-deploy

on:
  pull_request:
    paths:
      - 'tavora/**'
  push:
    branches: [main]
    paths:
      - 'tavora/**'

jobs:
  validate:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    env:
      TAVORA_URL: https://api.tavora.ai
      TAVORA_API_KEY: ${{ secrets.TAVORA_STAGING_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - name: Install tavora CLI
        run: |
          curl -fsSL https://github.com/tavora-ai/tavora-cli/releases/latest/download/tavora-linux-amd64.tar.gz \
            | tar -xz -C /usr/local/bin tavora
      - name: Validate
        run: tavora deploy --dry-run

  deploy:
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    env:
      TAVORA_URL: https://api.tavora.ai
      TAVORA_API_KEY: ${{ secrets.TAVORA_PRODUCTION_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - name: Install tavora CLI
        run: |
          curl -fsSL https://github.com/tavora-ai/tavora-cli/releases/latest/download/tavora-linux-amd64.tar.gz \
            | tar -xz -C /usr/local/bin tavora
      - name: Deploy
        run: tavora deploy --run-evals

--dry-run validates locally + against the server, but persists nothing — perfect for PR feedback. --run-evals on the deploy job fires the per-agent pinned suite after publishing; eval failures surface in the CI log but don’t roll back the deploy.

GitLab CI

stages: [validate, deploy]

validate:
  stage: validate
  image: golang:1.22
  variables:
    TAVORA_URL: https://api.tavora.ai
    TAVORA_API_KEY: $TAVORA_STAGING_API_KEY
  script:
    - go install github.com/tavora-ai/tavora-cli/cmd/tavora@latest
    - tavora deploy --dry-run
  only:
    refs: [merge_requests]
    changes: ['tavora/**/*']

deploy:
  stage: deploy
  image: golang:1.22
  variables:
    TAVORA_URL: https://api.tavora.ai
    TAVORA_API_KEY: $TAVORA_PRODUCTION_API_KEY
  script:
    - go install github.com/tavora-ai/tavora-cli/cmd/tavora@latest
    - tavora deploy --run-evals
  only:
    refs: [main]
    changes: ['tavora/**/*']

RAG smoke tests

The two rag-eval gates ship pre-built and exercise the runtime end-to-end. Useful on a nightly schedule even when you haven’t changed anything — they catch silent regressions in the underlying provider’s behaviour.

on:
  schedule:
    - cron: '0 6 * * *'

jobs:
  rag:
    runs-on: ubuntu-latest
    env:
      TAVORA_URL: https://api.tavora.ai
      TAVORA_API_KEY: ${{ secrets.TAVORA_API_KEY }}
      GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
    steps:
      - run: |
          curl -fsSL https://github.com/tavora-ai/tavora-cli/releases/latest/download/tavora-linux-amd64.tar.gz \
            | tar -xz -C /usr/local/bin tavora
      - run: tavora rag-eval formats --gate
      - run: tavora rag-eval judge --gate --pass-threshold 7

rag-eval judge provisions a temp store, uploads fixture invoices, asks structured questions, scores via Gemini, then (by default) cleans up. --cleanup=false preserves the store for forensic inspection later.

Exit codes

Code	Meaning
`0`	Validation passed / deploy succeeded / all eval cases passed the threshold.
`1`	Validation failed (schema error, broken ref, fatal `--dry-run` issue) or at least one eval case failed.
`2`	Run did not complete (timeout, server error, missing env var).

CI runners treat anything non-zero as failure; the 1 vs 2 distinction lets your post-step hooks differentiate “agent quality regression” from “infra broke” if you care.

Tips

Use separate projects for staging and production. Mint two API keys, point validate at staging and deploy at production. Eval runs against production will bill against production usage.
Pin a judge model. --judge gemini-2.5-flash (or whichever) so the judge’s score-rubric drift is observable, not silent.
Path-filter your jobs. PRs that don’t touch tavora/** paying a deploy-validate tax is friction without signal.
Capture the run ID on failure. tavora evals runs get <run-id> prints the per-case judge reasoning. Drop the URL into the failed-CI message so reviewers click straight through.

Next steps

CLI reference — the full subcommand catalogue.
Tutorial: Build the tasklist agent — the end-to-end code-first flow this CI integrates with.