SkillCI Documentation
Welcome! SkillCI is CI/CD for coding-agent configuration — it tests, scores,
and gates changes to your skills, hooks, rules, and CLAUDE.md before you trust
them in Claude Code, Cursor, or Codex.
Start here
| Guide | What it covers |
|---|---|
| Getting Started | Install, run the offline demo, your first live evaluation, project layout. Read this first. |
| Concepts | Baseline vs candidate, the verdict model, the three scoring dimensions, the regression gate. |
| CLI Reference | Every command, flag, and exit code. |
| Writing Tasks | Author task fixtures, objective checks, and judge rubrics. |
| Scoring | The composite formula, thresholds, and how a verdict is decided. |
| Agents & Auth | Claude/Cursor/Codex adapters, auth models, and adding a new adapter. |
| CI Integration | Wire SkillCI into GitHub Actions as a PR gate. |
| Architecture | Module-by-module map, data flow, and the shared contracts. |
| Troubleshooting | Common issues and FAQ. |
The one-paragraph mental model
You point SkillCI at two versions of an agent's config: the baseline
(trusted) and a candidate (proposed change). It runs the same suite of
sandboxed tasks once under each config, driving a real coding agent
headlessly. Each run is scored on three dimensions (objective checks, LLM
judge, cost), the baseline and candidate scores are compared, and a verdict
(improved / neutral / regressed) comes out. A PR is opened only when the
verdict is improved and there are zero hard regressions. The whole pipeline
can also run fully offline via a deterministic mock.
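The gate described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SkillCI's actual code: the names `TaskResult`, `verdict_for`, and `should_open_pr`, the `hard_regression` flag, the use of a mean score delta, and the `0.05` threshold are all assumptions made for this example. The real composite formula and thresholds are covered in the Scoring guide.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    IMPROVED = "improved"
    NEUTRAL = "neutral"
    REGRESSED = "regressed"


@dataclass(frozen=True)
class TaskResult:
    """Scores for one task, run once per config (hypothetical shape)."""
    task: str
    baseline_score: float
    candidate_score: float
    # Assumed flag: set when the candidate breaks something the baseline
    # got right, regardless of the overall score.
    hard_regression: bool = False


def verdict_for(delta: float, threshold: float = 0.05) -> Verdict:
    """Map a score delta to a verdict; the threshold is an assumed value."""
    if delta > threshold:
        return Verdict.IMPROVED
    if delta < -threshold:
        return Verdict.REGRESSED
    return Verdict.NEUTRAL


def should_open_pr(results: list[TaskResult], threshold: float = 0.05) -> bool:
    """Gate: overall verdict is improved AND there are zero hard regressions."""
    if not results:
        return False
    # Assumption: the overall delta is the plain mean of per-task deltas.
    mean_delta = sum(
        r.candidate_score - r.baseline_score for r in results
    ) / len(results)
    overall = verdict_for(mean_delta, threshold)
    no_hard_regressions = not any(r.hard_regression for r in results)
    return overall is Verdict.IMPROVED and no_hard_regressions
```

Under these assumptions, a candidate that raises the mean score but trips a single hard regression is still blocked, which is the point of keeping the two conditions separate.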
Contributing
See CONTRIBUTING.md and the Architecture guide.