Agents & Auth
How SkillCI drives each coding agent, what auth each needs, and how to add a new adapter.
The adapter contract
Every agent implements AgentAdapter:
interface AgentAdapter {
  readonly kind: AgentKind;           // 'claude-code' | 'cursor' | 'codex'
  isAvailable(): Promise<boolean>;    // can it run in this environment?
  run(args): Promise<AgentRunResult>; // drive it headlessly, return telemetry
}
run() normalizes whatever the agent CLI emits into a common AgentRunResult
(transcript, tokens, cost, steps, tool calls, wall-clock). Adapters degrade
gracefully: when unavailable they throw a typed AgentUnavailableError rather
than crashing, and the orchestrator falls back to the deterministic mock.
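The fallback described above might look roughly like the sketch below. It is not SkillCI's actual orchestrator: `runWithFallback`, `RunArgs`, and the field names on `AgentRunResult` are assumptions made for illustration, and only `AgentAdapter`, `AgentKind`, and `AgentUnavailableError` are names taken from this page.

```typescript
type AgentKind = 'claude-code' | 'cursor' | 'codex';

// Hypothetical result shape: field names are guesses based on the list above.
interface AgentRunResult {
  transcript: string;
  tokens: number;
  costUsd: number;
  steps: number;
  toolCalls: number;
  wallClockMs: number;
}

// Hypothetical argument shape, reduced to the minimum for this sketch.
interface RunArgs {
  prompt: string;
}

interface AgentAdapter {
  readonly kind: AgentKind;
  isAvailable(): Promise<boolean>;
  run(args: RunArgs): Promise<AgentRunResult>;
}

// Typed error so callers can tell "agent missing" apart from real failures.
class AgentUnavailableError extends Error {
  constructor(readonly kind: AgentKind) {
    super(`agent ${kind} is unavailable`);
    this.name = 'AgentUnavailableError';
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, AgentUnavailableError.prototype);
  }
}

// Try the real adapter; fall back only when it reports itself unavailable.
// Any other error propagates so genuine failures are never masked.
async function runWithFallback(
  primary: AgentAdapter,
  fallback: Pick<AgentAdapter, 'run'>,
  args: RunArgs,
): Promise<AgentRunResult> {
  try {
    return await primary.run(args);
  } catch (err) {
    if (err instanceof AgentUnavailableError) {
      return fallback.run(args);
    }
    throw err;
  }
}
```

Catching only the typed error is the point of the design: a crash inside an available agent should still surface as a failure instead of being quietly replaced by mock output.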
Claude Code
| Availability | the claude binary on PATH — that's it. |
| Auth | the CLI owns its own auth: an ANTHROPIC_API_KEY or a Claude Code subscription/OAuth session. SkillCI does not require the env key. |
| Invocation | claude -p "<prompt>" --output-format json --dangerously-skip-permissions |
Why --dangerously-skip-permissions: headless claude -p is default-deny on
tool permissions, so without it the agent can't edit files or run commands and
every editing task silently scores as a failure. Runs always happen inside a
disposable, isolated sandbox, which is exactly what the flag is for.
The Claude adapter gates on the binary only — mirroring Cursor — because the CLI manages auth itself. The API-key requirement lives solely in the LLM judge's SDK backend (see Scoring).
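As a minimal sketch, the invocation from the table can be assembled as an argv array, not as a shell string, so that quotes or newlines in the prompt need no escaping. The helper name `buildClaudeArgv` is hypothetical; the flags are the ones listed above.

```typescript
// Hypothetical helper: returns the arguments passed to the `claude` binary.
// The prompt stays a single argv element, so no shell quoting is involved.
function buildClaudeArgv(prompt: string): string[] {
  return [
    '-p', prompt,
    '--output-format', 'json',
    '--dangerously-skip-permissions', // safe only inside the disposable sandbox
  ];
}
```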
Cursor
| Availability | the cursor-agent binary on PATH. |
| Auth | Cursor manages its own auth (no separate key check). |
| Invocation | cursor-agent -p "<prompt>" --output-format json (best-effort). |
Cursor's headless surface is less stable; the adapter parses JSON telemetry when present and otherwise falls back to text with zeroed token/cost fields. A run that times out or exits non-zero surfaces a typed error rather than a fake success.
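The JSON-first, text-fallback behaviour could be sketched as below. Everything here is illustrative: `RawRun`, `Telemetry`, `AgentRunError`, and the JSON keys `result`, `tokens`, and `cost` are assumptions, not the documented output of `cursor-agent`.

```typescript
// Hypothetical shapes for the raw process result and the normalized output.
interface RawRun {
  stdout: string;
  exitCode: number;
  timedOut: boolean;
}

interface Telemetry {
  transcript: string;
  tokens: number;
  costUsd: number;
  source: 'json' | 'text';
}

// Hypothetical typed error for failed runs.
class AgentRunError extends Error {
  constructor(message: string, readonly exitCode: number, readonly timedOut: boolean) {
    super(message);
    this.name = 'AgentRunError';
    Object.setPrototypeOf(this, AgentRunError.prototype);
  }
}

function normalize(raw: RawRun): Telemetry {
  // Failures are reported as errors, never as a zeroed "success".
  if (raw.timedOut) {
    throw new AgentRunError('agent timed out', raw.exitCode, true);
  }
  if (raw.exitCode !== 0) {
    throw new AgentRunError(`agent exited with code ${raw.exitCode}`, raw.exitCode, false);
  }
  try {
    const parsed: unknown = JSON.parse(raw.stdout);
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      const obj = parsed as Record<string, unknown>;
      const result = obj['result'];
      const tokens = obj['tokens'];
      const cost = obj['cost'];
      return {
        transcript: typeof result === 'string' ? result : raw.stdout,
        tokens: typeof tokens === 'number' ? tokens : 0,
        costUsd: typeof cost === 'number' ? cost : 0,
        source: 'json',
      };
    }
  } catch {
    // stdout was not JSON; fall through to the text path.
  }
  // Text fallback: keep the transcript, zero the telemetry we cannot know.
  return { transcript: raw.stdout, tokens: 0, costUsd: 0, source: 'text' };
}
```

Recording where the telemetry came from (`source`) is one way to keep zeroed fields from being mistaken for a genuinely free run when results are aggregated.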
Codex
| Availability | the codex binary and OPENAI_API_KEY. |
| Auth | key-gated (OPENAI_API_KEY). |
| Invocation | codex exec "<prompt>" (non-interactive). |
Codex's machine output varies by version; the adapter tries JSON first, falls back to text with zeroed telemetry, and surfaces typed errors on timeout/failure.
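The three availability rules can be summarized in one table-driven check. This is a sketch under assumptions: the `Probes` interface and `isAgentAvailable` are hypothetical, and the `hasBinary`/`hasEnv` signatures are guessed from their names; only the binary names and `OPENAI_API_KEY` come from the tables above.

```typescript
type AgentKind = 'claude-code' | 'cursor' | 'codex';

// Hypothetical probe interface, injected so the logic is testable offline.
interface Probes {
  hasBinary(name: string): boolean;
  hasEnv(name: string): boolean;
}

interface Requirement {
  binary: string;
  envKey?: string; // set only when the CLI cannot manage its own auth
}

const REQUIREMENTS: Record<AgentKind, Requirement> = {
  'claude-code': { binary: 'claude' },
  cursor: { binary: 'cursor-agent' },
  codex: { binary: 'codex', envKey: 'OPENAI_API_KEY' },
};

// Binary is always required; the env key is checked only where one is listed.
function isAgentAvailable(kind: AgentKind, probes: Probes): boolean {
  const req = REQUIREMENTS[kind];
  if (!probes.hasBinary(req.binary)) {
    return false;
  }
  return req.envKey === undefined || probes.hasEnv(req.envKey);
}
```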
The mock adapter
MockAgentAdapter is deterministic and offline — no network, no keys. It powers
the test suite and npm run demo, and is the fallback whenever a real adapter is
unavailable. This is what makes the whole pipeline runnable in CI for free.
Adding a new adapter
- Create src/agents/<name>-adapter.ts implementing AgentAdapter.
- isAvailable(): probe the binary (hasBinary) and, only if the CLI can't self-auth, the key (hasEnv). Prefer binary-only when the CLI owns its auth.
- run(): spawn via execa inside args.sandbox.workdir, honor task.timeoutMs, and normalize output into AgentRunResult. Don't fabricate a zeroed "success" for a timeout/non-zero run; throw a typed error instead.
- Register it so getAdapter(kind) resolves it.
- Tests: mock execa to pin the invocation contract (flags, cwd) and the availability logic. Never spawn real CLIs in unit tests. See claude-invocation.test.ts and real-adapters.test.ts for the pattern.
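The steps above can be sketched as a skeleton adapter. To stay self-contained it takes an injected `Spawn` function in place of importing execa, which also shows how a test can pin flags and cwd without launching a real CLI. The agent name `example-agent`, its `-p` flag, the `Spawn` and `RunArgs` shapes (including where `task.timeoutMs` lives), and the registry are all hypothetical; the real `getAdapter` may be implemented differently.

```typescript
// Hypothetical process-runner signature standing in for execa.
interface SpawnResult { stdout: string; exitCode: number; timedOut: boolean }
type Spawn = (
  bin: string,
  argv: string[],
  opts: { cwd: string; timeout: number },
) => Promise<SpawnResult>;

// Hypothetical argument shape, based on the names mentioned in the steps.
interface RunArgs {
  prompt: string;
  sandbox: { workdir: string };
  task: { timeoutMs: number };
}

class AgentRunError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'AgentRunError';
    Object.setPrototypeOf(this, AgentRunError.prototype);
  }
}

class ExampleAdapter {
  readonly kind = 'example-agent';

  constructor(
    private readonly spawn: Spawn,
    private readonly hasBinary: (name: string) => boolean,
  ) {}

  // Binary-only gate: this imaginary CLI is assumed to own its auth.
  async isAvailable(): Promise<boolean> {
    return this.hasBinary('example-agent');
  }

  async run(args: RunArgs): Promise<{ transcript: string }> {
    const res = await this.spawn('example-agent', ['-p', args.prompt], {
      cwd: args.sandbox.workdir,    // always run inside the sandbox
      timeout: args.task.timeoutMs, // honor the task's time budget
    });
    if (res.timedOut || res.exitCode !== 0) {
      throw new AgentRunError(`example-agent failed (exit ${res.exitCode})`);
    }
    return { transcript: res.stdout };
  }
}

// Minimal registry so a lookup by kind resolves the adapter.
const registry = new Map<string, ExampleAdapter>();

function registerAdapter(adapter: ExampleAdapter): void {
  registry.set(adapter.kind, adapter);
}

function getAdapter(kind: string): ExampleAdapter {
  const adapter = registry.get(kind);
  if (!adapter) {
    throw new Error(`unknown agent kind: ${kind}`);
  }
  return adapter;
}
```

In a unit test, the injected `Spawn` can be a fake that records its arguments and returns a canned result, which plays the same role as mocking execa.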