Agents & Auth

How SkillCI drives each coding agent, what auth each needs, and how to add a new adapter.

The adapter contract

Every agent implements AgentAdapter:

interface AgentAdapter {
  readonly kind: AgentKind;            // 'claude-code' | 'cursor' | 'codex'
  isAvailable(): Promise<boolean>;     // can it run in this environment?
  run(args): Promise<AgentRunResult>;  // drive it headlessly, return telemetry
}

run() normalizes whatever the agent CLI emits into a common AgentRunResult (transcript, tokens, cost, steps, tool calls, wall-clock time). Adapters degrade gracefully: an unavailable adapter throws a typed AgentUnavailableError instead of crashing the run, and the orchestrator falls back to the deterministic mock.
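The fallback behaviour described above might look roughly like the following. This is a minimal sketch, not SkillCI's orchestrator: SketchAdapter, SketchResult, and runWithFallback are hypothetical names, the result type is trimmed to two fields, and only AgentUnavailableError is a name taken from the text.

```typescript
// Sketch only: SketchAdapter, SketchResult, and runWithFallback are
// hypothetical names, and the result shape is deliberately simplified.
interface SketchResult {
  transcript: string;
  tokens: number;
}

interface SketchAdapter {
  readonly kind: string;
  run(prompt: string): Promise<SketchResult>;
}

// Typed error so callers can tell "agent missing" apart from real failures.
class AgentUnavailableError extends Error {
  readonly kind: string;

  constructor(kind: string) {
    super(`agent "${kind}" is unavailable in this environment`);
    this.name = 'AgentUnavailableError';
    this.kind = kind;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, AgentUnavailableError.prototype);
  }
}

async function runWithFallback(
  primary: SketchAdapter,
  mock: SketchAdapter,
  prompt: string,
): Promise<{ usedKind: string; result: SketchResult }> {
  try {
    return { usedKind: primary.kind, result: await primary.run(prompt) };
  } catch (err) {
    // Only unavailability triggers the fallback; anything else propagates.
    if (!(err instanceof AgentUnavailableError)) throw err;
    return { usedKind: mock.kind, result: await mock.run(prompt) };
  }
}
```

Catching only the typed error is the point of the design: a genuine agent failure still surfaces instead of being masked by a mock result.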

Claude Code


Availability	the `claude` binary on PATH — that's it.
Auth	the CLI owns its own auth: an `ANTHROPIC_API_KEY` or a Claude Code subscription/OAuth session. SkillCI does not require the env key.
Invocation	`claude -p "<prompt>" --output-format json --dangerously-skip-permissions`

Why --dangerously-skip-permissions: headless claude -p is default-deny on tool permissions, so without the flag the agent can't edit files or run commands, and every editing task silently scores as a failure. Runs always happen inside a disposable, isolated sandbox, which is the kind of environment the flag is intended for.

The Claude adapter gates on the binary only — mirroring Cursor — because the CLI manages auth itself. The API-key requirement lives solely in the LLM judge's SDK backend (see Scoring).
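As an illustration of the invocation row above, the command could be assembled as an argv array instead of a shell string. This is a sketch under assumptions: InvocationSpec and buildClaudeInvocation are hypothetical names, and only the binary name and the flags come from this document.

```typescript
// Hypothetical shape; the real adapter's types are not shown in this document.
interface InvocationSpec {
  command: string;
  args: string[];
  cwd: string;
}

function buildClaudeInvocation(prompt: string, workdir: string): InvocationSpec {
  return {
    command: 'claude',
    // One argv entry per token, so the prompt stays a single argument
    // even when it contains quotes or spaces.
    args: ['-p', prompt, '--output-format', 'json', '--dangerously-skip-permissions'],
    // The run is confined to the disposable sandbox directory.
    cwd: workdir,
  };
}
```

Keeping the builder pure makes the flag set easy to pin in a unit test without spawning the CLI.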

Cursor


Availability	the `cursor-agent` binary on PATH.
Auth	Cursor manages its own auth (no separate key check).
Invocation	`cursor-agent -p "<prompt>" --output-format json` (best-effort).

Cursor's headless surface is less stable; the adapter parses JSON telemetry when present and otherwise falls back to text with zeroed token/cost fields. A run that times out or exits non-zero surfaces a typed error rather than a fake success.
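The JSON-first parsing with a text fallback can be sketched as below. The JSON field names (result, tokens, cost_usd) are assumptions made for illustration, not the documented output of cursor-agent, and ParsedOutput and parseAgentOutput are hypothetical names.

```typescript
// Hypothetical result shape; `source` records which path produced it.
interface ParsedOutput {
  transcript: string;
  tokens: number;
  costUsd: number;
  source: 'json' | 'text';
}

function parseAgentOutput(stdout: string): ParsedOutput {
  try {
    const parsed: unknown = JSON.parse(stdout);
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      const obj = parsed as Record<string, unknown>;
      // Assumed field names: result, tokens, cost_usd.
      const result = obj['result'];
      const tokens = obj['tokens'];
      const cost = obj['cost_usd'];
      return {
        transcript: typeof result === 'string' ? result : stdout,
        tokens: typeof tokens === 'number' ? tokens : 0,
        costUsd: typeof cost === 'number' ? cost : 0,
        source: 'json',
      };
    }
  } catch {
    // Not JSON: fall through to the text path.
  }
  // Text fallback: keep the raw output and zero the telemetry.
  return { transcript: stdout, tokens: 0, costUsd: 0, source: 'text' };
}
```

Recording the source lets downstream scoring distinguish a true zero-cost run from one where telemetry was simply missing.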

Codex


Availability	the `codex` binary and `OPENAI_API_KEY`.
Auth	key-gated (`OPENAI_API_KEY`).
Invocation	`codex exec "<prompt>"` (non-interactive).

Codex's machine output varies by version; the adapter tries JSON first, falls back to text with zeroed telemetry, and surfaces typed errors on timeout/failure.
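One way to surface typed errors on timeout or failure is to classify the finished process before any parsing happens. This is a sketch, assuming a simplified ProcessOutcome shape; AgentTimeoutError, AgentExitError, and requireCleanExit are hypothetical names, not SkillCI's actual error types.

```typescript
// Hypothetical error types for illustration.
class AgentTimeoutError extends Error {
  constructor(kind: string, timeoutMs: number) {
    super(`${kind} timed out after ${timeoutMs} ms`);
    this.name = 'AgentTimeoutError';
    Object.setPrototypeOf(this, AgentTimeoutError.prototype);
  }
}

class AgentExitError extends Error {
  readonly exitCode: number | null;

  constructor(kind: string, exitCode: number | null) {
    super(`${kind} exited with code ${String(exitCode)}`);
    this.name = 'AgentExitError';
    this.exitCode = exitCode;
    Object.setPrototypeOf(this, AgentExitError.prototype);
  }
}

// Assumed minimal view of a finished child process.
interface ProcessOutcome {
  stdout: string;
  exitCode: number | null;
  timedOut: boolean;
}

// Returns stdout only for a clean exit; otherwise throws a typed error,
// so a failed run can never be reported as a zeroed "success".
function requireCleanExit(kind: string, outcome: ProcessOutcome, timeoutMs: number): string {
  if (outcome.timedOut) throw new AgentTimeoutError(kind, timeoutMs);
  if (outcome.exitCode !== 0) throw new AgentExitError(kind, outcome.exitCode);
  return outcome.stdout;
}
```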

The mock adapter

MockAgentAdapter is deterministic and offline — no network, no keys. It powers the test suite and npm run demo, and is the fallback whenever a real adapter is unavailable. This is what makes the whole pipeline runnable in CI for free.
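To make "deterministic and offline" concrete, here is one possible way to derive stable telemetry from the prompt alone. This is not the MockAgentAdapter implementation, which this document does not show; MockResult, hashPrompt, and mockRun are hypothetical names, and the hashing scheme is an arbitrary choice for the sketch.

```typescript
// Hypothetical result shape for the sketch.
interface MockResult {
  transcript: string;
  tokens: number;
  costUsd: number;
  steps: number;
}

// FNV-1a style 32-bit hash over UTF-16 code units, so identical prompts
// always map to the same number. No clock, randomness, or network involved.
function hashPrompt(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function mockRun(prompt: string): MockResult {
  const h = hashPrompt(prompt);
  return {
    transcript: `[mock] handled: ${prompt}`,
    tokens: 100 + (h % 900), // always in the range 100..999
    costUsd: 0,              // an offline mock never incurs cost
    steps: 1 + (h % 5),      // always in the range 1..5
  };
}
```

Because the output depends only on the input, repeated CI runs produce identical results and snapshot-style assertions stay stable.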

Adding a new adapter

1. Create src/agents/<name>-adapter.ts implementing AgentAdapter.
2. isAvailable(): probe the binary (hasBinary) and, only if the CLI can't self-authenticate, the key (hasEnv). Prefer a binary-only check when the CLI owns its auth.
3. run(): spawn via execa inside args.sandbox.workdir, honor task.timeoutMs, and normalize output into AgentRunResult. Don't fabricate a zeroed "success" for a run that times out or exits non-zero; throw a typed error.
4. Register it so getAdapter(kind) resolves it.
5. Tests: mock execa to pin the invocation contract (flags, cwd) and the availability logic. Never spawn real CLIs in unit tests. See claude-invocation.test.ts and real-adapters.test.ts for the pattern.
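The availability rule from the steps above can be sketched with injected probes, which also shows how the logic can be tested without touching PATH or the real environment. The signatures of hasBinary and hasEnv are assumed here (the document only names them), and Probes, AvailabilitySpec, and checkAvailability are hypothetical.

```typescript
// Assumed probe signatures; the real helpers may differ.
interface Probes {
  hasBinary(name: string): boolean;
  hasEnv(name: string): boolean;
}

// requiredEnv is omitted when the CLI owns its own auth.
interface AvailabilitySpec {
  binary: string;
  requiredEnv?: string;
}

function checkAvailability(spec: AvailabilitySpec, probes: Probes): boolean {
  // The binary is always required.
  if (!probes.hasBinary(spec.binary)) return false;
  // The key is checked only for CLIs that cannot self-authenticate.
  return spec.requiredEnv === undefined || probes.hasEnv(spec.requiredEnv);
}

// Fake probes for unit tests: no real PATH lookup, no real environment.
function fakeProbes(binaries: string[], envVars: string[]): Probes {
  return {
    hasBinary: (name) => binaries.includes(name),
    hasEnv: (name) => envVars.includes(name),
  };
}
```

For example, checkAvailability({ binary: 'claude' }, fakeProbes(['claude'], [])) passes with no key set, while a spec with requiredEnv: 'OPENAI_API_KEY' needs both the binary and the variable.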