agent-spec

Humans write the contract. Agents implement. Machines verify.
An AI-native BDD/spec tool that shifts code review from reading diffs to defining intent.

Rust · BDD / Spec · CLI-first · Chinese + English · Git & jj aware · Agent-agnostic

Review Point Displacement

Traditional code review asks humans to judge a 500-line code diff. agent-spec moves the review point: humans define a 50-line contract, and machines verify the code against it.

❌ Traditional Flow

Issue → Branch → Code → PR → Read Diff (80%) → Approve

Human time: Write Issue (10%) → Read Code Diff (80%) → Approve (10%)

✓ agent-spec Flow

Contract (60%) → Agent Codes → Explain → Approve

Human time: Write Contract (60%) → Read Explain (30%) → Approve (10%)

Human time shifts from "reading code" to "defining intent" — a higher-value activity. Quality assurance shifts from "human judgment" to "machine verification".

Skills for Every AI Agent

agent-spec ships project-local Skills that teach AI agents the contract-driven workflow. One install command — works with Claude Code, Codex, Cursor, Aider, and any agent that reads workspace conventions.

$ npx skills add ZhangHanDong/agent-spec
Installs agent-spec skills into your project's agent configuration files

agent-spec-tool-first

workflow

The default integration path. Teaches the Agent the seven-step workflow: read the Contract, implement within Boundaries, run lifecycle for verification, retry on failure, generate explain for review. CLI commands are the primary interface.

agent-spec-authoring

authoring

The spec-writing path. Teaches the Agent how to draft and revise Task Contracts in the DSL: the four-element structure, bilingual keywords, test selectors, step tables, and the "exception paths ≥ happy paths" principle.

Claude Code · Codex CLI · Cursor · Aider · AGENTS.md · .cursorrules

Four Elements of a Contract

A Contract is not a vague Issue. It's a precise specification with four parts that constrain the Agent's behavior and define deterministic acceptance criteria.

## Intent — What and Why

A focused statement of purpose. Not a feature list — a clear direction that gives the Agent context.

## Intent

Add a user registration endpoint to the existing auth module.
New users register with email + password; a verification email is sent
on success. This is the first step of the user system — login and
password reset will be built on top of it later.
## 意图

为现有的认证模块添加用户注册 endpoint。新用户通过邮箱+密码注册,
注册成功后发送验证邮件。这是用户体系的第一步,后续会在此基础上
添加登录和密码重置。
## 意図

既存の認証モジュールにユーザー登録エンドポイントを追加する。
新規ユーザーはメールアドレスとパスワードで登録し、成功時に確認
メールを送信する。これはユーザーシステムの第一歩であり、今後
ログインとパスワードリセットをこの基盤の上に構築する。

## Decisions — Fixed Technical Choices

Already-decided choices that remove the Agent's decision space. The Agent follows these without questioning.

## Decisions

- Route: POST /api/v1/auth/register
- Password hash: bcrypt, cost factor = 12
- Verification token: crypto.randomUUID(), stored in DB, 24h expiry
- Email: use existing EmailService, do not create a new one
## 已定决策

- 路由: POST /api/v1/auth/register
- 密码哈希: bcrypt, cost factor = 12
- 验证 Token: crypto.randomUUID(), 存数据库, 24h 过期
- 邮件: 使用现有 EmailService,不新建
## 決定事項

- ルーティング: POST /api/v1/auth/register
- パスワードハッシュ: bcrypt, コストファクター = 12
- 検証トークン: crypto.randomUUID(), DB保存, 24時間有効
- メール: 既存のEmailServiceを使用、新規作成しない

## Boundaries — What to Touch, What Not to Touch

Path globs are mechanically enforced by the BoundariesVerifier. Natural language prohibitions are checked by lint.

## Boundaries

### Allowed Changes
- crates/api/src/auth/**
- crates/api/tests/auth/**

### Forbidden
- Do not add new dependencies
- Do not modify the existing login endpoint
## 边界

### 允许修改
- crates/api/src/auth/**
- crates/api/tests/auth/**

### 禁止做
- 不要添加新的依赖
- 不要修改现有的登录 endpoint
## 境界

### 変更許可
- crates/api/src/auth/**
- crates/api/tests/auth/**

### 禁止事項
- 新しい依存関係を追加しない
- 既存のログインエンドポイントを変更しない
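
The path-glob half of a Boundaries section lends itself to a mechanical check. The following is a minimal sketch of that idea, not the actual BoundariesVerifier: the function names `matches_allowed` and `out_of_bounds` and the file names are hypothetical, and the matcher deliberately understands only a trailing `/**`, which is all the example contract above needs.

```rust
/// Returns true if `path` falls under `pattern`.
/// Simplified on purpose (an assumption of this sketch): only a trailing
/// `/**` wildcard is understood; any other pattern must match exactly.
fn matches_allowed(pattern: &str, path: &str) -> bool {
    match pattern.strip_suffix("/**") {
        // "a/b/**" covers everything below the directory "a/b".
        Some(prefix) => path
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('/')),
        None => pattern == path,
    }
}

/// Collects every changed path that no allowed pattern covers.
fn out_of_bounds<'a>(allowed: &[&str], changed: &[&'a str]) -> Vec<&'a str> {
    changed
        .iter()
        .copied()
        .filter(|path| !allowed.iter().any(|pat| matches_allowed(pat, path)))
        .collect()
}

fn main() {
    let allowed = ["crates/api/src/auth/**", "crates/api/tests/auth/**"];
    // Hypothetical change set, for illustration only.
    let changed = [
        "crates/api/src/auth/register.rs",
        "crates/api/src/billing/invoice.rs",
    ];
    for path in out_of_bounds(&allowed, &changed) {
        println!("boundary violation: {path}");
    }
}
```

Requiring a `/` after the prefix keeps a sibling directory such as `crates/api/src/authz` from slipping through on a plain string-prefix match.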

## Completion Criteria — Deterministic Pass/Fail

BDD scenarios with explicit test bindings. Key rule: exception paths ≥ happy paths.

## Completion Criteria

Scenario: Successful registration
  Test: test_register_returns_201
  Given no user with email "alice@example.com" exists
  When client submits the registration request
  Then response status should be 201

Scenario: Duplicate email rejected ← exception path
  Test: test_register_rejects_duplicate
  Given a user with email "alice@example.com" already exists
  When client submits the same email for registration
  Then response status should be 409
## 完成条件

场景: 注册成功
  测试: test_register_returns_201
  假设 不存在邮箱为 "alice@example.com" 的用户
  当 客户端提交注册请求
  那么 响应状态码为 201

场景: 重复邮箱被拒绝 ← 异常路径
  测试: test_register_rejects_duplicate
  假设 已存在邮箱为 "alice@example.com" 的用户
  当 客户端提交相同邮箱的注册请求
  那么 响应状态码为 409
## 完了条件

シナリオ: 登録成功
  テスト: test_register_returns_201
  前提 メール "alice@example.com" のユーザーが存在しない
  もし クライアントが登録リクエストを送信する
  ならば レスポンスステータスは 201 である

シナリオ: 重複メール拒否 ← 例外パス
  テスト: test_register_rejects_duplicate
  前提 メール "alice@example.com" のユーザーが既に存在する
  もし クライアントが同じメールで登録リクエストを送信する
  ならば レスポンスステータスは 409 である
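
To illustrate what "explicit test bindings" buys, here is a minimal sketch that pulls each `Scenario:` and its `Test:` selector out of a Completion Criteria block and flags scenarios with no binding. It is an assumption-laden toy, not the tool's parser: the `Scenario` struct and `parse_scenarios` function are hypothetical, and only the English keywords are handled.

```rust
/// One scenario from a Completion Criteria section and its bound test, if any.
#[derive(Debug, PartialEq)]
struct Scenario {
    name: String,
    test: Option<String>,
}

/// Extracts `Scenario:` headers and the `Test:` line that follows each one.
/// Assumptions of this sketch: English keywords only, and a later `Test:`
/// line simply overwrites an earlier one for the same scenario.
fn parse_scenarios(spec: &str) -> Vec<Scenario> {
    let mut scenarios: Vec<Scenario> = Vec::new();
    for line in spec.lines().map(str::trim) {
        if let Some(name) = line.strip_prefix("Scenario:") {
            scenarios.push(Scenario {
                name: name.trim().to_string(),
                test: None,
            });
        } else if let Some(test) = line.strip_prefix("Test:") {
            // A `Test:` line before any scenario is ignored.
            if let Some(current) = scenarios.last_mut() {
                current.test = Some(test.trim().to_string());
            }
        }
    }
    scenarios
}

fn main() {
    let spec = "\
Scenario: Successful registration
  Test: test_register_returns_201
  Given no user with email \"alice@example.com\" exists
Scenario: Duplicate email rejected
  Given a user with email \"alice@example.com\" already exists
";
    for s in parse_scenarios(spec) {
        match &s.test {
            Some(t) => println!("{} -> {}", s.name, t),
            None => println!("{} has no test binding", s.name),
        }
    }
}
```

A scenario without a binding cannot be decided by running tests, which is why the binding is worth checking before the Agent starts work.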

Seven Steps, Three Actors

Human writes intent. Agent implements code. Machine verifies correctness. Each step has a clear owner and a specific agent-spec command.

STEP 01 · Write Contract · HUMAN
Define Intent, Decisions, Boundaries, and Completion Criteria. Exception scenarios ≥ happy path scenarios.
agent-spec init --level task --name "User Registration"
STEP 02 · Quality Gate · MACHINE
Check Contract quality before handing to Agent. Catches vague verbs, unquantified constraints, sycophancy bias.
agent-spec lint specs/task.spec --min-score 0.7
STEP 03 · Agent Implements · AGENT
Agent reads the Contract and codes within its constraints. Decisions are fixed, boundaries are enforced, criteria are the stop condition.
agent-spec contract specs/task.spec
STEP 04 · Lifecycle Verification · MACHINE
Four-layer verification pipeline: lint → structural → boundaries → tests. Agent retries on failure — no human needed.
agent-spec lifecycle specs/task.spec --code . --format json
STEP 05 · Guard Gate · MACHINE
Pre-commit or CI check. All specs in the repo are verified against the current change set.
agent-spec guard --spec-dir specs --code .
STEP 06 · Contract Acceptance · HUMAN
Reviewer reads a Contract-level summary — not a code diff. Two questions: Is the Contract correct? Did all verifications pass?
agent-spec explain specs/task.spec --format markdown
STEP 07 · Stamp & Archive · MACHINE
Record Contract-to-Commit traceability via Git trailers. Every commit traces back to intent.
agent-spec stamp specs/task.spec --dry-run
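
The "Agent retries on failure" loop between steps 03 and 04 can be sketched as a small control loop. This is an illustration under stated assumptions rather than anything agent-spec ships: `Outcome`, `implement_until_verified`, and the attempt budget are hypothetical, and the two closures stand in for "run the coding agent" and "run `agent-spec lifecycle` and interpret its report".

```rust
/// Outcome of one verification run, reduced to what the loop needs.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Passed,
    /// Failure detail that is fed back to the agent on the next attempt.
    Failed(String),
}

/// Alternates implement and verify until verification passes or the
/// attempt budget runs out. Returns the number of attempts used on
/// success, or the last failure detail once the budget is exhausted.
fn implement_until_verified<I, V>(
    max_attempts: u32,
    mut implement: I,
    mut verify: V,
) -> Result<u32, String>
where
    I: FnMut(Option<&str>),
    V: FnMut() -> Outcome,
{
    let mut feedback: Option<String> = None;
    for attempt in 1..=max_attempts {
        implement(feedback.as_deref());
        match verify() {
            Outcome::Passed => return Ok(attempt),
            Outcome::Failed(detail) => feedback = Some(detail),
        }
    }
    Err(feedback.unwrap_or_else(|| "no attempts were made".to_string()))
}

fn main() {
    // Stand-in closures: here verification passes on the second run.
    let mut runs = 0;
    let result = implement_until_verified(
        3,
        |feedback| println!("implementing, feedback: {feedback:?}"),
        || {
            runs += 1;
            if runs < 2 {
                Outcome::Failed("test_register_rejects_duplicate failed".into())
            } else {
                Outcome::Passed
            }
        },
    );
    println!("{result:?}");
}
```

Bounding the attempts matters: when the budget runs out, the last failure detail is what a human would look at, instead of the loop spinning indefinitely.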

Four-Layer Verification Pyramid

Deterministic layers run first — zero token cost, no false negatives. AI layers handle the residual — probabilistic, with structured evidence.

L4 · AI Verifier · probabilistic · ~$0.01-0.05 · uncertain verdict
L3 · Test Verifier · deterministic · 0 tokens · runs bound tests
L2 · Boundaries Verifier · deterministic · 0 tokens · path glob matching
L1 · Structural Verifier · deterministic · 0 tokens · pattern matching on Must-Not
← cheaper, faster, deterministic | richer, costly, probabilistic →
pass: Verified by a deterministic or AI verifier
fail: Verification found a concrete violation
skip: No verifier covered this scenario
uncertain: AI reviewed but needs human judgment

Key rule: skip ≠ pass  —  all four verdicts are semantically distinct.
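
One way to keep the four verdicts distinct in code is to model them as an enum and make the aggregation rule explicit. The sketch below is hypothetical: the `Verdict` type, the `overall` function, and in particular the precedence order (fail, then uncertain, then skip, then pass) are assumptions chosen for illustration, not the tool's documented behavior.

```rust
/// The four verdicts from the legend above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Pass,
    Fail,
    Skip,
    Uncertain,
}

/// Folds per-scenario verdicts into one overall verdict.
/// Assumed precedence: Fail > Uncertain > Skip > Pass.
/// An empty list counts as Skip, because nothing was verified.
fn overall(verdicts: &[Verdict]) -> Verdict {
    if verdicts.contains(&Verdict::Fail) {
        Verdict::Fail
    } else if verdicts.contains(&Verdict::Uncertain) {
        Verdict::Uncertain
    } else if verdicts.is_empty() || verdicts.contains(&Verdict::Skip) {
        Verdict::Skip
    } else {
        Verdict::Pass
    }
}

fn main() {
    let report = [Verdict::Pass, Verdict::Skip];
    // A single uncovered scenario keeps the whole report from passing.
    println!("{:?}", overall(&report));
}
```

Under this rule a report is `Pass` only when every scenario was actually verified, which is the practical meaning of skip ≠ pass.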

Two Modes of AI Verification

Mechanical verifiers handle deterministic checks. For the rest, agent-spec supports two modes: the calling Agent does the review, or an injected backend does it.

Caller Mode

--ai-mode caller

The calling Agent (Claude Code, Codex…) performs AI verification itself. agent-spec emits structured requests; the Agent returns structured decisions.

Agent runs lifecycle
agent-spec runs L1–L3 mechanical
agent-spec emits AiRequest[]
Agent analyzes code, returns AiDecision[]
agent-spec merges into final report
Human reviews uncertain findings

Backend Mode

Rust API: AiBackend trait

An independent AI backend is injected via the Rust API. Ideal for orchestrator systems (Symphony-like) using a different model for verification.

Orchestrator injects AiBackend
agent-spec runs L1–L3 mechanical
agent-spec calls backend.analyze()
AI Backend returns AiDecision
agent-spec completes the report; no human in the loop

Both modes share the same data structures: AiRequest and AiDecision. agent-spec stays provider-agnostic.
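
To make Backend Mode concrete, here is a minimal sketch of what injecting a backend could look like. Only the names `AiBackend`, `analyze`, `AiRequest`, and `AiDecision` come from the text above; every field, the `AiVerdict` enum, the `review` helper, and the `AlwaysUncertain` backend are assumptions invented for illustration and will differ from the real Rust API.

```rust
/// Hypothetical request shape: which scenario to judge and the evidence for it.
struct AiRequest {
    scenario: String,
    evidence: String,
}

/// Hypothetical verdicts an AI reviewer can return.
#[derive(Debug, PartialEq)]
enum AiVerdict {
    Pass,
    Fail,
    Uncertain,
}

/// Hypothetical decision shape: the verdict plus the reasoning behind it.
struct AiDecision {
    scenario: String,
    verdict: AiVerdict,
    reasoning: String,
}

/// The injection point: any model provider can sit behind this trait.
trait AiBackend {
    fn analyze(&self, request: &AiRequest) -> AiDecision;
}

/// Toy backend that never claims certainty; a real one would call a model.
struct AlwaysUncertain;

impl AiBackend for AlwaysUncertain {
    fn analyze(&self, request: &AiRequest) -> AiDecision {
        AiDecision {
            scenario: request.scenario.clone(),
            verdict: AiVerdict::Uncertain,
            reasoning: format!("{} bytes of evidence, no model attached", request.evidence.len()),
        }
    }
}

/// Backend mode: each residual request goes to the injected backend.
fn review(backend: &dyn AiBackend, requests: &[AiRequest]) -> Vec<AiDecision> {
    requests.iter().map(|r| backend.analyze(r)).collect()
}

fn main() {
    let requests = vec![AiRequest {
        scenario: "Duplicate email rejected".into(),
        evidence: "fn register() { /* ... */ }".into(),
    }];
    for d in review(&AlwaysUncertain, &requests) {
        println!("{}: {:?} ({})", d.scenario, d.verdict, d.reasoning);
    }
}
```

In Caller Mode the same request list would be handed to the calling Agent instead of a trait object, and the decisions it returns would be merged in the same way.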

Three-Layer Spec Hierarchy

Constraints and decisions inherit downward. Organization rules flow through project conventions into every task contract. Write once, enforce everywhere.

L0 · org.spec · Security policies, coding standards, forbidden patterns (organization-wide)
↓ inherits
L1 · project.spec · Tech stack decisions, API conventions, test requirements (project-wide)
↓ inherits
L2 · task.spec · Intent, boundaries, BDD completion criteria (one per task)
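
Downward inheritance can be pictured as flattening the three layers into one effective contract. The sketch below shows only the simplest reading, where each layer's decisions and prohibitions accumulate; the `SpecLayer` type, the `effective` function, and the org and project rule texts are hypothetical, and how agent-spec resolves conflicts between layers is not covered here.

```rust
/// Hypothetical, heavily reduced view of one spec file.
#[derive(Debug, Default, Clone)]
struct SpecLayer {
    decisions: Vec<String>,
    forbidden: Vec<String>,
}

/// Flattens layers ordered from most general (org) to most specific (task).
/// Assumption of this sketch: rules only accumulate; nothing is overridden.
fn effective(layers: &[SpecLayer]) -> SpecLayer {
    let mut merged = SpecLayer::default();
    for layer in layers {
        merged.decisions.extend(layer.decisions.iter().cloned());
        merged.forbidden.extend(layer.forbidden.iter().cloned());
    }
    merged
}

fn main() {
    // Example rule texts are invented, except the task-level lines,
    // which are taken from the registration contract above.
    let org = SpecLayer {
        decisions: vec![],
        forbidden: vec!["Do not log credentials".into()],
    };
    let project = SpecLayer {
        decisions: vec!["All routes live under /api/v1".into()],
        forbidden: vec![],
    };
    let task = SpecLayer {
        decisions: vec!["Route: POST /api/v1/auth/register".into()],
        forbidden: vec!["Do not add new dependencies".into()],
    };
    let contract = effective(&[org, project, task]);
    println!("decisions: {:#?}", contract.decisions);
    println!("forbidden: {:#?}", contract.forbidden);
}
```

The task author writes only the task layer; the organization and project rules arrive in the effective contract without being restated.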

From Zero to Verified in 5 Commands

# Install
cargo install agent-spec

# Create a task contract
agent-spec init --level task --name "User Registration"

# Check contract quality
agent-spec lint specs/user-registration.spec --min-score 0.7

# Verify code against contract
agent-spec lifecycle specs/user-registration.spec --code . --format json

# Generate review summary for PR
agent-spec explain specs/user-registration.spec --format markdown