Checkpoint Supervision

Available in Rigour v2.14+ | For long-running agent tasks

Checkpoint Supervision lets Rigour monitor agent quality during extended execution rather than only at the end. This is essential for frontier models like GPT-5.3-Codex that support "coworking mode" with long-running autonomous tasks.

The Problem

Traditional Rigour checks run only at task completion. When frontier models execute tasks that last 15 minutes or longer, this creates three problems:

Agent behavior may degrade over time (drift)
Large change sets become harder to review
Early failures waste accumulated work

How It Works

Task Start
    ↓
[15 min] → Checkpoint 1 → Score ≥ 80%? → Continue
    ↓                         ↓
[30 min] → Checkpoint 2 → Score < 80%? → Alert + Auto-Save
    ↓
Task Complete → Final Verification
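
The decision made at each checkpoint in the flow above can be sketched as a small function. This is an illustration under stated assumptions, not Rigour's implementation: `evaluateCheckpoint`, `CheckpointDecision`, and the action names are hypothetical. Only the 80% threshold and the alert-plus-auto-save outcome come from the diagram.

```typescript
// Hypothetical sketch of the checkpoint gate shown in the diagram.
// None of these names are part of Rigour's API.
type CheckpointDecision =
  | { action: "continue" }
  | { action: "alert"; autoSave: boolean };

function evaluateCheckpoint(
  score: number,
  threshold = 80, // mirrors the quality_threshold default
  autoSaveOnFailure = true, // mirrors the auto_save_on_failure default
): CheckpointDecision {
  // Scores at or above the threshold let the task keep running.
  if (score >= threshold) {
    return { action: "continue" };
  }
  // Below the threshold: raise an alert and, if enabled, save work so far.
  return { action: "alert", autoSave: autoSaveOnFailure };
}
```

With the defaults, a score of 85 yields `continue`, while a score of 72 yields an alert with auto-save enabled.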

Configuration

# rigour.yml
gates:
  checkpoint:
    enabled: true
    interval_minutes: 15
    quality_threshold: 80
    drift_detection: true
    auto_save_on_failure: true

Option	Default	Description
`enabled`	`false`	Enable checkpoint supervision
`interval_minutes`	`15`	Time between checkpoints, in minutes
`quality_threshold`	`80`	Minimum quality score (percent) required to continue
`drift_detection`	`true`	Monitor for behavior regression
`auto_save_on_failure`	`true`	Save work before aborting
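
For tooling that reads these options, the table can be mirrored as a typed object with defaults. The sketch below is hypothetical: `CheckpointConfig`, `DEFAULT_CHECKPOINT_CONFIG`, and `resolveCheckpointConfig` are illustrative names, and the 0 to 100 range check is an assumption based on scores being shown as percentages.

```typescript
// Hypothetical TypeScript mirror of the `gates.checkpoint` options.
interface CheckpointConfig {
  enabled: boolean;
  interval_minutes: number;
  quality_threshold: number;
  drift_detection: boolean;
  auto_save_on_failure: boolean;
}

// Defaults copied from the options table.
const DEFAULT_CHECKPOINT_CONFIG: CheckpointConfig = {
  enabled: false,
  interval_minutes: 15,
  quality_threshold: 80,
  drift_detection: true,
  auto_save_on_failure: true,
};

// Merge user overrides onto the defaults and reject clearly invalid values.
function resolveCheckpointConfig(
  overrides: Partial<CheckpointConfig> = {},
): CheckpointConfig {
  const config = { ...DEFAULT_CHECKPOINT_CONFIG, ...overrides };
  if (config.interval_minutes <= 0) {
    throw new RangeError("interval_minutes must be positive");
  }
  // Assumed range: scores appear as percentages elsewhere on this page.
  if (config.quality_threshold < 0 || config.quality_threshold > 100) {
    throw new RangeError("quality_threshold must be between 0 and 100");
  }
  return config;
}
```

Calling `resolveCheckpointConfig({ enabled: true })` keeps every other option at its documented default.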

MCP Integration

Agents report checkpoints via MCP:

// Report a checkpoint to Rigour and capture the verdict.
const response = await mcp.call("rigour_checkpoint", {
  cwd: "/project",
  progress_pct: 50,
  files_changed: ["src/api/users.ts", "src/api/orders.ts"],
  summary: "Implemented user and order API endpoints"
});

// Example response:
// {
//   continue: true,
//   quality_score: 85,
//   warnings: ["src/api/users.ts exceeds 300 lines"]
// }
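
An agent then has to act on the response. The sketch below shows one way to do that, assuming the response has exactly the three fields in the example; `CheckpointResponse` and `interpretCheckpoint` are hypothetical names, and real responses may carry additional fields.

```typescript
// Response shape inferred from the example on this page (an assumption).
interface CheckpointResponse {
  continue: boolean;
  quality_score: number;
  warnings: string[];
}

// Hypothetical helper: decide whether to keep working and collect notes
// the agent can feed into its next step.
function interpretCheckpoint(response: CheckpointResponse): {
  proceed: boolean;
  notes: string[];
} {
  // Surface every warning, even when the checkpoint passes.
  const notes = response.warnings.map((w) => `warning: ${w}`);
  if (!response.continue) {
    notes.push(`stopped at quality score ${response.quality_score}`);
  }
  return { proceed: response.continue, notes };
}
```

Treating warnings as notes rather than failures lets the agent address them, for example by splitting an oversized file, before the next checkpoint.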

Drift Detection

Rigour tracks quality scores over time to detect regression:

Checkpoint 1: 92% ✓
Checkpoint 2: 88% ✓
Checkpoint 3: 72% ⚠️ DRIFT DETECTED

When drift is detected:

Agent receives immediate feedback
Work is auto-saved
Studio shows drift timeline
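
This page does not specify how Rigour decides that drift has occurred. As a rough illustration only, the sketch below flags a checkpoint when its score falls more than `maxDrop` points below the best score seen so far. Both the rule and the `detectDrift` name are assumptions, not Rigour's documented algorithm.

```typescript
// Illustrative drift rule (an assumption, not Rigour's documented algorithm):
// flag a checkpoint when its score falls more than `maxDrop` points below
// the best score seen so far in the task.
function detectDrift(scores: number[], maxDrop = 10): number[] {
  const drifted: number[] = [];
  let best = -Infinity;
  scores.forEach((score, index) => {
    if (best - score > maxDrop) {
      drifted.push(index + 1); // checkpoints are numbered from 1
    }
    best = Math.max(best, score);
  });
  return drifted;
}

// Scores from the example above: only checkpoint 3 is flagged.
const flagged = detectDrift([92, 88, 72]);
console.log(flagged);
```

Comparing against the best score, rather than only the previous one, also catches slow declines that never drop sharply between two adjacent checkpoints.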

Studio Integration

The Checkpoints tab in Rigour Studio visualizes:

Timeline of checkpoint events
Quality score trends
Drift detection alerts
Auto-save recovery points

Best Practices

Set realistic intervals: the 15-minute default works for most tasks
Tune the quality threshold: lower it for exploratory work, raise it for production code
Review drift patterns: use Studio to identify when and why quality degrades
