Code Agent Proof Sample
Example format only, based on a synthetic, public-safe repo scenario. The deliverable is not "the agent says it worked"; it is a trace of the context, the proposed change, the review gate, the evidence, and what is still unsafe to claim.
Scenario
Operator Trace
Evidence Ledger
| Check | Concrete Evidence | Status |
|---|---|---|
| Scope fit | Task is one repo question, change plan, review pass, or small safe edit. Anything broader becomes a separate sprint. | Pass |
| Context retrieval | Relevant files and scripts are named in the handoff. Unknowns are listed instead of papered over. | Pass |
| Implementation | Diff is limited to the named task. No unrelated cleanup, config churn, or private data movement. | Human review |
| Verification | The closest real check (test, build, lint, typecheck, browser smoke check, or screenshot) is run or captured. If tooling fails, the exact failure is recorded. | Pass or named blocker |
| Claim boundary | No benchmark, security, compliance, production-readiness, revenue, partnership, or endorsement claim without direct proof. | Stop if unproven |
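A ledger like this can also be recorded as data so that the gate is computed rather than asserted. The following is a minimal sketch, not part of the sample itself: the names `Status`, `Check`, and `gate` are hypothetical, and the rule that any blocker, unproven claim, or empty evidence cell stops the handoff is an assumption drawn from the Status column above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical labels mirroring the Status column of the ledger.
    PASS = "Pass"
    HUMAN_REVIEW = "Human review"
    BLOCKER = "Named blocker"
    UNPROVEN = "Unproven claim"


@dataclass(frozen=True)
class Check:
    name: str
    evidence: str
    status: Status


def gate(ledger: list[Check]) -> str:
    """Return 'stop', 'needs_review', or 'ready' (assumed decision rule)."""
    # A row with no concrete evidence is treated like an unproven claim.
    if any(not c.evidence.strip() for c in ledger):
        return "stop"
    statuses = {c.status for c in ledger}
    if Status.UNPROVEN in statuses or Status.BLOCKER in statuses:
        return "stop"
    if Status.HUMAN_REVIEW in statuses:
        return "needs_review"
    return "ready"


# Placeholder evidence strings, paraphrased from the table rows.
ledger = [
    Check("Scope fit", "Task is one small safe edit", Status.PASS),
    Check("Context retrieval", "Files and scripts named in handoff", Status.PASS),
    Check("Implementation", "Diff limited to the named task", Status.HUMAN_REVIEW),
    Check("Verification", "Closest real test was run", Status.PASS),
    Check("Claim boundary", "No unproven claims made", Status.PASS),
]

print(gate(ledger))
```

With the rows as written, the Implementation check keeps the result at `needs_review` until a human signs off on the diff.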
Sample Handoff
Result: The agent completed the narrow code task on a public-safe branch and produced a reviewable diff. Verification covered the relevant local test and a browser smoke check. No production deploy, use of customer data, benchmark claim, or public post took place.
Reviewer packet: changed files, command output, screenshot or rendered URL, unresolved risks, and the exact next action.
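Before a packet goes to the reviewer, it can be checked for completeness. The sketch below assumes the packet is a plain dictionary; the field names and the `missing_fields` helper are hypothetical labels for the five items listed above, and the sample values are placeholders rather than real output.

```python
# Hypothetical field names, one per item in the reviewer packet.
REQUIRED_FIELDS = (
    "changed_files",
    "command_output",
    "screenshot_or_url",
    "unresolved_risks",
    "next_action",
)


def missing_fields(packet: dict) -> list:
    """List required fields that are absent, None, or an empty string.

    An empty list is allowed, so "no unresolved risks" can be stated
    explicitly as [] instead of being left out.
    """
    return [name for name in REQUIRED_FIELDS if packet.get(name) in (None, "")]


# Placeholder packet: the screenshot or rendered URL has not been attached yet.
packet = {
    "changed_files": ["src/app.py"],
    "command_output": "placeholder test output",
    "screenshot_or_url": "",
    "unresolved_risks": [],
    "next_action": "Buyer reviews the packet in writing",
}

print(missing_fields(packet))
```

A non-empty result means the packet is not ready to hand off. That matches the ledger's approach of naming a blocker instead of papering over it.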
Remaining caveat: this proves workflow discipline and reviewability for the sampled task. It does not prove general autonomy, production safety, benchmark superiority, or buyer ROI.
Next action: the buyer reviews the packet in writing, then accepts it, rejects it, or narrows the next run.
Want this kind of proof packet built for your repo? Start with a written mini-audit or support the sample if it saved you time.