Everything we have published: the whitepaper that describes the framework, the research paper that tests it, and the benchmarking kit so anyone can reproduce or challenge the results.
Describes the problem (agents act, nothing governs them), explains the five-function architecture (identify, authorize, enforce, sanction, record), walks through a concrete example, and is honest about limitations.
Written for operators, investors, and policymakers. No jargon, no code.
Everything you need to run the same benchmarks, test different rules, or validate on new benchmark suites. No access to the private framework repo required.
The six rules: Genesis (read first), Plan First, Iterate Don’t Repeat, Verify Before Done, Time Is Limited, When Stuck (change approach). Testing showed rules 1, 3, and 4 drove most of the lift.
10 tasks: $15–40, 1–3 hours. Full 89 tasks: $150–400, 12–20 hours. All 4 agents on 89 tasks: $600–1,600, 3–5 days. Token consumption averages 50–100K tokens per task.
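The cost figures above can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the benchmarking kit: the function names are hypothetical, and it assumes cost scales linearly with the number of agents, which matches the published four-agent range but may not hold for every model.

```python
# Published (low, high) cost ranges in USD, taken from the figures above.
TEN_TASK_COST = (15, 40)
FULL_SUITE_COST = (150, 400)
FULL_SUITE_TASKS = 89


def per_task_cost(total_range, n_tasks):
    """Return the (low, high) cost per task implied by a total cost range."""
    low, high = total_range
    return (low / n_tasks, high / n_tasks)


def scale_by_agents(total_range, n_agents):
    """Scale a single-agent cost range to n_agents.

    Assumption: every agent costs about the same to run, so the total
    grows linearly with the number of agents.
    """
    low, high = total_range
    return (low * n_agents, high * n_agents)


if __name__ == "__main__":
    # Four agents on the full suite reproduces the published range.
    print(scale_by_agents(FULL_SUITE_COST, 4))  # (600, 1600)

    # Implied per-task cost for the 10-task and 89-task runs.
    print(per_task_cost(TEN_TASK_COST, 10))  # (1.5, 4.0)
    low, high = per_task_cost(FULL_SUITE_COST, FULL_SUITE_TASKS)
    print(f"${low:.2f} to ${high:.2f} per task on the full suite")
```

Both run sizes imply a similar per-task cost, so the 10-task run is a reasonable way to gauge a budget before committing to the full suite.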
| Experiment | What | Cost | Impact |
|---|---|---|---|
| P1 | Run vanilla Opus 4.7 on all 89 tasks (resolves model version confound) | ~$200 | Resolves the paper’s biggest credibility threat |
| P3 | Run ad-hoc baseline on all 89 tasks (currently only 10) | ~$200 | Raises the statistical power of the rule-quality comparison, with the aim of reaching significance |
| P6 | Run governance rules on GPT-5 via Codex CLI | ~$50 | Tests whether governance transfers across model families |