Open research from the Covenant Foundation

Covenant Research

AI agents are in production. The infrastructure to govern them is not. We are building it in the open and publishing everything we learn.

What’s here

Two things, freely available.

Preliminary result
67.4%

on Terminal-Bench 2.0 (89 tasks, single attempt, no retries). Early results suggest that governance rules improve coding-agent performance relative to the vanilla baseline of 58.0%.

Caveat: the governed run used Opus 4.7, while the vanilla baseline used Opus 4.6. This model confound has not yet been resolved. See the whitepaper for the full methodology.

Governed (6 rules, Opus 4.7)          67.4%
Vanilla Claude Code (Opus 4.6)        58.0%
Ad-hoc rules (Opus 4.7, 10 tasks)     42.0%
Defensible lift (vs vanilla)          +9.4 pts*
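The lift figure is a difference in percentage points between two pass rates, which is easy to confuse with a relative improvement. A minimal Python sketch of both calculations, assuming each score is a single-attempt pass rate expressed as a percentage (the helper names `pp_lift` and `relative_lift` are hypothetical and not part of the Covenant framework):

```python
def pp_lift(governed: float, baseline: float) -> float:
    """Absolute difference between two percentage scores, in percentage points."""
    # Hypothetical helper: rounds to one decimal to match the reported precision.
    return round(governed - baseline, 1)


def relative_lift(governed: float, baseline: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    # Hypothetical helper: assumes baseline is non-zero.
    return round(100 * (governed - baseline) / baseline, 1)


if __name__ == "__main__":
    # Scores reported above: governed 67.4%, vanilla baseline 58.0%.
    print(pp_lift(67.4, 58.0))        # absolute lift in percentage points
    print(relative_lift(67.4, 58.0))  # relative lift in percent of baseline
```

With the reported scores, the absolute lift is 9.4 percentage points and the relative lift is about 16.2% of the baseline. Neither number separates the effect of the governance rules from the Opus 4.6 to 4.7 model change.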

Built in the open.

The framework, the research, and the benchmarks are all public. If you operate agents and governance keeps you up at night, this work is for you.