Built my own coding agent harness called pi. Think Claude Code/Codex. Ran it through terminal-bench 2.0. Screenshot 2 shows the full system prompt, and that's all there is to it. It also exposes only 4 tools to the model: file read, write, edit, and bash. Each tool description is 1-2 sentences at most. It has no web search, no compaction, no auto-retries.
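
For a rough idea of how small a four-tool surface can be, here is a minimal Python sketch. It is not pi's actual code: the function names, the `TOOLS` registry, the `dispatch` helper, the timeout, and the exact-match edit behavior are all assumptions made up for illustration.

```python
import subprocess
from pathlib import Path

# Hypothetical four-tool surface: read, write, edit, bash.
# Every name and behavior here is an illustrative assumption, not pi's real code.

def read_file(path: str) -> str:
    """Return the text content of a file."""
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    """Create or overwrite a file with the given content."""
    Path(path).write_text(content)
    return f"wrote {len(content)} characters to {path}"

def edit_file(path: str, old: str, new: str) -> str:
    """Replace `old` with `new`, but only if `old` occurs exactly once."""
    text = Path(path).read_text()
    count = text.count(old)
    if count != 1:
        return f"error: expected exactly 1 match, found {count}"
    Path(path).write_text(text.replace(old, new, 1))
    return f"edited {path}"

def bash(command: str, timeout: int = 60) -> str:
    """Run a command with bash and return stdout followed by stderr."""
    result = subprocess.run(
        ["bash", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr

# Tool name -> handler. The model only ever sees these four names.
TOOLS = {"read": read_file, "write": write_file, "edit": edit_file, "bash": bash}

def dispatch(name: str, args: dict) -> str:
    """Route one tool call to its handler. No retries, no fallbacks."""
    if name not in TOOLS:
        return f"error: unknown tool {name}"
    return TOOLS[name](**args)
```

A call like `dispatch("bash", {"command": "ls"})` would return the command's output as a string for the harness to hand back to the model. Errors come back as plain strings too, so the model sees the failure and decides what to do next instead of the harness retrying on its behalf.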

Hilariously, it placed #7 on the leaderboard using Anthropic's Opus 4.5 model, beating Anthropic's own Claude Code harness and most Codex variations.