How We Broke Top AI Agent Benchmarks: And What Comes Next