Mission - Agent Workbench: Why Capable Models Still Fail

Goal

Run the same small repo task twice, once prompt-only and once with the seven workbench surfaces wired in, and emit a failure-mode report that maps each missed surface to the symptom it caused.
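As a rough illustration of the report shape the goal describes, the sketch below compares a prompt-only run against a workbench run and lists each surface the prompt-only run lacked alongside the symptom recorded for it. Everything here is an assumption for illustration: the `RunResult` type, the `failure_mode_report` function, and the placeholder surface names are hypothetical and are not defined by this mission.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RunResult:
    """Outcome of one run of the task (hypothetical structure)."""
    label: str                     # e.g. "prompt-only" or "workbench"
    surfaces_wired: set[str]       # surfaces available during this run
    # Symptom observed for a surface that was missing in this run.
    symptoms: dict[str, str] = field(default_factory=dict)


def failure_mode_report(baseline: RunResult, workbench: RunResult) -> list[dict[str, str]]:
    """Map each surface missing from the baseline run to its recorded symptom."""
    # Surfaces the workbench run had that the baseline run did not.
    missed = sorted(workbench.surfaces_wired - baseline.surfaces_wired)
    return [
        {"surface": s, "symptom": baseline.symptoms.get(s, "no symptom recorded")}
        for s in missed
    ]


if __name__ == "__main__":
    # Placeholder surface names; the real seven surfaces are not listed here.
    baseline = RunResult("prompt-only", set(), {"surface_a": "example symptom"})
    workbench = RunResult("workbench", {"surface_a", "surface_b"})
    for row in failure_mode_report(baseline, workbench):
        print(f"{row['surface']}: {row['symptom']}")
```

A surface with no recorded symptom is still listed, so the report makes gaps in the observations visible instead of silently dropping them.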

Inputs

Deliverables

Acceptance

Out of scope

References
