Commit 47f6032

Connoropolous, cyrusagent, claude, and cursoragent authored
Codex SDK migration + split agent/model selectors (#850)
* chore: checkpoint codex sdk integration WIP
* Switch codex runner to SDK and add agent/model selectors
* chore: checkpoint codex sdk integration WIP
* Switch codex runner to SDK and add agent/model selectors
* Resolve rebase conflicts, migrate model defaults, and fix codex session init
* Fix Codex/Gemini usage typing and add Codex formatter replay tests
* Emit Codex tool lifecycle events for Linear activities
* Format Codex todos as markdown checklists
* Extract F1 test-drive workflow into shared skill
* Add user-facing changelog entry for agent/model selectors
* Add PR link to changelog entry for codex selector work
* update sandbox settings
* add .git folder for worktrees to allowed directories list
* Cursor harness MCP enable + permission mapping hardening (#858)
* fix(orchestrator): ensure issues are created with 'To Do' status instead of 'Triage' (#815)
* fix(orchestrator): ensure issues are created with 'To Do' status instead of 'Triage'

  Update orchestrator system prompts to explicitly require setting state to "To Do" when creating issues with mcp__linear__create_issue. Previously, issues were being created with default "Triage" status.

  Changes to both orchestrator.md (v2.5.0) and graphite-orchestrator.md (v1.3.0):
  - Added state requirement to Required Tools section
  - Added Status bullet point in Decompose section
  - Added status checklist item in Sub-Issue Creation Checklist
  - Fixed outdated tool names in orchestrator.md

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs(changelog): add entry for CYPACK-761

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Connor Turland <1409121+Connoropolous@users.noreply.github.com>

* feat(runners): add codex/cursor harness support and F1 validation
* fix(cursor-runner): support non-mock cursor F1 session output
* Fix cursor harness verification regressions
* fix(ci): resolve build type errors in config-updater and core session types
* fix(edge-worker): wire cursor runner selection and prompt assets
* docs(changelog): frame cursor harness as major feature
* docs(changelog): use user-facing framing for cursor entry
* docs(changelog): clarify cursor selector usage
* fix(build): remove baseUrl/paths from tsconfig to fix dist output structure

  The baseUrl pointing to monorepo root caused TypeScript to expand rootDir, nesting compiled output under dist/edge-worker/src/ instead of dist/. This broke module resolution since package.json declares main as dist/index.js. Also adds missing cyrus-cursor-runner dependency to edge-worker.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(cursor/edge-worker): stop-session handling and cursor tool activity mapping
* chore(gitignore): ignore nested node_modules directories
* fix(edge-worker): resume cursor sessions for prompted continuation
* fix(cursor-runner): sync project permissions for cursor CLI sessions
* fix(cursor-runner): map MCP tool permissions to Cursor CLI config
* docs(claude): add cursor permission translation gotcha
* fix(codex): default runner model to gpt-5.3-codex
* fix(cursor-runner): pre-enable MCP servers before cursor sessions
* docs(cursor): clarify .cursor/mcp.json source-of-truth guidance
* fix(cursor-runner): scope wildcard file permissions to workspace
* fix(codex): post actual error message to Linear for usage limit errors

  When Codex hits usage limits or turn.failed errors, the full error message is now posted to Linear agent activity instead of a generic message.
  - AgentSessionManager: use errors[] for content when result is empty
  - CodexRunner: fallback to standalone error event when turn.failed has no message

  Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(cursor): enable sandbox by default, fix cli.json formatting
  - Default Cursor sandbox to enabled for tool execution isolation
  - Write .cursor/cli.json with tabs for Biome compatibility
  - Exclude .cursor from Biome checks (generated config)
  - Update CHANGELOG with sandbox default

  Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(cursor): validate cursor-agent version before run, post error to Linear on mismatch
  - Run cursor-agent --version before spawn and compare to tested version
  - Post error_during_execution to Linear via agent activity when version mismatches
  - Add cursorAgentVersion config and CYRUS_CURSOR_AGENT_VERSION env override
  - Add CursorRunner.version-check.test.ts with 3 tests
  - Update CHANGELOG with version validation behavior

  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cursor): render TODO_STATUS_COMPLETED todos as checked in Linear
  - Map Cursor API TODO_STATUS_COMPLETED to [x] marker in formatter and summarizeTodoList
  - Map TODO_STATUS_IN_PROGRESS for (in progress) suffix
  - Add formatter test for TODO_STATUS_* values
  - Update CHANGELOG

  Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(cursor): log assembled cursor-agent CLI args to console and session logs
  - Log full spawn command (path + args) before cursor-agent execution
  - Write to both console and session log file for debugging
  - Update CHANGELOG

  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cursor): remove --force and --api-key from CLI args
  - Remove --force option from cursor-agent invocations
  - Pass API key via CURSOR_API_KEY env only (no longer in args)
  - API key no longer appears in spawn logs or terminal output
  - Update CHANGELOG

  Co-authored-by: Cursor <cursoragent@cursor.com>
* add --trust
* move trust line
* Delete .cursor/cli.json
* fix(cursor-runner): backup and restore .cursor/cli.json instead of overwriting

  Temporarily rename existing .cursor/cli.json before writing Cyrus permissions, then restore original when session ends. Preserves user's config.

  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cursor): add deny rules to scope Read/Write to workspace only
  - When broad Read or Write allowed, add deny rules for /etc, /usr, etc.
  - Add deny rules for workspace sibling directories
  - Prevents cursor-agent from accessing /etc/hosts and system paths
  - Add test: only mutates project .cursor/cli.json, leaves home config unchanged
  - Update wildcard permissions test to expect system-path denies

  Co-authored-by: Cursor <cursoragent@cursor.com>
* docs(changelog): add Cursor .cursor/cli.json backup/restore entry for CYPACK-804

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Connor Turland <1409121+Connoropolous@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: agentclear <agentops@ceedar.ai>
Co-authored-by: Cyrus <208047790+cyrusagent@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 1092ddc commit 47f6032

89 files changed

Lines changed: 6747 additions & 420 deletions

.claude/agents/f1-test-drive.md

Lines changed: 8 additions & 183 deletions
@@ -5,191 +5,16 @@ tools: Bash, Read, Write, Glob, Grep, TodoWrite
 model: sonnet
 ---

-# F1 Test Drive Agent
+# F1 Test Drive Agent (Wrapper)

-You are the F1 Test Drive Agent, responsible for orchestrating comprehensive test drives of the Cyrus agent system. Your role is to validate the entire pipeline: Issue-tracker -> EdgeWorker -> Renderer.
+Use the shared canonical skill:

-## Your Mission
+- `skills/f1-test-drive/SKILL.md`

-Execute test drives that verify:
-1. **Issue-tracker verification**: Issues are created and processed correctly
-2. **EdgeWorker verification**: Git worktrees are created, agent sessions start, outputs are available via RPC
-3. **Renderer verification**: Outputs are accessible and well-formed
+Treat this subagent file as a thin harness-specific wrapper only.

-## Test Drive Protocol
+Execution requirements:

-### Phase 1: Setup
-
-1. **Create test repository** (if needed):
-```bash
-cd apps/f1
-./f1 init-test-repo --path /tmp/f1-test-drive-<timestamp>
-```
-
-2. **Start F1 server**:
-```bash
-CYRUS_PORT=3600 CYRUS_REPO_PATH=/tmp/f1-test-drive-<timestamp> bun run apps/f1/server.ts &
-```
-
-3. **Verify server health**:
-```bash
-CYRUS_PORT=3600 ./f1 ping
-CYRUS_PORT=3600 ./f1 status
-```
-
-### Phase 2: Issue-Tracker Verification
-
-1. **Create test issue**:
-```bash
-CYRUS_PORT=3600 ./f1 create-issue \
---title "<issue title>" \
---description "<issue description>"
-```
-
-2. **Verify issue created**: Confirm issue ID returned
-
-### Phase 3: EdgeWorker Verification
-
-1. **Start agent session**:
-```bash
-CYRUS_PORT=3600 ./f1 start-session --issue-id <issue-id>
-```
-
-2. **Monitor session activities**:
-```bash
-CYRUS_PORT=3600 ./f1 view-session --session-id <session-id>
-```
-
-3. **Verify**:
-- Session started successfully
-- Activities are being tracked
-- Agent is processing the issue
-
-### Phase 4: Renderer Verification
-
-1. **Check activity output format**:
-- Activities have proper types (thought, action)
-- Timestamps are present
-- Content is well-formed
-
-2. **Test pagination** (if many activities):
-```bash
-CYRUS_PORT=3600 ./f1 view-session --session-id <session-id> --limit 10 --offset 0
-```
-
-### Phase 5: Cleanup
-
-1. **Stop session**:
-```bash
-CYRUS_PORT=3600 ./f1 stop-session --session-id <session-id>
-```
-
-2. **Stop server**: Kill the background server process
-
-## Test Drive Documentation
-
-Create a test drive report in `apps/f1/test-drives/` with this structure:
-
-```markdown
-# Test Drive #NNN: [Goal Description]
-
-**Date**: YYYY-MM-DD
-**Goal**: [One sentence]
-**Test Repo**: [Path to test repository]
-
----
-
-## Verification Results
-
-### Issue-Tracker Verification
-- [ ] Issue created successfully
-- [ ] Issue ID returned
-- [ ] Issue details accessible
-
-### EdgeWorker Verification
-- [ ] Session started successfully
-- [ ] Git worktree created (check server logs)
-- [ ] Activities being tracked
-- [ ] Agent processing issue
-
-### Renderer Verification
-- [ ] Activities have proper format
-- [ ] Pagination works correctly
-- [ ] Search works correctly
-
----
-
-## Session Log
-
-### [Timestamp] - [Phase]
-
-**Command**: [Exact command]
-**Output**: [Key output]
-**Status**: [PASS/FAIL]
-
----
-
-## Final Retrospective
-
-### What Worked Well
-[List successes]
-
-### Issues Found
-[List problems with severity]
-
-### Recommendations
-[Actionable improvements]
-
-### Overall Score
-- **Issue-Tracker**: X/10
-- **EdgeWorker**: X/10
-- **Renderer**: X/10
-- **Overall**: X/10
-
----
-
-**Test Drive Complete**: [Timestamp]
-```
-
-## Acceptance Criteria for Test Drives
-
-A test drive PASSES if:
-1. Server starts successfully
-2. Issue is created and has valid ID
-3. Session starts and activities appear
-4. Activities are well-formatted with types and timestamps
-5. Session can be stopped gracefully
-6. No unhandled errors occur
-
-A test drive FAILS if:
-- Server won't start
-- Issue creation fails
-- Session won't start
-- No activities appear after 30 seconds
-- Malformed activity data
-- Unhandled exceptions
-
-## Important Notes
-
-- Always use `CYRUS_PORT=3600` to avoid conflicts
-- Create fresh test repos for each test drive
-- Document all observations, both positive and negative
-- Take screenshots of terminal output when relevant
-- Clean up test repos after successful test drives
-- If the test drive fails, preserve the state for debugging
-
-## Sample Test Issues
-
-For the rate limiter test repo, use these realistic issues:
-
-1. **Sliding Window Algorithm**:
-- Title: "Implement sliding window rate limiter algorithm"
-- Description: Implement the SlidingWindowRateLimiter class with configurable window size
-
-2. **Fixed Window Algorithm**:
-- Title: "Implement fixed window rate limiter algorithm"
-- Description: Add FixedWindowRateLimiter that resets counter at fixed intervals
-
-3. **Unit Tests**:
-- Title: "Add comprehensive unit tests for rate limiter"
-- Description: Add Vitest tests for TokenBucketRateLimiter covering edge cases
+1. Load and follow `skills/f1-test-drive/SKILL.md` as the primary protocol.
+2. Keep behavior aligned with the shared skill so other harnesses can reuse the same source.
+3. Prefer updating the shared skill over adding logic here.

.claude/skills/f1-test-drive

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../../skills/f1-test-drive

.codex/skills/f1-test-drive

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../../skills/f1-test-drive

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 # Dependency directories
-node_modules/
+node_modules
+**/node_modules

 # Build output
 dist/

.opencode/skills/f1-test-drive

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../../skills/f1-test-drive

AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+CLAUDE.md

CHANGELOG.internal.md

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,10 @@ This changelog documents internal development changes, refactors, tooling update

 ## [Unreleased]

+
+### Added
+- Added Cursor harness `[agent=cursor]`, including offline F1 drives for stop/tool activity, resume continuation, and permission synchronization behavior. Also added project-level Cursor CLI permissions mapping from Cyrus tool permissions (including subroutine-time updates), pre-run MCP server enablement (`agent mcp list` + `agent mcp enable <server>`), switched the default Codex runner model to `gpt-5.3-codex`, and aligned edge-worker Vitest module resolution to use local `cyrus-claude-runner` sources during tests. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+
 ### Fixed
 - Updated orchestrator system prompts to explicitly require `state: "To Do"` when creating issues via `mcp__linear__create_issue`, preventing issues from being created in "Triage" status. ([CYPACK-761](https://linear.app/ceedar/issue/CYPACK-761), [#815](https://github.com/ceedaragents/cyrus/pull/815))

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -4,6 +4,18 @@ All notable changes to this project will be documented in this file.

 ## [Unreleased]

+### Fixed
+- **Codex usage limit errors now display full message in Linear** - When Codex hits usage limits or other turn.failed errors, the actual error message is now posted to Linear agent activity instead of a generic message. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+- **Cursor project .cursor/cli.json is now backed up and restored** - CursorRunner no longer overwrites the project's `.cursor/cli.json`. It temporarily renames the existing file before writing Cyrus permissions, then restores the original when the session ends. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+- **Cursor API key no longer in CLI args or logs** - The Cursor API key is now passed only via the `CURSOR_API_KEY` environment variable, so it never appears in spawn logs or terminal output. The `--force` option has also been removed from cursor-agent invocations. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+- **Cursor completed todos now display as checked in Linear** - Cursor API uses `TODO_STATUS_COMPLETED` for completed todo items; the formatter now recognizes this so completed items render as `- [x]` instead of `- [ ]` in Linear activity. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
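Reviewer note on the `TODO_STATUS_COMPLETED` entry above: the status-to-checkbox mapping it describes can be sketched roughly as follows. This is a minimal illustration, not the formatter from this commit. The `TodoItem` shape and the `formatTodoChecklist` name are hypothetical; only the `TODO_STATUS_*` strings, the `- [x]` / `- [ ]` markers, and the `(in progress)` suffix come from the commit text.

```typescript
// Hypothetical item shape; the real Cursor payload may carry more fields.
interface TodoItem {
  content: string;
  status: string;
}

// Render todos as a markdown task list.
// Completed items get "[x]"; everything else gets "[ ]".
// In-progress items additionally get an "(in progress)" suffix.
function formatTodoChecklist(todos: TodoItem[]): string {
  return todos
    .map((todo) => {
      const done = todo.status === "TODO_STATUS_COMPLETED";
      const inProgress = todo.status === "TODO_STATUS_IN_PROGRESS";
      const box = done ? "[x]" : "[ ]";
      const suffix = inProgress ? " (in progress)" : "";
      return `- ${box} ${todo.content}${suffix}`;
    })
    .join("\n");
}
```

Unknown status strings fall through to an unchecked box in this sketch, which seems the safer default for a timeline that should never claim work is done when it is not.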
+
+### Changed
+- **Agent and model selectors now work across Claude, Gemini, and Codex** - You can now set runner and model directly in issue descriptions using `[agent=claude|gemini|codex]` and `[model=<model-name>]`. This is not Codex-only: selectors apply to all supported runners. `[agent=...]` explicitly selects the runner, `[model=...]` selects the model and can infer runner family, and description tags take precedence over labels. ([#850](https://github.com/ceedaragents/cyrus/pull/850))
+- **Codex tool activity is now visible in Linear sessions** - Codex runs now emit tool lifecycle activity (including command execution, file edits, web fetch/search, MCP tool calls, and todo updates) so activity streams show execution details instead of only final output. ([#850](https://github.com/ceedaragents/cyrus/pull/850))
+- **Codex todo output now renders as proper checklists** - Todo items are now formatted as markdown task lists (`- [ ]` and `- [x]`) for correct checklist rendering in Linear. ([#850](https://github.com/ceedaragents/cyrus/pull/850))
+- **Major new feature: Cursor agent harness support** - Cyrus now supports Cursor as a first-class agent option. To use it, set `[agent=cursor]` in the issue description or apply a `cursor` issue label; either selector runs end-to-end with the Cursor runner and posts the final response back to the issue thread. Cursor runs now map Cyrus tool permissions into project-level Cursor CLI permissions, pre-enable configured MCP servers before run, and refresh permissions between subroutines so permission changes take effect without restarting the issue flow. Cursor sandbox is enabled by default for tool execution isolation; set `CYRUS_SANDBOX=disabled` to disable. Before each run, Cyrus validates that the installed `cursor-agent` version matches the tested version; a mismatch posts an error to Linear. Set `CYRUS_CURSOR_AGENT_VERSION` to your installed version to override. Assembled cursor-agent CLI args are now logged to console and session log files for debugging. Codex default runner model is now `gpt-5.3-codex` (configurable via `codexDefaultModel`). ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
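Reviewer note on the selector entry above: extracting `[agent=...]` and `[model=...]` tags from an issue description might look something like the sketch below. It is an illustration under stated assumptions, not the parser shipped in this commit. `parseSelectors` and `Selectors` are hypothetical names, and both "first occurrence wins" and lowercasing the agent value are assumptions the changelog does not confirm.

```typescript
// Hypothetical result shape for the two selectors named in the changelog.
interface Selectors {
  agent?: string;
  model?: string;
}

// Scan a description for [agent=...] and [model=...] tags.
// Assumption: the first occurrence of each tag wins; later ones are ignored.
function parseSelectors(description: string): Selectors {
  const result: Selectors = {};
  const pattern = /\[(agent|model)=([^\]\s]+)\]/gi;
  let match: RegExpExecArray | null = pattern.exec(description);
  while (match !== null) {
    const key = (match[1] || "").toLowerCase();
    const value = match[2] || "";
    if (key === "agent" && result.agent === undefined) {
      result.agent = value.toLowerCase(); // assumption: agent names are case-insensitive
    }
    if (key === "model" && result.model === undefined) {
      result.model = value; // model names are kept verbatim
    }
    match = pattern.exec(description);
  }
  return result;
}

// Example: both tags present in free-form text.
const parsed = parseSelectors("Fix the flaky test [agent=codex] [model=gpt-5.3-codex]");
console.log(parsed.agent, parsed.model);
```

Inferring the runner family from a bare `[model=...]` tag, which the changelog also mentions, is deliberately left out here because the commit page does not show how that inference is done.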
+
 ## [0.2.21] - 2026-02-09

 ### Changed

CLAUDE.md

Lines changed: 116 additions & 0 deletions
@@ -91,6 +91,122 @@ When examining or working with a package SDK:

 3. Review the SDK's documentation, source code, and type definitions to understand its API and usage patterns.

+## Shared Skills Across Harnesses
+
+For reusable operational workflows (for example F1 test driving), keep a canonical skill in:
+
+- `skills/<skill-name>/SKILL.md`
+
+Then symlink that skill into harness-specific skill directories:
+
+- `.claude/skills/<skill-name>`
+- `.codex/skills/<skill-name>`
+- `.opencode/skills/<skill-name>`
+
+Use:
+
+```bash
+./scripts/symlink-skills.sh
+```
+
+Design rule:
+
+1. Keep subagent files thin wrappers.
+2. Put 95%+ workflow logic into canonical shared skills.
+3. Update shared skill first; avoid duplicating protocol text across harnesses.
+
+## Checklist For New Agent CLI Harnesses
+
+When implementing a new runner/harness (for example Codex, Gemini, OpenCode, or other CLIs), use this checklist before shipping.
+
+### 1) Session Lifecycle And Turn Limits
+
+- Verify turn-limit behavior (`maxTurns`, `maxSessionTurns`, or equivalent).
+- Confirm what error/result payload is emitted when limits are exceeded.
+- Ensure session stop behavior is explicit and deterministic.
+
+### 2) Prompt Model And Instructions
+
+- Identify how base system prompt is applied.
+- Identify whether appended instructions are supported and whether they extend or replace defaults.
+- Confirm provider-specific instruction fields (for example `developer_instructions`) and expected precedence.
+
+### 3) Streaming Event Schema
+
+- Capture real JSON event streams and document item types.
+- Determine whether events are full objects or deltas/partials that require aggregation.
+- Add replay tests from real transcripts.
+
+### 4) Final Message Semantics
+
+- Verify where the final answer lives:
+- in a `result` payload (Claude-style), or
+- in the last assistant message (Gemini-style), or
+- mixed model/event behavior.
+- Ensure we always post a final `response` activity when work completes successfully.
+
+### 5) Tools And Permissions
+
+- Validate `tools`, `allowedTools`, and `disallowedTools` semantics for the SDK.
+- Validate approval/sandbox behavior for tool execution.
+- Verify tool calls produce both start and completion signals.
+- For providers that rely on static/project config files (for example Cursor CLI), implement a permission translation layer from Cyrus/Claude tool names to provider-native permission tokens and write that config before session start. This must support subroutine-time updates when allowed/disallowed tools change. For Cursor MCP servers, pre-enable them before session start (`agent mcp list` + `agent mcp enable <server>` per server) so tools are available in headless runs. When using Cursor in Cyrus, only MCP servers configured in `.cursor/mcp.json` should be treated as project MCP config; use Cursor's MCP config-location and file-format docs as the source of truth: https://cursor.com/docs/context/mcp#configuration-locations. For broad file permissions, map wildcard `Read(**)` / `Write(**)` to workspace-scoped patterns (for example `Read(./**)` / `Write(./**)`) to avoid unintentionally permitting absolute system paths. Reference: https://cursor.com/docs/cli/reference/permissions
+
+### 6) Prompt Streaming Input
+
+- Verify whether the SDK supports streaming/incremental prompt input.
+- Set `supportsStreamingInput` correctly and gate behavior in runner adapters.
+
+### 7) MCP Servers And Custom Tools
+
+- Verify MCP server config format and merge behavior.
+- Verify custom tool registration/invocation behavior.
+- Ensure MCP/custom-tool events are mapped into consistent runner message shapes.
+
+### 8) Runner Selection Via Labels And Description Selectors
+
+- Keep agent label and model label separate (example: `codex` and `gpt-5-codex`).
+- Support issue description selectors like `[agent=...]`, `[model=...]`, `[repo=...]`.
+- Add precedence tests for labels vs selectors vs repository defaults.
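Reviewer note on checklist item 8: the precedence it asks to test could be captured in a small resolver like the one below. This is a sketch under assumptions. The CHANGELOG in this commit states that description tags take precedence over labels; treating the repository default as the lowest-priority fallback is an assumption, and `resolveAgent`, `AgentResolution`, and the `KNOWN_AGENTS` list are hypothetical names chosen for illustration.

```typescript
type AgentSource = "description" | "label" | "default";

interface AgentResolution {
  agent: string;
  source: AgentSource;
}

// Agent names taken from the changelog entries; the real list may differ.
const KNOWN_AGENTS: string[] = ["claude", "gemini", "codex", "cursor"];

// Assumed precedence: description selector, then labels, then repository default.
function resolveAgent(
  descriptionAgent: string | undefined,
  labels: string[],
  repoDefault: string,
): AgentResolution {
  if (descriptionAgent !== undefined && KNOWN_AGENTS.indexOf(descriptionAgent) !== -1) {
    return { agent: descriptionAgent, source: "description" };
  }
  for (const label of labels) {
    const normalized = label.toLowerCase();
    // Model labels such as "gpt-5-codex" are not agent names and are skipped here.
    if (KNOWN_AGENTS.indexOf(normalized) !== -1) {
      return { agent: normalized, source: "label" };
    }
  }
  return { agent: repoDefault, source: "default" };
}
```

Returning the `source` alongside the agent makes precedence tests easy to write, since each test can assert which tier produced the answer rather than only the final value.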
+
+### 9) Activity Formatting And Timeline Visibility
+
+- Ensure formatter output is timeline-ready (AgentActivity content fields).
+- Ensure tool lifecycle events are visible as activities (not silently dropped).
+- Use Markdown-compatible formatting for checklists:
+- `- [ ] item`
+- `- [x] item`
+
+### 10) Usage, Stop Reasons, And Typing
+
+- Map usage/cost/stop-reason fields to expected shared types.
+- Fill required compatibility fields even when provider omits them natively.
+- Keep strict TypeScript compatibility for cross-runner shared contracts.
+
+### 11) Config Schema And Backward Compatibility
+
+- Use provider-specific defaults (`claudeDefaultModel`, `geminiDefaultModel`, `codexDefaultModel`).
+- Add config migration logic for renamed or legacy fields.
+- Keep docs/comments provider-specific and explicit.
+
+### 12) Validation Protocol Before Merge
+
+- Run unit tests for new runner adapters and formatter behavior.
+- Run replay tests from real CLI transcripts.
+- Validate F1 end-to-end scenarios for:
+- label-based runner/model selection
+- description selector-based runner/model selection
+- visible tool/file-edit activities in session timeline
+- final response posting behavior
+
+### Codex Integration Lesson Learned
+
+Codex emitted tool activity at `item.started`/`item.completed` events, but those were initially not mapped to `tool_use`/`tool_result`. The result was missing action/file-edit visibility in Linear. For any new harness, treat tool lifecycle mapping as a first-class acceptance criterion, not a formatter-only concern.
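Reviewer note on the Codex lesson above: the mapping it describes, from `item.started` / `item.completed` to `tool_use` / `tool_result`, can be illustrated with the sketch below. The event and message shapes are simplified assumptions, not the Codex SDK's actual types nor the runner types in this repository; `CodexItemEvent`, `RunnerToolMessage`, and `mapItemEvent` are hypothetical names.

```typescript
// Simplified, assumed event shape; real Codex events carry more fields.
interface CodexItemEvent {
  type: "item.started" | "item.completed";
  item: { id: string; type: string };
}

// Simplified, assumed runner message shape.
interface RunnerToolMessage {
  kind: "tool_use" | "tool_result";
  toolUseId: string;
  name: string;
}

// Map each lifecycle event to a runner message so neither the start
// nor the completion of a tool call is silently dropped.
function mapItemEvent(event: CodexItemEvent): RunnerToolMessage {
  const kind: "tool_use" | "tool_result" =
    event.type === "item.started" ? "tool_use" : "tool_result";
  return { kind, toolUseId: event.item.id, name: event.item.type };
}
```

Reusing the item id for both messages is what lets a timeline pair a completion with its start; whether the real runner pairs them this way is not shown on this page.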
+
+### Cursor Integration Lesson Learned
+
+Cursor CLI permissions are enforced from config (`~/.cursor/cli-config.json` or `<project>/.cursor/cli.json`) instead of dynamic per-request tool allowlists. For Cursor-like providers, do not rely on dynamic SDK tool constraints alone—add a translation layer (for example `mcp__server__tool` -> `Mcp(server:tool)`, `Bash(...)` -> `Shell(...)`) and sync project permissions before each run and between subroutines. Also pre-enable MCP servers via `agent mcp list` + `agent mcp enable <server>` using both project-listed and runner-configured server names so headless sessions can invoke MCP tools immediately. In Cyrus Cursor runs, treat `.cursor/mcp.json` as the project MCP source and follow Cursor's configuration-location and file-syntax docs (these differ from Claude's MCP interpretation): https://cursor.com/docs/context/mcp#configuration-locations. Use workspace-scoped wildcard file permissions (`Read(./**)`, `Write(./**)`) rather than unscoped `Read(**)` / `Write(**)` in translation defaults. Reference: https://cursor.com/docs/cli/reference/permissions
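Reviewer note on the Cursor lesson above: the translation layer it calls for might start from something like the sketch below. It covers only the three rewrites the text names (`mcp__server__tool` to `Mcp(server:tool)`, `Bash(...)` to `Shell(...)`, and unscoped `Read(**)` / `Write(**)` to workspace-scoped patterns). `translatePermission` is a hypothetical name, and passing every other token through unchanged is an assumption rather than the behavior of the actual CursorRunner.

```typescript
// Translate one Cyrus/Claude-style tool permission into a Cursor-style token.
// Only the rewrites named in CLAUDE.md are handled; everything else passes through.
function translatePermission(tool: string): string {
  // mcp__<server>__<tool> -> Mcp(<server>:<tool>)
  const parts = tool.split("__");
  if (parts[0] === "mcp" && parts.length >= 3) {
    const server = parts[1];
    const name = parts.slice(2).join("__");
    return `Mcp(${server}:${name})`;
  }

  // Bash(<pattern>) -> Shell(<pattern>)
  const bash = /^Bash\((.*)\)$/.exec(tool);
  if (bash !== null) {
    return `Shell(${bash[1]})`;
  }

  // Scope broad file wildcards to the workspace.
  if (tool === "Read(**)") return "Read(./**)";
  if (tool === "Write(**)") return "Write(./**)";

  return tool; // assumption: unknown tokens are left as-is
}

// Example: translate an allowlist before writing project config.
const translated = ["mcp__linear__create_issue", "Bash(git status)", "Read(**)"].map(translatePermission);
console.log(translated);
```

A production layer would also need the deny rules and config file handling described in the commit message; those are out of scope for this sketch.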
+
 ## Navigating GitHub Repositories

 When you need to examine source code from GitHub repositories (especially when GitHub's authentication blocks normal navigation):
