diff --git a/.claude/agents/f1-test-drive.md b/.claude/agents/f1-test-drive.md
Lines changed: 8 additions & 183 deletions
diff --git a/.claude/skills/f1-test-drive b/.claude/skills/f1-test-drive
Lines changed: 1 addition & 0 deletions
diff --git a/.codex/skills/f1-test-drive b/.codex/skills/f1-test-drive
Lines changed: 1 addition & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 1 deletion
diff --git a/.opencode/skills/f1-test-drive b/.opencode/skills/f1-test-drive
Lines changed: 1 addition & 0 deletions
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 1 addition & 0 deletions
diff --git a/CHANGELOG.internal.md b/CHANGELOG.internal.md
Lines changed: 4 additions & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 12 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 116 additions & 0 deletions
@@ -5,191 +5,16 @@ tools: Bash, Read, Write, Glob, Grep, TodoWrite
 model: sonnet
 ---
 
-# F1 Test Drive Agent
+# F1 Test Drive Agent (Wrapper)
 
-You are the F1 Test Drive Agent, responsible for orchestrating comprehensive test drives of the Cyrus agent system. Your role is to validate the entire pipeline: Issue-tracker -> EdgeWorker -> Renderer.
+Use the shared canonical skill:
 
-## Your Mission
+- `skills/f1-test-drive/SKILL.md`
 
-Execute test drives that verify:
-1. **Issue-tracker verification**: Issues are created and processed correctly
-2. **EdgeWorker verification**: Git worktrees are created, agent sessions start, outputs are available via RPC
-3. **Renderer verification**: Outputs are accessible and well-formed
+Treat this subagent file as a thin harness-specific wrapper only.
 
-## Test Drive Protocol
+Execution requirements:
 
-### Phase 1: Setup
-
-1. **Create test repository** (if needed):
-   ```bash
-   cd apps/f1
-   ./f1 init-test-repo --path /tmp/f1-test-drive-<timestamp>
-   ```
-
-2. **Start F1 server**:
-   ```bash
-   CYRUS_PORT=3600 CYRUS_REPO_PATH=/tmp/f1-test-drive-<timestamp> bun run apps/f1/server.ts &
-   ```
-
-3. **Verify server health**:
-   ```bash
-   CYRUS_PORT=3600 ./f1 ping
-   CYRUS_PORT=3600 ./f1 status
-   ```
-
-### Phase 2: Issue-Tracker Verification
-
-1. **Create test issue**:
-   ```bash
-   CYRUS_PORT=3600 ./f1 create-issue \
-     --title "<issue title>" \
-     --description "<issue description>"
-   ```
-
-2. **Verify issue created**: Confirm issue ID returned
-
-### Phase 3: EdgeWorker Verification
-
-1. **Start agent session**:
-   ```bash
-   CYRUS_PORT=3600 ./f1 start-session --issue-id <issue-id>
-   ```
-
-2. **Monitor session activities**:
-   ```bash
-   CYRUS_PORT=3600 ./f1 view-session --session-id <session-id>
-   ```
-
-3. **Verify**:
-   - Session started successfully
-   - Activities are being tracked
-   - Agent is processing the issue
-
-### Phase 4: Renderer Verification
-
-1. **Check activity output format**:
-   - Activities have proper types (thought, action)
-   - Timestamps are present
-   - Content is well-formed
-
-2. **Test pagination** (if many activities):
-   ```bash
-   CYRUS_PORT=3600 ./f1 view-session --session-id <session-id> --limit 10 --offset 0
-   ```
-
-### Phase 5: Cleanup
-
-1. **Stop session**:
-   ```bash
-   CYRUS_PORT=3600 ./f1 stop-session --session-id <session-id>
-   ```
-
-2. **Stop server**: Kill the background server process
-
-## Test Drive Documentation
-
-Create a test drive report in `apps/f1/test-drives/` with this structure:
-
-```markdown
-# Test Drive #NNN: [Goal Description]
-
-**Date**: YYYY-MM-DD
-**Goal**: [One sentence]
-**Test Repo**: [Path to test repository]
-
----
-
-## Verification Results
-
-### Issue-Tracker Verification
-- [ ] Issue created successfully
-- [ ] Issue ID returned
-- [ ] Issue details accessible
-
-### EdgeWorker Verification
-- [ ] Session started successfully
-- [ ] Git worktree created (check server logs)
-- [ ] Activities being tracked
-- [ ] Agent processing issue
-
-### Renderer Verification
-- [ ] Activities have proper format
-- [ ] Pagination works correctly
-- [ ] Search works correctly
-
----
-
-## Session Log
-
-### [Timestamp] - [Phase]
-
-**Command**: [Exact command]
-**Output**: [Key output]
-**Status**: [PASS/FAIL]
-
----
-
-## Final Retrospective
-
-### What Worked Well
-[List successes]
-
-### Issues Found
-[List problems with severity]
-
-### Recommendations
-[Actionable improvements]
-
-### Overall Score
-- **Issue-Tracker**: X/10
-- **EdgeWorker**: X/10
-- **Renderer**: X/10
-- **Overall**: X/10
-
----
-
-**Test Drive Complete**: [Timestamp]
-```
-
-## Acceptance Criteria for Test Drives
-
-A test drive PASSES if:
-1. Server starts successfully
-2. Issue is created and has valid ID
-3. Session starts and activities appear
-4. Activities are well-formatted with types and timestamps
-5. Session can be stopped gracefully
-6. No unhandled errors occur
-
-A test drive FAILS if:
-- Server won't start
-- Issue creation fails
-- Session won't start
-- No activities appear after 30 seconds
-- Malformed activity data
-- Unhandled exceptions
-
-## Important Notes
-
-- Always use `CYRUS_PORT=3600` to avoid conflicts
-- Create fresh test repos for each test drive
-- Document all observations, both positive and negative
-- Take screenshots of terminal output when relevant
-- Clean up test repos after successful test drives
-- If the test drive fails, preserve the state for debugging
-
-## Sample Test Issues
-
-For the rate limiter test repo, use these realistic issues:
-
-1. **Sliding Window Algorithm**:
-   - Title: "Implement sliding window rate limiter algorithm"
-   - Description: Implement the SlidingWindowRateLimiter class with configurable window size
-
-2. **Fixed Window Algorithm**:
-   - Title: "Implement fixed window rate limiter algorithm"
-   - Description: Add FixedWindowRateLimiter that resets counter at fixed intervals
-
-3. **Unit Tests**:
-   - Title: "Add comprehensive unit tests for rate limiter"
-   - Description: Add Vitest tests for TokenBucketRateLimiter covering edge cases
+1. Load and follow `skills/f1-test-drive/SKILL.md` as the primary protocol.
+2. Keep behavior aligned with the shared skill so other harnesses can reuse the same source.
+3. Prefer updating the shared skill over adding logic here.
@@ -0,0 +1 @@
+../../skills/f1-test-drive
@@ -0,0 +1 @@
+../../skills/f1-test-drive
@@ -1,5 +1,6 @@
 # Dependency directories
-node_modules/
+node_modules
+**/node_modules
 
 # Build output
 dist/
 
@@ -0,0 +1 @@
+../../skills/f1-test-drive
@@ -0,0 +1 @@
+CLAUDE.md
@@ -4,6 +4,10 @@ This changelog documents internal development changes, refactors, tooling update
 
 ## [Unreleased]
 
+
+### Added
+- Added Cursor harness `[agent=cursor]`, including offline F1 drives for stop/tool activity, resume continuation, and permission synchronization behavior. Also added project-level Cursor CLI permissions mapping from Cyrus tool permissions (including subroutine-time updates), pre-run MCP server enablement (`agent mcp list` + `agent mcp enable <server>`), switched the default Codex runner model to `gpt-5.3-codex`, and aligned edge-worker Vitest module resolution to use local `cyrus-claude-runner` sources during tests. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+
 ### Fixed
 - Updated orchestrator system prompts to explicitly require `state: "To Do"` when creating issues via `mcp__linear__create_issue`, preventing issues from being created in "Triage" status. ([CYPACK-761](https://linear.app/ceedar/issue/CYPACK-761), [#815](https://github.com/ceedaragents/cyrus/pull/815))
 
 
@@ -4,6 +4,18 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Fixed
+- **Codex usage limit errors now display full message in Linear** - When Codex hits usage limits or other turn.failed errors, the actual error message is now posted to Linear agent activity instead of a generic message. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+- **Cursor project .cursor/cli.json is now backed up and restored** - CursorRunner no longer overwrites the project's `.cursor/cli.json`. It temporarily renames the existing file before writing Cyrus permissions, then restores the original when the session ends. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+- **Cursor API key no longer in CLI args or logs** - The Cursor API key is now passed only via the `CURSOR_API_KEY` environment variable, so it never appears in spawn logs or terminal output. The `--force` option has also been removed from cursor-agent invocations. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+- **Cursor completed todos now display as checked in Linear** - Cursor API uses `TODO_STATUS_COMPLETED` for completed todo items; the formatter now recognizes this so completed items render as `- [x]` instead of `- [ ]` in Linear activity. ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+
+### Changed
+- **Agent and model selectors now work across Claude, Gemini, and Codex** - You can now set runner and model directly in issue descriptions using `[agent=claude|gemini|codex]` and `[model=<model-name>]`. This is not Codex-only: selectors apply to all supported runners. `[agent=...]` explicitly selects the runner, `[model=...]` selects the model and can infer runner family, and description tags take precedence over labels. ([#850](https://github.com/ceedaragents/cyrus/pull/850))
+- **Codex tool activity is now visible in Linear sessions** - Codex runs now emit tool lifecycle activity (including command execution, file edits, web fetch/search, MCP tool calls, and todo updates) so activity streams show execution details instead of only final output. ([#850](https://github.com/ceedaragents/cyrus/pull/850))
+- **Codex todo output now renders as proper checklists** - Todo items are now formatted as markdown task lists (`- [ ]` and `- [x]`) for correct checklist rendering in Linear. ([#850](https://github.com/ceedaragents/cyrus/pull/850))
+- **Major new feature: Cursor agent harness support** - Cyrus now supports Cursor as a first-class agent option. To use it, set `[agent=cursor]` in the issue description or apply a `cursor` issue label; either selector runs end-to-end with the Cursor runner and posts the final response back to the issue thread. Cursor runs now map Cyrus tool permissions into project-level Cursor CLI permissions, pre-enable configured MCP servers before run, and refresh permissions between subroutines so permission changes take effect without restarting the issue flow. Cursor sandbox is enabled by default for tool execution isolation; set `CYRUS_SANDBOX=disabled` to disable. Before each run, Cyrus validates that the installed `cursor-agent` version matches the tested version; a mismatch posts an error to Linear. Set `CYRUS_CURSOR_AGENT_VERSION` to your installed version to override. Assembled cursor-agent CLI args are now logged to console and session log files for debugging. Codex default runner model is now `gpt-5.3-codex` (configurable via `codexDefaultModel`). ([CYPACK-804](https://linear.app/ceedar/issue/CYPACK-804), [#858](https://github.com/ceedaragents/cyrus/pull/858))
+
 ## [0.2.21] - 2026-02-09
 
 ### Changed
 
@@ -91,6 +91,122 @@ When examining or working with a package SDK:
 
 3. Review the SDK's documentation, source code, and type definitions to understand its API and usage patterns.
 
+## Shared Skills Across Harnesses
+
+For reusable operational workflows (for example F1 test driving), keep a canonical skill in:
+
+- `skills/<skill-name>/SKILL.md`
+
+Then symlink that skill into harness-specific skill directories:
+
+- `.claude/skills/<skill-name>`
+- `.codex/skills/<skill-name>`
+- `.opencode/skills/<skill-name>`
+
+Use:
+
+```bash
+./scripts/symlink-skills.sh
+```
+
+Design rule:
+
+1. Keep subagent files thin wrappers.
+2. Put 95%+ workflow logic into canonical shared skills.
+3. Update shared skill first; avoid duplicating protocol text across harnesses.
+
+## Checklist For New Agent CLI Harnesses
+
+When implementing a new runner/harness (for example Codex, Gemini, OpenCode, or other CLIs), use this checklist before shipping.
+
+### 1) Session Lifecycle And Turn Limits
+
+- Verify turn-limit behavior (`maxTurns`, `maxSessionTurns`, or equivalent).
+- Confirm what error/result payload is emitted when limits are exceeded.
+- Ensure session stop behavior is explicit and deterministic.
+
+### 2) Prompt Model And Instructions
+
+- Identify how base system prompt is applied.
+- Identify whether appended instructions are supported and whether they extend or replace defaults.
+- Confirm provider-specific instruction fields (for example `developer_instructions`) and expected precedence.
+
+### 3) Streaming Event Schema
+
+- Capture real JSON event streams and document item types.
+- Determine whether events are full objects or deltas/partials that require aggregation.
+- Add replay tests from real transcripts.
+
+### 4) Final Message Semantics
+
+- Verify where the final answer lives:
+  - in a `result` payload (Claude-style), or
+  - in the last assistant message (Gemini-style), or
+  - mixed model/event behavior.
+- Ensure we always post a final `response` activity when work completes successfully.
+
+### 5) Tools And Permissions
+
+- Validate `tools`, `allowedTools`, and `disallowedTools` semantics for the SDK.
+- Validate approval/sandbox behavior for tool execution.
+- Verify tool calls produce both start and completion signals.
+- For providers that rely on static/project config files (for example Cursor CLI), implement a permission translation layer from Cyrus/Claude tool names to provider-native permission tokens and write that config before session start. This must support subroutine-time updates when allowed/disallowed tools change. For Cursor MCP servers, pre-enable them before session start (`agent mcp list` + `agent mcp enable <server>` per server) so tools are available in headless runs. When using Cursor in Cyrus, only MCP servers configured in `.cursor/mcp.json` should be treated as project MCP config; use Cursor's MCP config-location and file-format docs as the source of truth: https://cursor.com/docs/context/mcp#configuration-locations. For broad file permissions, map wildcard `Read(**)` / `Write(**)` to workspace-scoped patterns (for example `Read(./**)` / `Write(./**)`) to avoid unintentionally permitting absolute system paths. Reference: https://cursor.com/docs/cli/reference/permissions
+
+### 6) Prompt Streaming Input
+
+- Verify whether the SDK supports streaming/incremental prompt input.
+- Set `supportsStreamingInput` correctly and gate behavior in runner adapters.
+
+### 7) MCP Servers And Custom Tools
+
+- Verify MCP server config format and merge behavior.
+- Verify custom tool registration/invocation behavior.
+- Ensure MCP/custom-tool events are mapped into consistent runner message shapes.
+
+### 8) Runner Selection Via Labels And Description Selectors
+
+- Keep agent label and model label separate (example: `codex` and `gpt-5-codex`).
+- Support issue description selectors like `[agent=...]`, `[model=...]`, `[repo=...]`.
+- Add precedence tests for labels vs selectors vs repository defaults.
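
The selector rules in this checklist item could be exercised with a small parser like the sketch below. It is illustrative only: `parseDescriptionSelectors`, `resolveAgent`, and the `Selectors` type are hypothetical names, not functions from the Cyrus codebase. The changelog states that description tags take precedence over labels; falling back to a repository default last, and letting the first occurrence of a tag win, are assumptions made here.

```typescript
// Hypothetical types and helpers; names are not taken from the Cyrus codebase.
type Selectors = { agent?: string; model?: string; repo?: string };

// Extract [agent=...], [model=...], and [repo=...] tags from an issue description.
function parseDescriptionSelectors(description: string): Selectors {
  const selectors: Selectors = {};
  const tagPattern = /\[(agent|model|repo)=([^\]\s]+)\]/gi;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(description)) !== null) {
    const key = match[1].toLowerCase() as keyof Selectors;
    // Assumption: the first occurrence of a tag wins.
    if (selectors[key] === undefined) {
      selectors[key] = match[2];
    }
  }
  return selectors;
}

// Resolve the runner: description tag, then label, then repository default.
function resolveAgent(
  description: string,
  labels: string[],
  knownAgents: string[],
  repoDefault: string,
): string {
  const tagged = parseDescriptionSelectors(description).agent;
  if (tagged !== undefined && knownAgents.indexOf(tagged.toLowerCase()) !== -1) {
    return tagged.toLowerCase();
  }
  for (const label of labels) {
    const normalized = label.toLowerCase();
    if (knownAgents.indexOf(normalized) !== -1) {
      return normalized;
    }
  }
  return repoDefault;
}
```

A precedence test would then assert that an issue tagged `[agent=codex]` and labeled `cursor` resolves to `codex`, and that an untagged, unlabeled issue falls through to the repository default.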
+
+### 9) Activity Formatting And Timeline Visibility
+
+- Ensure formatter output is timeline-ready (AgentActivity content fields).
+- Ensure tool lifecycle events are visible as activities (not silently dropped).
+- Use Markdown-compatible formatting for checklists:
+  - `- [ ] item`
+  - `- [x] item`
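
A minimal sketch of the checklist formatting described above, assuming a simple todo shape. `TodoItem` and `formatTodosAsChecklist` are hypothetical names. `TODO_STATUS_COMPLETED` is the Cursor status named in the changelog; the plain `completed` string is an assumed status for other runners.

```typescript
// Hypothetical todo shape; real runner payloads may carry more fields.
interface TodoItem {
  content: string;
  status: string;
}

// Statuses rendered as checked. "completed" is an assumption;
// "TODO_STATUS_COMPLETED" is the Cursor value mentioned in the changelog.
const COMPLETED_STATUSES = ["completed", "TODO_STATUS_COMPLETED"];

// Render todos as a Markdown task list, one item per line.
function formatTodosAsChecklist(todos: TodoItem[]): string {
  const lines: string[] = [];
  for (const todo of todos) {
    const done = COMPLETED_STATUSES.indexOf(todo.status) !== -1;
    lines.push((done ? "- [x] " : "- [ ] ") + todo.content);
  }
  return lines.join("\n");
}
```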
+
+### 10) Usage, Stop Reasons, And Typing
+
+- Map usage/cost/stop-reason fields to expected shared types.
+- Fill required compatibility fields even when provider omits them natively.
+- Keep strict TypeScript compatibility for cross-runner shared contracts.
+
+### 11) Config Schema And Backward Compatibility
+
+- Use provider-specific defaults (`claudeDefaultModel`, `geminiDefaultModel`, `codexDefaultModel`).
+- Add config migration logic for renamed or legacy fields.
+- Keep docs/comments provider-specific and explicit.
+
+### 12) Validation Protocol Before Merge
+
+- Run unit tests for new runner adapters and formatter behavior.
+- Run replay tests from real CLI transcripts.
+- Validate F1 end-to-end scenarios for:
+  - label-based runner/model selection
+  - description selector-based runner/model selection
+  - visible tool/file-edit activities in session timeline
+  - final response posting behavior
+
+### Codex Integration Lesson Learned
+
+Codex emitted tool activity at `item.started`/`item.completed` events, but those were initially not mapped to `tool_use`/`tool_result`. The result was missing action/file-edit visibility in Linear. For any new harness, treat tool lifecycle mapping as a first-class acceptance criterion, not a formatter-only concern.
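
The mapping this lesson describes might look roughly like the following. The event and message shapes are simplified assumptions for illustration (`CodexItemEvent`, `RunnerMessage`, and `mapCodexItemEvent` are hypothetical); real Codex events and Cyrus runner messages carry more fields.

```typescript
// Assumed, simplified shape of a Codex item lifecycle event.
interface CodexItemEvent {
  type: "item.started" | "item.completed";
  item: { id: string; kind: string; input?: unknown; output?: string };
}

// Assumed, simplified runner message shapes.
type RunnerMessage =
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; content: string };

// Map each lifecycle event to a message so neither the start nor the
// completion of a tool call is silently dropped.
function mapCodexItemEvent(event: CodexItemEvent): RunnerMessage {
  if (event.type === "item.started") {
    return {
      type: "tool_use",
      id: event.item.id,
      name: event.item.kind,
      input: event.item.input !== undefined ? event.item.input : {},
    };
  }
  return {
    type: "tool_result",
    toolUseId: event.item.id,
    content: event.item.output !== undefined ? event.item.output : "",
  };
}
```

Keeping this mapping in the runner adapter, rather than in the formatter, means a replay test over a recorded transcript can assert that every started item yields a `tool_use` and every completed item a matching `tool_result`.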
+
+### Cursor Integration Lesson Learned
+
+Cursor CLI permissions are enforced from config (`~/.cursor/cli-config.json` or `<project>/.cursor/cli.json`) instead of dynamic per-request tool allowlists. For Cursor-like providers, do not rely on dynamic SDK tool constraints alone—add a translation layer (for example `mcp__server__tool` -> `Mcp(server:tool)`, `Bash(...)` -> `Shell(...)`) and sync project permissions before each run and between subroutines. Also pre-enable MCP servers via `agent mcp list` + `agent mcp enable <server>` using both project-listed and runner-configured server names so headless sessions can invoke MCP tools immediately. In Cyrus Cursor runs, treat `.cursor/mcp.json` as the project MCP source and follow Cursor's configuration-location and file-syntax docs (these differ from Claude's MCP interpretation): https://cursor.com/docs/context/mcp#configuration-locations. Use workspace-scoped wildcard file permissions (`Read(./**)`, `Write(./**)`) rather than unscoped `Read(**)` / `Write(**)` in translation defaults. Reference: https://cursor.com/docs/cli/reference/permissions
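
As a rough sketch of such a translation layer, the three rewrites named above can be expressed as follows. `translatePermission` and `buildCursorPermissions` are hypothetical names, and the `permissions.allow` / `permissions.deny` layout of the generated config is an assumption to verify against the Cursor permissions reference. The arguments inside `Bash(...)` are passed through unchanged, which may not match Cursor's pattern syntax in every case.

```typescript
// Translate one Cyrus/Claude-style tool name into a Cursor-style permission token.
function translatePermission(tool: string): string {
  // mcp__server__tool -> Mcp(server:tool)
  const mcp = /^mcp__(.+?)__(.+)$/.exec(tool);
  if (mcp !== null) {
    return "Mcp(" + mcp[1] + ":" + mcp[2] + ")";
  }
  // Bash(...) -> Shell(...), keeping the argument text as-is.
  const bash = /^Bash\((.*)\)$/.exec(tool);
  if (bash !== null) {
    return "Shell(" + bash[1] + ")";
  }
  // Unscoped wildcards become workspace-scoped to avoid absolute system paths.
  if (tool === "Read(**)") return "Read(./**)";
  if (tool === "Write(**)") return "Write(./**)";
  // Anything else is passed through unchanged.
  return tool;
}

// Build the object that would be written to <project>/.cursor/cli.json.
// The exact config layout is an assumption here.
function buildCursorPermissions(allowed: string[], disallowed: string[]) {
  return {
    permissions: {
      allow: allowed.map(translatePermission),
      deny: disallowed.map(translatePermission),
    },
  };
}
```

Re-running `buildCursorPermissions` and rewriting the file whenever the allowed or disallowed tools change is one way to get the subroutine-time refresh described above.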
+
 ## Navigating GitHub Repositories
 
 When you need to examine source code from GitHub repositories (especially when GitHub's authentication blocks normal navigation):