Skip to content

Commit ccc8a92

Browse files
haofeifCopilot
andauthored
fix(codex): handle v0.136+ TUI footer and skip MCP tool-call markers … (#274)
* fix(codex): handle v0.136+ TUI footer and skip MCP tool-call markers in extraction Two bugs in CodexProvider, both surfaced by the e2e suite on codex 0.136+: 1. TUI footer detection — codex 0.136 dropped the "N% left" segment; the footer is now just "model · path". TUI_FOOTER_PATTERN never matched, so get_status() couldn't distinguish the suggestion hint ("› Run /review on my current changes") from a real user message and pinned terminals at IDLE forever. Pattern extended to anchor on "·\s+[~/]". 2. extract_last_message_from_script() anchored on the first "•" after the user prompt, which can be a "• Called <mcp_server>.<tool>(...)" tool-call marker. The lines that follow are tool output, not the model's reply, so when codex called load_skill before responding the extracted message leaked the cao-worker-protocols skill body (which mentions "[CAO Handoff]") into the test output. Now iterates "•" matches and skips MCP_TOOL_CALL_PATTERN. Also tightened ASSISTANT_PREFIX_PATTERN from "\s*•" to "[^\S\n]*•" so the match anchors on the bullet line itself, not a preceding blank line. Verified end-to-end: 11/12 codex e2e pass (1 xfailed expected), 81/81 unit tests including 3 new regressions for the v0.136 footer and tool-call marker filtering. * style: apply black formatting Run black on codex.py and the unit test file to satisfy the Code Quality CI check on PR #274. * fix(codex): tighten MCP tool-call filter and apply it to status detection Address Copilot review on PR #274 — two related concerns: 1. MCP_TOOL_CALL_PATTERN was too loose: "^[^\S\n]*•\s+Called\s+\S+" matched any "• Called <word>" so a real model bullet like "• Called attention to the bug" would be filtered out as a tool call. Tightened to require the documented "<server>.<tool>(" shape: "^[^\S\n]*•\s+Called\s+[\w-]+\.[\w-]+\(". 2. get_status()'s COMPLETED check searched ASSISTANT_PREFIX_PATTERN without filtering tool-call markers. A "• Called <server>.<tool>(...)" bullet emitted before the model has actually replied could trip COMPLETED on its own. Factored the per-line tool-call filter into a _find_assistant_marker() helper and applied it in get_status() (both COMPLETED detection and assistant_after_last_user gating) and in extract_last_message_from_script(). Tests: - test_extract_does_not_filter_called_as_english_word — guards against over-broad filtering of legitimate model bullets. - test_get_status_idle_when_only_tool_call_after_user — IDLE when the only "•" after the user prompt is a tool-call marker. - test_get_status_completed_when_real_reply_after_tool_call — COMPLETED when a real "•" reply follows a tool-call marker. 84 unit tests pass. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
1 parent 946a5b2 commit ccc8a92

3 files changed

Lines changed: 295 additions & 23 deletions

File tree

CHANGELOG.md

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
66
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
77
## [Unreleased]
88

9-
## [2.2.0] - 2026-06-01
9+
## [2.2.0] - 2026-06-04
1010

1111
### Highlights
1212

@@ -52,6 +52,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
5252

5353
### Fixed
5454

55+
- codex: detect v0.136+ TUI footer (`model · path` without `N% left`) so handoff/assign workers reliably reach COMPLETED instead of pinning at IDLE
56+
57+
- codex: skip `• Called <tool>(...)` MCP tool-call markers during last-message extraction so skill body text (including `[CAO Handoff]`) no longer leaks into worker output
58+
59+
- ci: stop TestPyPI squats breaking the release smoke test by installing the package with `--no-deps` and resolving deps from PyPI alone (#270)
60+
61+
- kiro_cli: treat MCP-server boot screen as PROCESSING and gate shell-baseline IDLE on `_initialized` to fix paste-into-boot-screen race that dropped the first message after launch (#268)
62+
5563
- mcp: reject `send_message` when `receiver_id` equals sender (#263)
5664

5765
- tmux name validation hardening (CodeQL #66) (#258)

src/cli_agent_orchestrator/providers/codex.py

Lines changed: 63 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,18 @@
3131
IDLE_PROMPT_PATTERN_LOG = r"\? for shortcuts"
3232
# Match assistant response start: "assistant:/codex:/agent:" (label style from synthetic
3333
# test fixtures) or "•" bullet point (real Codex interactive output format).
34-
ASSISTANT_PREFIX_PATTERN = r"^(?:(?:assistant|codex|agent)\s*:|\s*•)"
34+
# [^\S\n]* matches horizontal whitespace only (not newlines) so the match anchors
35+
# on the actual bullet line — using \s* would let the match start on a blank
36+
# line above the bullet, breaking per-line tool-call filtering downstream.
37+
ASSISTANT_PREFIX_PATTERN = r"^(?:(?:assistant|codex|agent)\s*:|[^\S\n]*•)"
38+
# MCP tool call marker emitted by Codex when invoking a tool, e.g.
39+
# "• Called cao-mcp-server.load_skill({...})". The body that follows
40+
# (└ ... lines) is the tool's return value, not the model's reply.
41+
# Used to skip these markers when locating the actual response start.
42+
# The "<server>.<tool>(" shape (identifier.identifier followed by an open
43+
# paren) is required so legitimate model bullets like "• Called attention
44+
# to the bug" don't get filtered as tool calls.
45+
MCP_TOOL_CALL_PATTERN = r"^[^\S\n]*•\s+Called\s+[\w-]+\.[\w-]+\("
3546
# Match user input: "You ..." (label style) or "› text" (Codex interactive prompt).
3647
# The "›[^\S\n]*\S" alternative requires a non-whitespace character on the same line
3748
# to distinguish user input ("› what is your role?") from the empty idle prompt ("› ").
@@ -50,7 +61,10 @@
5061
# Used to detect when the bottom lines contain TUI chrome rather than user input.
5162
# v0.110 and earlier: "? for shortcuts" and "N% context left"
5263
# v0.111+: "model · N% left · path" (PR #13202 restored draft footer hints)
53-
TUI_FOOTER_PATTERN = r"(?:\?\s+for shortcuts|context left|\d+%\s+left)"
64+
# v0.136+: "model · path" (the "N% left" segment was removed)
65+
# The "·\s+[~/]" alternative anchors on the path component of the footer,
66+
# which is shared across v0.111 and v0.136 status bars.
67+
TUI_FOOTER_PATTERN = r"(?:\?\s+for shortcuts|context left|\d+%\s+left|·\s+[~/])"
5468
# Codex TUI progress spinner: "• Working (0s • esc to interrupt)",
5569
# "• Thinking (2s ...)", "• Starting script creation (10s • esc to interrupt)".
5670
# The prefix text varies but the "(Ns • esc to interrupt)" format is consistent.
@@ -104,6 +118,27 @@ def _compute_tui_footer_cutoff(all_lines: list) -> int:
104118
return len("\n".join(all_lines[:footer_start_idx]))
105119

106120

121+
def _find_assistant_marker(text: str) -> Optional[re.Match[str]]:
122+
"""Find the first ASSISTANT_PREFIX_PATTERN match in ``text`` whose line
123+
is not an MCP tool-call marker.
124+
125+
Codex emits ``• Called <server>.<tool>(...)`` when invoking an MCP tool;
126+
that bullet matches ASSISTANT_PREFIX_PATTERN but is followed by tool
127+
output, not the model's reply. Anchoring on it would conflate tool
128+
output with the model response (status: false COMPLETED;
129+
extraction: skill-body leak).
130+
"""
131+
for m in re.finditer(ASSISTANT_PREFIX_PATTERN, text, re.IGNORECASE | re.MULTILINE):
132+
line_end = text.find("\n", m.start())
133+
if line_end == -1:
134+
line_end = len(text)
135+
line = text[m.start() : line_end]
136+
if re.match(MCP_TOOL_CALL_PATTERN, line):
137+
continue
138+
return m
139+
return None
140+
141+
107142
class ProviderError(Exception):
108143
"""Exception raised for provider-specific errors."""
109144

@@ -321,13 +356,10 @@ def get_status(self, tail_lines: Optional[int] = None) -> TerminalStatus:
321356
last_user = match
322357

323358
output_after_last_user = clean_output[last_user.start() :] if last_user else clean_output
359+
# Skip MCP tool-call markers — those mark "model invoked a tool", not
360+
# "model has replied", and shouldn't gate WAITING/ERROR detection.
324361
assistant_after_last_user = bool(
325-
last_user
326-
and re.search(
327-
ASSISTANT_PREFIX_PATTERN,
328-
output_after_last_user,
329-
re.IGNORECASE | re.MULTILINE,
330-
)
362+
last_user and _find_assistant_marker(output_after_last_user) is not None
331363
)
332364

333365
# Check trust prompt early — the trust menu uses › which matches the idle prompt
@@ -373,13 +405,12 @@ def get_status(self, tail_lines: Optional[int] = None) -> TerminalStatus:
373405
if re.search(TUI_PROGRESS_PATTERN, tail_output, re.MULTILINE):
374406
return TerminalStatus.PROCESSING
375407

376-
# Consider COMPLETED only if we see an assistant marker after the last user message.
408+
# Consider COMPLETED only if we see an assistant marker (skipping
409+
# MCP tool-call markers) after the last user message. Without the
410+
# tool-call filter, "• Called <server>.<tool>(...)" emitted before
411+
# the model has actually replied would trip COMPLETED prematurely.
377412
if last_user is not None:
378-
if re.search(
379-
ASSISTANT_PREFIX_PATTERN,
380-
clean_output[last_user.start() :],
381-
re.IGNORECASE | re.MULTILINE,
382-
):
413+
if _find_assistant_marker(clean_output[last_user.start() :]) is not None:
383414
return TerminalStatus.COMPLETED
384415

385416
return TerminalStatus.IDLE
@@ -428,13 +459,12 @@ def extract_last_message_from_script(self, script_output: str) -> str:
428459
last_user = user_matches[-1]
429460

430461
# Find the first assistant response marker (• or assistant:) after
431-
# the user message. This correctly skips multi-line user messages
432-
# that wrap across several lines in the Codex TUI.
433-
asst_after_user = re.search(
434-
ASSISTANT_PREFIX_PATTERN,
435-
clean_output[last_user.start() :],
436-
re.IGNORECASE | re.MULTILINE,
437-
)
462+
# the user message, skipping "• Called <server>.<tool>(...)" MCP
463+
# tool call markers — those are followed by tool output, not the
464+
# model's reply. Anchoring on a tool call marker would pull tool
465+
# output (e.g. skill body text) into the extracted response.
466+
asst_after_user = _find_assistant_marker(clean_output[last_user.start() :])
467+
438468
if asst_after_user:
439469
response_start = last_user.start() + asst_after_user.start()
440470
else:
@@ -473,9 +503,20 @@ def extract_last_message_from_script(self, script_output: str) -> str:
473503
return response_text.strip()
474504

475505
# Fallback: assistant marker based extraction (no user message found).
476-
matches = list(
506+
# Filter out "• Called <tool>(...)" MCP tool call markers so we anchor
507+
# on the model's actual reply, not tool output.
508+
all_matches = list(
477509
re.finditer(ASSISTANT_PREFIX_PATTERN, clean_output, re.IGNORECASE | re.MULTILINE)
478510
)
511+
matches = []
512+
for m in all_matches:
513+
line_end = clean_output.find("\n", m.start())
514+
if line_end == -1:
515+
line_end = len(clean_output)
516+
line = clean_output[m.start() : line_end]
517+
if re.match(MCP_TOOL_CALL_PATTERN, line):
518+
continue
519+
matches.append(m)
479520

480521
if not matches:
481522
raise ValueError("No Codex response found - no assistant marker detected")

0 commit comments

Comments
 (0)