fix(codex): handle v0.136+ TUI footer and skip MCP tool-call markers … (#274)

haofeif · Copilot · web-flow · commit ccc8a923bd6d · 2026-06-04T21:23:13.000+10:00
* fix(codex): handle v0.136+ TUI footer and skip MCP tool-call markers in extraction Two bugs in CodexProvider, both surfaced by the e2e suite on codex 0.136+: 1. TUI footer detection — codex 0.136 dropped the "N% left" segment; the footer is now just "model · path". TUI_FOOTER_PATTERN never matched, so get_status() couldn't distinguish the suggestion hint ("› Run /review on my current changes") from a real user message and pinned terminals at IDLE forever. Pattern extended to anchor on "·\s+[~/]". 2. extract_last_message_from_script() anchored on the first "•" after the user prompt, which can be a "• Called <mcp_server>.<tool>(...)" tool-call marker. The lines that follow are tool output, not the model's reply, so when codex called load_skill before responding the extracted message leaked the cao-worker-protocols skill body (which mentions "[CAO Handoff]") into the test output. Now iterates "•" matches and skips MCP_TOOL_CALL_PATTERN. Also tightened ASSISTANT_PREFIX_PATTERN from "\s*•" to "[^\S\n]*•" so the match anchors on the bullet line itself, not a preceding blank line. Verified end-to-end: 11/12 codex e2e pass (1 xfailed expected), 81/81 unit tests including 3 new regressions for the v0.136 footer and tool-call marker filtering. * style: apply black formatting Run black on codex.py and the unit test file to satisfy the Code Quality CI check on PR #274. * fix(codex): tighten MCP tool-call filter and apply it to status detection Address Copilot review on PR #274 — two related concerns: 1. MCP_TOOL_CALL_PATTERN was too loose: "^[^\S\n]*•\s+Called\s+\S+" matched any "• Called <word>" so a real model bullet like "• Called attention to the bug" would be filtered out as a tool call. Tightened to require the documented "<server>.<tool>(" shape: "^[^\S\n]*•\s+Called\s+[\w-]+\.[\w-]+\(". 2. get_status()'s COMPLETED check searched ASSISTANT_PREFIX_PATTERN without filtering tool-call markers. A "• Called <server>.<tool>(...)" bullet emitted before the model has actually replied could trip COMPLETED on its own. Factored the per-line tool-call filter into a _find_assistant_marker() helper and applied it in get_status() (both COMPLETED detection and assistant_after_last_user gating) and in extract_last_message_from_script(). Tests: - test_extract_does_not_filter_called_as_english_word — guards against over-broad filtering of legitimate model bullets. - test_get_status_idle_when_only_tool_call_after_user — IDLE when the only "•" after the user prompt is a tool-call marker. - test_get_status_completed_when_real_reply_after_tool_call — COMPLETED when a real "•" reply follows a tool-call marker. 84 unit tests pass. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,7 +6,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
 
-## [2.2.0] - 2026-06-01
+## [2.2.0] - 2026-06-04
 
 ### Highlights
 
@@ -52,6 +52,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Fixed
 
+- codex: detect v0.136+ TUI footer (`model · path` without `N% left`) so handoff/assign workers reliably reach COMPLETED instead of pinning at IDLE
+
+- codex: skip `• Called <tool>(...)` MCP tool-call markers during last-message extraction so skill body text (including `[CAO Handoff]`) no longer leaks into worker output
+
+- ci: stop TestPyPI squats breaking the release smoke test by installing the package with `--no-deps` and resolving deps from PyPI alone (#270)
+
+- kiro_cli: treat MCP-server boot screen as PROCESSING and gate shell-baseline IDLE on `_initialized` to fix paste-into-boot-screen race that dropped the first message after launch (#268)
+
 - mcp: reject `send_message` when `receiver_id` equals sender (#263)
 
 - tmux name validation hardening (CodeQL #66) (#258)
diff --git a/src/cli_agent_orchestrator/providers/codex.py b/src/cli_agent_orchestrator/providers/codex.py
@@ -31,7 +31,18 @@
 IDLE_PROMPT_PATTERN_LOG = r"\? for shortcuts"
 # Match assistant response start: "assistant:/codex:/agent:" (label style from synthetic
 # test fixtures) or "•" bullet point (real Codex interactive output format).
-ASSISTANT_PREFIX_PATTERN = r"^(?:(?:assistant|codex|agent)\s*:|\s*•)"
+# [^\S\n]* matches horizontal whitespace only (not newlines) so the match anchors
+# on the actual bullet line — using \s* would let the match start on a blank
+# line above the bullet, breaking per-line tool-call filtering downstream.
+ASSISTANT_PREFIX_PATTERN = r"^(?:(?:assistant|codex|agent)\s*:|[^\S\n]*•)"
+# MCP tool call marker emitted by Codex when invoking a tool, e.g.
+# "• Called cao-mcp-server.load_skill({...})". The body that follows
+# (└ ... lines) is the tool's return value, not the model's reply.
+# Used to skip these markers when locating the actual response start.
+# The "<server>.<tool>(" shape (identifier.identifier followed by an open
+# paren) is required so legitimate model bullets like "• Called attention
+# to the bug" don't get filtered as tool calls.
+MCP_TOOL_CALL_PATTERN = r"^[^\S\n]*•\s+Called\s+[\w-]+\.[\w-]+\("
 # Match user input: "You ..." (label style) or "› text" (Codex interactive prompt).
 # The "›[^\S\n]*\S" alternative requires a non-whitespace character on the same line
 # to distinguish user input ("› what is your role?") from the empty idle prompt ("› ").
@@ -50,7 +61,10 @@
 # Used to detect when the bottom lines contain TUI chrome rather than user input.
 # v0.110 and earlier: "? for shortcuts" and "N% context left"
 # v0.111+: "model · N% left · path" (PR #13202 restored draft footer hints)
-TUI_FOOTER_PATTERN = r"(?:\?\s+for shortcuts|context left|\d+%\s+left)"
+# v0.136+: "model · path" (the "N% left" segment was removed)
+# The "·\s+[~/]" alternative anchors on the path component of the footer,
+# which is shared across v0.111 and v0.136 status bars.
+TUI_FOOTER_PATTERN = r"(?:\?\s+for shortcuts|context left|\d+%\s+left|·\s+[~/])"
 # Codex TUI progress spinner: "• Working (0s • esc to interrupt)",
 # "• Thinking (2s ...)", "• Starting script creation (10s • esc to interrupt)".
 # The prefix text varies but the "(Ns • esc to interrupt)" format is consistent.
@@ -104,6 +118,27 @@ def _compute_tui_footer_cutoff(all_lines: list) -> int:
     return len("\n".join(all_lines[:footer_start_idx]))
 
 
+def _find_assistant_marker(text: str) -> Optional[re.Match[str]]:
+    """Find the first ASSISTANT_PREFIX_PATTERN match in ``text`` whose line
+    is not an MCP tool-call marker.
+
+    Codex emits ``• Called <server>.<tool>(...)`` when invoking an MCP tool;
+    that bullet matches ASSISTANT_PREFIX_PATTERN but is followed by tool
+    output, not the model's reply. Anchoring on it would conflate tool
+    output with the model response (status: false COMPLETED;
+    extraction: skill-body leak).
+    """
+    for m in re.finditer(ASSISTANT_PREFIX_PATTERN, text, re.IGNORECASE | re.MULTILINE):
+        line_end = text.find("\n", m.start())
+        if line_end == -1:
+            line_end = len(text)
+        line = text[m.start() : line_end]
+        if re.match(MCP_TOOL_CALL_PATTERN, line):
+            continue
+        return m
+    return None
+
+
 class ProviderError(Exception):
     """Exception raised for provider-specific errors."""
 
@@ -321,13 +356,10 @@ def get_status(self, tail_lines: Optional[int] = None) -> TerminalStatus:
                 last_user = match
 
         output_after_last_user = clean_output[last_user.start() :] if last_user else clean_output
+        # Skip MCP tool-call markers — those mark "model invoked a tool", not
+        # "model has replied", and shouldn't gate WAITING/ERROR detection.
         assistant_after_last_user = bool(
-            last_user
-            and re.search(
-                ASSISTANT_PREFIX_PATTERN,
-                output_after_last_user,
-                re.IGNORECASE | re.MULTILINE,
-            )
+            last_user and _find_assistant_marker(output_after_last_user) is not None
         )
 
         # Check trust prompt early — the trust menu uses › which matches the idle prompt
@@ -373,13 +405,12 @@ def get_status(self, tail_lines: Optional[int] = None) -> TerminalStatus:
             if re.search(TUI_PROGRESS_PATTERN, tail_output, re.MULTILINE):
                 return TerminalStatus.PROCESSING
 
-            # Consider COMPLETED only if we see an assistant marker after the last user message.
+            # Consider COMPLETED only if we see an assistant marker (skipping
+            # MCP tool-call markers) after the last user message. Without the
+            # tool-call filter, "• Called <server>.<tool>(...)" emitted before
+            # the model has actually replied would trip COMPLETED prematurely.
             if last_user is not None:
-                if re.search(
-                    ASSISTANT_PREFIX_PATTERN,
-                    clean_output[last_user.start() :],
-                    re.IGNORECASE | re.MULTILINE,
-                ):
+                if _find_assistant_marker(clean_output[last_user.start() :]) is not None:
                     return TerminalStatus.COMPLETED
 
                 return TerminalStatus.IDLE
@@ -428,13 +459,12 @@ def extract_last_message_from_script(self, script_output: str) -> str:
             last_user = user_matches[-1]
 
             # Find the first assistant response marker (• or assistant:) after
-            # the user message. This correctly skips multi-line user messages
-            # that wrap across several lines in the Codex TUI.
-            asst_after_user = re.search(
-                ASSISTANT_PREFIX_PATTERN,
-                clean_output[last_user.start() :],
-                re.IGNORECASE | re.MULTILINE,
-            )
+            # the user message, skipping "• Called <server>.<tool>(...)" MCP
+            # tool call markers — those are followed by tool output, not the
+            # model's reply. Anchoring on a tool call marker would pull tool
+            # output (e.g. skill body text) into the extracted response.
+            asst_after_user = _find_assistant_marker(clean_output[last_user.start() :])
+
             if asst_after_user:
                 response_start = last_user.start() + asst_after_user.start()
             else:
@@ -473,9 +503,20 @@ def extract_last_message_from_script(self, script_output: str) -> str:
                 return response_text.strip()
 
         # Fallback: assistant marker based extraction (no user message found).
-        matches = list(
+        # Filter out "• Called <tool>(...)" MCP tool call markers so we anchor
+        # on the model's actual reply, not tool output.
+        all_matches = list(
             re.finditer(ASSISTANT_PREFIX_PATTERN, clean_output, re.IGNORECASE | re.MULTILINE)
         )
+        matches = []
+        for m in all_matches:
+            line_end = clean_output.find("\n", m.start())
+            if line_end == -1:
+                line_end = len(clean_output)
+            line = clean_output[m.start() : line_end]
+            if re.match(MCP_TOOL_CALL_PATTERN, line):
+                continue
+            matches.append(m)
 
         if not matches:
             raise ValueError("No Codex response found - no assistant marker detected")
diff --git a/test/providers/test_codex_provider_unit.py b/test/providers/test_codex_provider_unit.py