[runtime-failure-observer] Inline curl calls and require fetched evidence before opening PRs (#1612)

kotlarmilos · Copilot · web-flow · commit a0534aec8f77 · 2026-06-19T09:35:09.000+02:00
* runtime-failure-observer: fix network egress and ban ungrounded PRs

The observer agent's shell commands were intermittently denied. The
copilot harness authorizes a command only by its first token, but the
prompt instructed the agent to pre-bind URLs (`url=...` then
`curl "$url"`) and to loop over definitions with `for`. Those forms
start with an assignment or keyword, so the harness rejected them with
"Permission denied and could not request permission from user" even
though the firewall allowlist contains .dev.azure.com and
.helix.dot.net. Across the three real runs this produced three
different outcomes (worked around, noop, and a false report_incomplete
that blamed the firewall).
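
The first-token rule described above can be illustrated with a small sketch. The `first_token_allowed` function and its allow-list are hypothetical stand-ins that only mimic the described behavior; the harness's actual implementation is not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical model of first-token authorization (NOT the real harness code).
# The allow-list is an assumption, taken from the programs named in rule 11.
first_token_allowed() {
  local cmd="$1"
  # Strip everything from the first whitespace onward, leaving the first token.
  local first="${cmd%%[[:space:]]*}"
  case "$first" in
    curl|jq|gh|grep|printf) echo "allowed" ;;
    *) echo "denied" ;;
  esac
}

# Inlined curl: first token is `curl`.
first_token_allowed 'curl -s "https://dev.azure.com/..." -o out.json'
# Pre-bound URL: first token is the assignment itself.
first_token_allowed 'url="https://dev.azure.com/..."'
# Loop: first token is the keyword `for`.
first_token_allowed 'for def in 154 223; do curl -s "$url"; done'
```

Under this model only the first form passes, which matches the denials seen for the `url=...` and `for` forms regardless of what the firewall allows.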

Changes to runtime-failure-observer.agent.md (prompt body, imported at
runtime via {{#runtime-import}}, so no lock recompile needed):

- Rule 11 now requires every shell command to begin with an
  allow-listed program; inline URLs into `curl ... -o file`, no
  variable pre-bind, no loops. Step 1, Step 2, and the Step 4 dedup
  cache snippet are rewritten to match.
- New Step 0 preflight proves egress with one inlined curl and, on
  failure, emits an accurate report_incomplete (harness command
  authorization, not firewall) instead of misdiagnosing the firewall.
- New rule 6b forbids opening a PR unless the build timeline and Helix
  console were actually downloaded this run; no citing build ids,
  Helix GUIDs, exit codes, or stderr from memory or inference.
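
The evidence requirement in the last bullet can be sketched as a simple file check. This is a minimal illustration, assuming the fetched timeline and console log are saved to files; the `evidence_fetched` helper and the demo paths are hypothetical and do not appear in the prompt. It is meant to show the logic only, since the agent itself has to issue first-token allow-listed commands.

```bash
#!/usr/bin/env bash
# Hypothetical gate: every evidence file must exist and be non-empty
# before a PR may be opened. Names and paths are illustrative only.
evidence_fetched() {
  local f
  for f in "$@"; do
    [ -s "$f" ] || { echo "missing or empty: $f"; return 1; }
  done
  echo "evidence ok"
}

# Demo with throwaway files.
dir="$(mktemp -d)"
printf '{"records":[]}\n' > "$dir/timeline.json"
: > "$dir/console.log"   # empty file simulates a failed or denied fetch
evidence_fetched "$dir/timeline.json" "$dir/console.log" || echo "emit nothing for this candidate"
```

A non-empty file is only a necessary condition: an error page would also pass `-s`, so the content still has to be parsed before anything from it is cited.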

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten wording to match prompt style

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback: shell-safe placeholders and allowed safe-output

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
diff --git a/.github/workflows/runtime-failure-observer.agent.md b/.github/workflows/runtime-failure-observer.agent.md
@@ -87,12 +87,12 @@ The agent reads `dotnet/runtime` and the failing build logs. It never writes to
 3. **Every PR title starts with `[runtime-observer] `.** PRs are opened as drafts.
 4. **Small-fix bounds for complete autofix PRs.** A *complete* fix PR must satisfy all of: `<=` 30 changed lines total, `<=` 2 files (one source + one test), no new public API, no protocol change, no native code change. If the fix needs more, do not silently truncate it: open a clearly-marked best-effort/diagnosability **draft** PR (Step 5) that a human finishes. Best-effort and diagnosability draft PRs may exceed these bounds but must be marked work-in-progress and must still avoid new public API, protocol changes, and native code.
 5. **Don't propose fixes for runtime test bugs.** If the failure is in the test binary itself (assertion in the test code, missing mock, runtime API regression), record `skipped: runtime-side issue` and emit nothing.
-6. **Never assume.** Cite the runtime build URL, the Helix work item URL, the xharness command line, and the exact stderr / exit code in every PR body.
+6. **Never assume; cite only what you fetched this run.** Cite the runtime build URL, the Helix work item URL, the xharness command line, and the exact stderr / exit code in every PR body. If any required fetch (build list, timeline, Helix work items, console log) failed, was empty, or was denied, emit nothing for that candidate — never reconstruct a build id, URL, GUID, exit code, or stderr from memory or inference.
 7. **Dedup.** Before emitting, search open and recently merged PRs / issues in `dotnet/xharness` for the same xharness-signature. On match: `existing-PR #<n>` or `existing-issue #<n>`, emit nothing.
 8. **Same-run dedup cache.** Persist `(exit_code, command, signature_norm)` keys in `/tmp/gh-aw/agent/filed.tsv`. On hit: `dup-this-run`, skip.
 9. **All state under `/tmp/gh-aw/agent/`.**
 10. **AzDO API: anonymous only.** Stay on `https://dev.azure.com/dnceng-public/public/_apis/build/...`.
-11. **Pre-bind every URL with `?` or `&` to a variable on its own line, then `curl -s "$url"`.**
+11. **Start every shell command with an allow-listed program (`curl`, `jq`, `gh`, `grep`, `printf`, ...).** The harness authorizes by first token only, so a command beginning with `url=...`, `key=...`, or `for` is denied with `Permission denied and could not request permission from user` even when the firewall allows the domain. Inline each URL into a single `curl ... -o <file>` (keep `%24` for `$top`); never pre-bind URLs to variables or loop over `curl`.
 
 ## Pipelines to scan
 
@@ -125,42 +125,49 @@ These exit codes from `src/Microsoft.DotNet.XHarness.Common/CLI/ExitCode.cs` are
 
 Exit codes outside this table: record `skipped: exit code <n> not in improvement table` and stop.
 
+## Step 0. Preflight: confirm network egress
+
+Prove the harness will let `curl` reach the public AzDO API before scanning (rule 11):
+
+```bash
+curl -s "https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=154&branchName=refs/heads/main&statusFilter=completed&resultFilter=failed,partiallySucceeded&%24top=1&api-version=7.1" -o /tmp/gh-aw/agent/preflight.json
+jq -r '.count' /tmp/gh-aw/agent/preflight.json
+```
+
+Valid JSON: continue. If `curl` itself is denied, that is the harness rejecting the command form, not the firewall (`.dev.azure.com` and `.helix.dot.net` are allow-listed). If an inlined first-token `curl` still fails, record `skipped: harness denied inlined curl to dev.azure.com; firewall already allows it` and stop. Never blame the firewall allowlist and never open a PR.
+
 ## Step 1. Set up
 
+Run one inlined `curl` per definition id in `154 223 224 225 226 228 260 261 265`, substituting the id in the URL and the `-o` path:
+
 ```bash
-for def in 154 223 224 225 226 228 260 261 265; do
-  url="https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=${def}&branchName=refs/heads/main&statusFilter=completed&resultFilter=failed,partiallySucceeded&%24top=10&api-version=7.1"
-  curl -s "$url" | tee "/tmp/gh-aw/agent/builds-${def}.json" | jq -r '.value[] | "\(.id) \(.result) \(.finishTime)"' | head
-done
+curl -s "https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=154&branchName=refs/heads/main&statusFilter=completed&resultFilter=failed,partiallySucceeded&%24top=10&api-version=7.1" -o /tmp/gh-aw/agent/builds-154.json
+jq -r '.value[] | "\(.id) \(.result) \(.finishTime)"' /tmp/gh-aw/agent/builds-154.json | head
 ```
 
 Per definition, pick `source` = most recent failed build inside the last 7 days. Older: `skipped: stale (>7d)`.
 
 ## Step 2. Walk timelines, find xharness invocations
 
-For each `source`:
+For each `source` (inline the build id in place of `SRCID`):
 
 ```bash
-src_id=<source build id>
-url="https://dev.azure.com/dnceng-public/public/_apis/build/builds/${src_id}/timeline?api-version=7.1"
-curl -s "$url" | tee /tmp/gh-aw/agent/timeline-${src_id}.json
+curl -s "https://dev.azure.com/dnceng-public/public/_apis/build/builds/SRCID/timeline?api-version=7.1" -o "/tmp/gh-aw/agent/timeline-SRCID.json"
 ```
 
 Reconstruct `Stage -> Phase -> Job -> Task` via `parentId`. A failed leaf with non-null `log.id` is a candidate.
 
 Filter to Helix work items only. xharness runs inside Helix work items, not on the AzDO agent. From the `Send to Helix` task log, extract `Sent Helix Job: <GUID>`:
 
 ```bash
-log_url='<Send to Helix task log url>'
-curl -s "$log_url" | tee /tmp/gh-aw/agent/helix-send.log
+curl -s "<Send to Helix task log url>" -o /tmp/gh-aw/agent/helix-send.log
 grep -oE 'Sent Helix Job: [a-f0-9-]+' /tmp/gh-aw/agent/helix-send.log
 ```
 
-For each Helix job, list failing work items:
+For each Helix job, list failing work items (inline the job id in place of `JOBID`):
 
 ```bash
-url="https://helix.dot.net/api/jobs/<jobId>/workitems?api-version=2019-06-17"
-curl -s "$url" | tee /tmp/gh-aw/agent/helix-${jobId}.json
+curl -s "https://helix.dot.net/api/jobs/JOBID/workitems?api-version=2019-06-17" -o "/tmp/gh-aw/agent/helix-JOBID.json"
 ```
 
 A work item is an xharness invocation candidate if `ConsoleOutputUri` contains an xharness command (`xharness apple`, `xharness android`, `xharness wasm`, or `dotnet exec .../Microsoft.DotNet.XHarness.CLI.dll`). Fetch the console and scan for:
@@ -192,11 +199,10 @@ gh pr list --repo dotnet/xharness --state all --limit 50 \
 
 On match (open or merged in last 30 days): `existing-PR #<n>` / `existing-issue #<n>`. Emit nothing.
 
-Same-run cache:
+Same-run cache. Use the `<exit_code>|<command_norm>|<signature_norm>` key inline, never via a variable (rule 11):
 ```bash
-key="${exit_code}|<command_norm>|<signature_norm>"
-test -f /tmp/gh-aw/agent/filed.tsv && cut -f1 /tmp/gh-aw/agent/filed.tsv | grep -Fxq "$key" && echo "dup-this-run"
-printf '%s\t%s\n' "$key" "aw_<id>" >> /tmp/gh-aw/agent/filed.tsv
+grep -Fxq "70|apple-test-maccatalyst|run-timed-out" /tmp/gh-aw/agent/filed.tsv 2>/dev/null && echo "dup-this-run"
+printf '%s\n' "70|apple-test-maccatalyst|run-timed-out" >> /tmp/gh-aw/agent/filed.tsv
 ```
 
 ## Step 5. Decide which kind of PR