Commit a0534ae

kotlarmilos and Copilot authored
[runtime-failure-observer] Inline curl calls and require fetched evidence before opening PRs (#1612)
* runtime-failure-observer: fix network egress and ban ungrounded PRs

The observer agent's shell commands were intermittently denied. The copilot harness authorizes a command only by its first token, but the prompt instructed the agent to pre-bind URLs (`url=...` then `curl "$url"`) and to loop over definitions with `for`. Those forms start with an assignment or keyword, so the harness rejected them with "Permission denied and could not request permission from user" even though the firewall allowlist contains .dev.azure.com and .helix.dot.net. Across the three real runs this produced three different outcomes (worked around, noop, and a false report_incomplete that blamed the firewall).

Changes to runtime-failure-observer.agent.md (prompt body, imported at runtime via {{#runtime-import}}, so no lock recompile needed):

- Rule 11 now requires every shell command to begin with an allow-listed program; inline URLs into `curl ... -o file`, no variable pre-bind, no loops. Step 1, Step 2, and the Step 4 dedup cache snippet are rewritten to match.
- New Step 0 preflight proves egress with one inlined curl and, on failure, emits an accurate report_incomplete (harness command authorization, not firewall) instead of misdiagnosing the firewall.
- New rule 6b forbids opening a PR unless the build timeline and Helix console were actually downloaded this run; no citing build ids, Helix GUIDs, exit codes, or stderr from memory or inference.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten wording to match prompt style

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback: shell-safe placeholders and allowed safe-output

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent f39fb5f commit a0534ae
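
The first-token rule described in the commit message can be pictured with a small sketch. This is only a hypothetical model for reasoning about which command forms pass: the `classify` function and the `ALLOWED` list are assumptions made here (the list mirrors the programs named in rule 11), and the real harness logic is not part of this commit.

```bash
# Hypothetical model of a first-token allow-list; the real harness code is not shown in this commit.
ALLOWED="curl jq gh grep printf"

classify() {
  # $1: a full command line; only the text before the first space is inspected.
  first=${1%% *}
  case " $ALLOWED " in
    *" $first "*) echo "allowed" ;;
    *)            echo "denied" ;;
  esac
}

classify 'curl -s "https://helix.dot.net/api/jobs/JOBID/workitems?api-version=2019-06-17" -o out.json'  # allowed
classify 'url="https://helix.dot.net/api/jobs/JOBID/workitems?api-version=2019-06-17"'                  # denied
classify 'for def in 154 223; do curl -s "..."; done'                                                   # denied
```

Under this model the assignment and the `for` loop are rejected before any network access is attempted, which would be consistent with the denials being unrelated to the firewall allowlist.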

1 file changed

Lines changed: 25 additions & 19 deletions

.github/workflows/runtime-failure-observer.agent.md
@@ -87,12 +87,12 @@ The agent reads `dotnet/runtime` and the failing build logs. It never writes to
 3. **Every PR title starts with `[runtime-observer] `.** PRs are opened as drafts.
 4. **Small-fix bounds for complete autofix PRs.** A *complete* fix PR must satisfy all of: `<=` 30 changed lines total, `<=` 2 files (one source + one test), no new public API, no protocol change, no native code change. If the fix needs more, do not silently truncate it: open a clearly-marked best-effort/diagnosability **draft** PR (Step 5) that a human finishes. Best-effort and diagnosability draft PRs may exceed these bounds but must be marked work-in-progress and must still avoid new public API, protocol changes, and native code.
 5. **Don't propose fixes for runtime test bugs.** If the failure is in the test binary itself (assertion in the test code, missing mock, runtime API regression), record `skipped: runtime-side issue` and emit nothing.
-6. **Never assume.** Cite the runtime build URL, the Helix work item URL, the xharness command line, and the exact stderr / exit code in every PR body.
+6. **Never assume; cite only what you fetched this run.** Cite the runtime build URL, the Helix work item URL, the xharness command line, and the exact stderr / exit code in every PR body. If any required fetch (build list, timeline, Helix work items, console log) failed, was empty, or was denied, emit nothing for that candidate — never reconstruct a build id, URL, GUID, exit code, or stderr from memory or inference.
 7. **Dedup.** Before emitting, search open and recently merged PRs / issues in `dotnet/xharness` for the same xharness-signature. On match: `existing-PR #<n>` or `existing-issue #<n>`, emit nothing.
 8. **Same-run dedup cache.** Persist `(exit_code, command, signature_norm)` keys in `/tmp/gh-aw/agent/filed.tsv`. On hit: `dup-this-run`, skip.
 9. **All state under `/tmp/gh-aw/agent/`.**
 10. **AzDO API: anonymous only.** Stay on `https://dev.azure.com/dnceng-public/public/_apis/build/...`.
-11. **Pre-bind every URL with `?` or `&` to a variable on its own line, then `curl -s "$url"`.**
+11. **Start every shell command with an allow-listed program (`curl`, `jq`, `gh`, `grep`, `printf`, ...).** The harness authorizes by first token only, so a command beginning with `url=...`, `key=...`, or `for` is denied with `Permission denied and could not request permission from user` even when the firewall allows the domain. Inline each URL into a single `curl ... -o <file>` (keep `%24` for `$top`); never pre-bind URLs to variables or loop over `curl`.
 
 ## Pipelines to scan
 
@@ -125,42 +125,49 @@ These exit codes from `src/Microsoft.DotNet.XHarness.Common/CLI/ExitCode.cs` are
 
 Exit codes outside this table: record `skipped: exit code <n> not in improvement table` and stop.
 
+## Step 0. Preflight: confirm network egress
+
+Prove the harness will let `curl` reach the public AzDO API before scanning (rule 11):
+
+```bash
+curl -s "https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=154&branchName=refs/heads/main&statusFilter=completed&resultFilter=failed,partiallySucceeded&%24top=1&api-version=7.1" -o /tmp/gh-aw/agent/preflight.json
+jq -r '.count' /tmp/gh-aw/agent/preflight.json
+```
+
+Valid JSON: continue. If `curl` itself is denied, that is the harness rejecting the command form, not the firewall (`.dev.azure.com` and `.helix.dot.net` are allow-listed). If an inlined first-token `curl` still fails, record `skipped: harness denied inlined curl to dev.azure.com; firewall already allows it` and stop. Never blame the firewall allowlist and never open a PR.
+
 ## Step 1. Set up
 
+Run one inlined `curl` per definition id in `154 223 224 225 226 228 260 261 265`, substituting the id in the URL and the `-o` path:
+
 ```bash
-for def in 154 223 224 225 226 228 260 261 265; do
-url="https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=${def}&branchName=refs/heads/main&statusFilter=completed&resultFilter=failed,partiallySucceeded&%24top=10&api-version=7.1"
-curl -s "$url" | tee "/tmp/gh-aw/agent/builds-${def}.json" | jq -r '.value[] | "\(.id) \(.result) \(.finishTime)"' | head
-done
+curl -s "https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=154&branchName=refs/heads/main&statusFilter=completed&resultFilter=failed,partiallySucceeded&%24top=10&api-version=7.1" -o /tmp/gh-aw/agent/builds-154.json
+jq -r '.value[] | "\(.id) \(.result) \(.finishTime)"' /tmp/gh-aw/agent/builds-154.json | head
 ```
 
 Per definition, pick `source` = most recent failed build inside the last 7 days. Older: `skipped: stale (>7d)`.
 
 ## Step 2. Walk timelines, find xharness invocations
 
-For each `source`:
+For each `source` (inline the build id in place of `SRCID`):
 
 ```bash
-src_id=<source build id>
-url="https://dev.azure.com/dnceng-public/public/_apis/build/builds/${src_id}/timeline?api-version=7.1"
-curl -s "$url" | tee /tmp/gh-aw/agent/timeline-${src_id}.json
+curl -s "https://dev.azure.com/dnceng-public/public/_apis/build/builds/SRCID/timeline?api-version=7.1" -o "/tmp/gh-aw/agent/timeline-SRCID.json"
 ```
 
 Reconstruct `Stage -> Phase -> Job -> Task` via `parentId`. A failed leaf with non-null `log.id` is a candidate.
 
 Filter to Helix work items only. xharness runs inside Helix work items, not on the AzDO agent. From the `Send to Helix` task log, extract `Sent Helix Job: <GUID>`:
 
 ```bash
-log_url='<Send to Helix task log url>'
-curl -s "$log_url" | tee /tmp/gh-aw/agent/helix-send.log
+curl -s "<Send to Helix task log url>" -o /tmp/gh-aw/agent/helix-send.log
 grep -oE 'Sent Helix Job: [a-f0-9-]+' /tmp/gh-aw/agent/helix-send.log
 ```
 
-For each Helix job, list failing work items:
+For each Helix job, list failing work items (inline the job id in place of `JOBID`):
 
 ```bash
-url="https://helix.dot.net/api/jobs/<jobId>/workitems?api-version=2019-06-17"
-curl -s "$url" | tee /tmp/gh-aw/agent/helix-${jobId}.json
+curl -s "https://helix.dot.net/api/jobs/JOBID/workitems?api-version=2019-06-17" -o "/tmp/gh-aw/agent/helix-JOBID.json"
 ```
 
 A work item is an xharness invocation candidate if `ConsoleOutputUri` contains an xharness command (`xharness apple`, `xharness android`, `xharness wasm`, or `dotnet exec .../Microsoft.DotNet.XHarness.CLI.dll`). Fetch the console and scan for:
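
The two text filters in Step 2 can be tried offline against invented sample files, without touching AzDO or Helix. The sketch below is illustrative only: the file names under `/tmp`, the sample log lines, and the GUID are made up, and the second `grep` pattern is one possible encoding of the four command forms listed in this hunk, not a pattern taken from the prompt. Every command starts with `printf` or `grep`, so it also fits rule 11.

```bash
# Sample "Send to Helix" log; the timestamp and GUID are invented for illustration.
printf '%s\n' '2024-01-01T00:00:00Z Sent Helix Job: 0a1b2c3d-1111-2222-3333-444455556666' 'unrelated line' > /tmp/helix-send-sample.log

# Same extraction as Step 2: -o prints only the matched text, -E enables extended regex.
grep -oE 'Sent Helix Job: [a-f0-9-]+' /tmp/helix-send-sample.log
# prints: Sent Helix Job: 0a1b2c3d-1111-2222-3333-444455556666

# Sample console log with two xharness invocations and one unrelated line (all invented).
printf '%s\n' 'dotnet exec /path/to/Microsoft.DotNet.XHarness.CLI.dll apple test' 'echo hello' 'xharness android test' > /tmp/console-sample.log

# Count lines that contain any of the four command forms (assumed pattern).
grep -cE 'xharness (apple|android|wasm)|dotnet exec .*Microsoft\.DotNet\.XHarness\.CLI\.dll' /tmp/console-sample.log
# prints: 2
```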
@@ -192,11 +199,10 @@ gh pr list --repo dotnet/xharness --state all --limit 50 \
 
 On match (open or merged in last 30 days): `existing-PR #<n>` / `existing-issue #<n>`. Emit nothing.
 
-Same-run cache:
+Same-run cache. Use the `<exit_code>|<command_norm>|<signature_norm>` key inline, never via a variable (rule 11):
 ```bash
-key="${exit_code}|<command_norm>|<signature_norm>"
-test -f /tmp/gh-aw/agent/filed.tsv && cut -f1 /tmp/gh-aw/agent/filed.tsv | grep -Fxq "$key" && echo "dup-this-run"
-printf '%s\t%s\n' "$key" "aw_<id>" >> /tmp/gh-aw/agent/filed.tsv
+grep -Fxq "70|apple-test-maccatalyst|run-timed-out" /tmp/gh-aw/agent/filed.tsv 2>/dev/null && echo "dup-this-run"
+printf '%s\n' "70|apple-test-maccatalyst|run-timed-out" >> /tmp/gh-aw/agent/filed.tsv
 ```
 
 ## Step 5. Decide which kind of PR
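
The semantics of the rewritten same-run cache in the last hunk (`grep -Fxq` for a whole-line, fixed-string match, then `printf` to append) can be checked offline with a small sketch. The helper name `check_and_record` and the sample path `/tmp/filed-sample.tsv` are assumptions made here for illustration; a function call would not pass the first-token rule, so this is for reasoning about the snippet on a maintainer's machine, not something the agent would run.

```bash
# Hypothetical helper; the name and the sample cache path are illustrative only.
check_and_record() {
  # $1: cache key "<exit_code>|<command_norm>|<signature_norm>", $2: cache file
  if grep -Fxq "$1" "$2" 2>/dev/null; then
    echo "dup-this-run"            # key already present as a whole line
  else
    printf '%s\n' "$1" >> "$2"     # first sighting: append the key
    echo "recorded"
  fi
}

rm -f /tmp/filed-sample.tsv
check_and_record "70|apple-test-maccatalyst|run-timed-out" /tmp/filed-sample.tsv  # recorded
check_and_record "70|apple-test-maccatalyst|run-timed-out" /tmp/filed-sample.tsv  # dup-this-run
check_and_record "70|apple-test-maccatalyst" /tmp/filed-sample.tsv                # recorded
```

The third call is recorded rather than treated as a duplicate because `-x` only matches entire lines, and `-F` keeps the `|` separators literal instead of treating them as regex alternation.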
