Commit 8cce897

auto-triage improvements: security improvements, sandbox updates, skill refinements (#15545)
* update @flue/cli and @flue/client to latest versions
* update @flue/cli and @flue/client to latest versions
* replace curl GitHub API calls with gh CLI in triage verify skill
* refine comment template formatting in triage comment skill
* add scope reference to skills
* test the github proxy
* update @flue/cli and @flue/client to latest
* add scope to comment skill
* remove hardcoded -R withastro/astro from verify skill gh commands
* add back github explicit trigger
* fetch full git history in triage workflow for git blame/log
* promote git blame to its own subsection in verify skill
* add analyze-github-action-logs skill
* update @flue packages and refactor triage workflow for new client API
* update @flue/cli and @flue/client packages, post comments via direct API call
* fix indentation in proxy config and markdown formatting in comment skill
* refactor label operations to use direct API calls instead of gh CLI
* add environment guide to AGENTS.md: prefer node over python for scripting
* improve AGENTS.md structure and add bgproc dependency

  Reorganize AGENTS.md with clearer sections for monorepo guide, bgproc, and agent-browser workflows. Add bgproc as a dev dependency for managing long-running dev/preview servers.

* add note on list
* update @flue/cli to 0.0.40
* merge issue-opened.yml into issue-triage.yml workflow
* fix(astro-workflow): guard shell results with exitCode checks before JSON.parse
* update @flue/cli to 0.0.41
* update @flue/cli to 0.0.42 and @flue/client to 0.0.26
* update @flue/cli to 0.0.43 and @flue/client to 0.0.27
* split issue-labeled.yml into issue-needsrepro.yml and issue-wontfix.yml
* fix: add missing lsof/procps to Docker sandbox and fix bgproc instructions in AGENTS.md
* refactor: combine needs-repro workflows into a single file
* fix: use FREDKBOT_GITHUB_TOKEN with GITHUB_TOKEN fallback in issue-triage
* add knip note about bgproc
* cleanup
* refactor: extract GitHub API helpers into github.ts, replace flue.shell with direct fetch calls
* fix: correct authorAssociation field name in reproduce.md to match camelCase schema
* docs: clarify bgproc logs description in AGENTS.md
* ci: set PNPM_STORE_DIR to keep pnpm store inside workspace for sandbox access
* fix: fall back to GITHUB_TOKEN when FREDKBOT_GITHUB_TOKEN is not set
* update AGENTS
* ci: inject global OpenCode rules into sandbox container via AGENTS.sandbox.md
* refactor: derive branch name from issueNumber and use valibot schema for triage args
* ci: add jq to sandbox container system packages
* update AGENTS.sandbox.md
* refactor: restructure workflows to directory-per-workflow convention
* refactor: move sandbox files into .flue/sandbox/ directory
* update deps
* format, link

---------

Co-authored-by: Fred K. Schott <fschott@cloudflare.com>
1 parent 35bc814 commit 8cce897

19 files changed

Lines changed: 487 additions & 198 deletions

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+---
+name: analyze-github-action-logs
+description: Analyze recent GitHub Actions workflow runs to identify patterns, mistakes, and improvements. Use when asked to "analyze workflow logs", "review action runs", or "analyze GitHub Actions".
+compatibility: Requires gh CLI and access to the GitHub repository.
+---
+
+# Analyze GitHub Action Logs
+
+Fetch and analyze recent GitHub Actions runs for a given workflow. Review agent/step performance, identify wasted effort and mistakes, and produce a report with actionable improvements.
+
+## Input
+
+You need:
+
+- **`workflow`** (required) — The workflow file name or ID (e.g., `issue-triage.yml`, `deploy.yml`).
+- **`repo`** (optional) — The GitHub repository in `OWNER/REPO` format. Defaults to `withastro/astro`.
+- **`count`** (optional) — Number of recent completed runs to analyze. Defaults to `5`.
+
+## Step 1: List Recent Runs
+
+Fetch the most recent completed runs for the workflow. Filter by `--status=completed`:
+
+```bash
+gh run list --workflow=<workflow> -R <repo> --status=completed -L <count>
+```
+
+Present the list to orient yourself: run IDs, titles, status (success/failure), and duration. Pick the runs to analyze — prefer a mix of successes and failures if available, and prefer runs that exercised more steps (longer runs tend to go through more stages, while shorter runs may exit early).
+
+## Step 2: Fetch Logs
+
+For each run you want to analyze, save the full log to a temp file:
+
+```bash
+gh run view <run_id> -R <repo> --log > /tmp/actions-run-<run_id>.log
+```
+
+## Step 3: Identify Step/Skill Boundaries
+
+Search each log file for markers that indicate where each step or skill starts and ends. The markers depend on the workflow — look for patterns like:
+
+- **Flue skill markers**: `[flue] skill("..."): starting` / `completed`
+- **GitHub Actions step markers**: Step name headers in the log output
+- **Custom markers**: Any `START`/`END` or similar delimiters the workflow uses
+
+```bash
+grep -n "skill(\|step\|START\|END\|starting\|completed" /tmp/actions-run-<run_id>.log | head -50
+```
+
+From this, determine which line ranges correspond to each step/skill. Also find any result markers:
+
+```bash
+grep -n "RESULT_START\|RESULT_END\|extractResult" /tmp/actions-run-<run_id>.log
+```
+
+Note: Some log files may contain binary/null bytes. Use `grep -a` if needed.
+
+## Step 4: Analyze Each Step (Use Subagents)
+
+For each step/skill that ran, **launch a subagent** to analyze that section's log. This is critical to avoid polluting your context with thousands of log lines.
+
+For each subagent, provide:
+
+1. The log file path and the line range for that step
+2. If skill instruction files exist for the workflow, tell the subagent to read them first for context
+3. The run title/context so the subagent understands what was being done
+4. The analysis criteria below
+
+### Analysis Criteria
+
+Tell each subagent to evaluate:
+
+1. **Correctness** — Was the step's final result/verdict correct?
+2. **Efficiency** — How long did it take? What's a reasonable baseline? Where was time wasted?
+3. **Mistakes** — Wrong tool calls, failed commands retried without changes, unnecessary rebuilds, etc.
+4. **Instruction compliance** — If skill instructions exist, did the agent follow them? Where did it deviate?
+5. **Scope creep** — Did the agent do work that belongs in a different step?
+6. **Suggestions** — Specific, actionable changes that would prevent the issues found.
+
+Tell each subagent to return a structured response with: Summary, Time Analysis, Issues Found (with estimated time wasted for each), and Suggestions for Improvement.
+
+## Step 5: Consolidate Report
+
+After all subagents return, synthesize their findings into a single report. Structure it as:
+
+### Per-Run Summary Table
+
+For each run analyzed, include a table:
+
+| Step/Skill | Time | Result | Time Wasted | Top Issue |
+| ---------- | ---- | ------ | ----------- | --------- |
+
+### Cross-Cutting Patterns
+
+Identify issues that appeared across multiple runs or multiple steps. These are the highest-value improvements. Common patterns to look for:
+
+- **TodoWrite abuse** — Agent wasting time on task list management during automated runs
+- **Server management failures** — Port conflicts, failed process kills, stale log files
+- **Tool misuse** — Using `curl` instead of `gh`, `jq` not found, etc.
+- **Scope creep** — One step doing work that belongs in another
+- **Unnecessary rebuilds** — Building packages multiple times without changes
+- **Test timeouts** — Running slow E2E/Playwright tests that time out
+- **Instruction violations** — Agent doing something the instructions explicitly forbid
+- **Redundant work** — Re-reading files, re-running searches, re-installing dependencies
+
+### Prioritized Recommendations
+
+Rank your improvement suggestions by estimated time savings across all runs. For each recommendation:
+
+1. **What to change** — Which file(s) to edit and what to add/modify
+2. **Why** — What pattern it addresses, with evidence from the runs
+3. **Estimated impact** — How much time it would save per run
+
+## Output
+
+Present the full consolidated report. Do NOT edit any workflow or skill files — only report findings and recommendations. The user will decide which changes to apply.
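
Step 3 of the skill above asks the agent to turn marker lines into line ranges by hand. The following is a minimal sketch of how that pairing could be scripted, assuming the workflow emits the Flue markers quoted in the skill (`[flue] skill("..."): starting` / `completed`), each on a single line. The function name `skill_ranges` is hypothetical and is not part of this commit; other marker styles would need different patterns.

```bash
# Hypothetical helper (not part of the commit): print "<start>-<end> skill("name")"
# for every skill whose starting and completed markers both appear in the log.
# Assumes the Flue marker format quoted in the skill; -a guards against binary bytes.
skill_ranges() {
  grep -a -n -E 'skill\("[^"]*"\): (starting|completed)' "$1" |
    awk -F: '
      {
        # Extract the skill("...") token so each range is labelled.
        name = ""
        if (match($0, /skill\("[^"]*"\)/)) name = substr($0, RSTART, RLENGTH)
      }
      # $1 is the line number that grep -n put in front of each match.
      /: starting/  { start[name] = $1 }
      /: completed/ {
        if (name in start) {
          print start[name] "-" $1, name
          delete start[name]
        }
      }
    '
}
```

Each printed range could then be handed to a subagent together with the log path, as Step 4 describes. A skill that starts but never completes produces no range, which is itself worth noting in the report.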

.agents/skills/triage/comment.md

Lines changed: 5 additions & 3 deletions
@@ -4,6 +4,8 @@ Generate a GitHub issue comment from triage findings.
 
 **CRITICAL: You MUST always read `report.md` and produce a GitHub comment as your final output, regardless of what input files are available. Even if `report.md` is missing or empty, you must still produce a comment. In that case, produce a minimal comment stating that automated triage could not be completed.**
 
+**SCOPE: Your job is comment generation only. Finish your work once you've completed this workflow. Do NOT go further than this. It is no longer time to attempt reproduction, diagnosis, or fixing of the issue.**
+
 ## Prerequisites
 
 These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
@@ -41,16 +43,16 @@ The **Fix** line in the template has three possible forms. Choose the one that m
 
 The **Priority** line communicates the severity of this issue to maintainers. Its goal is to answer the question: **"How bad is it?"**
 
-Select exactly ONE priority label from the `priorityLabels` arg. Use the label descriptions to guide your decision, combined with the triage report's root cause and impact analysis. Render the chosen label name in square brackets, in bold, formatted with the `- ` prefix removed (Example: `**[P2: Has Workaround].**). Then, follow it with 1-2 sentences explaining **why** you chose that priority. Answer: "who is likely to be affected and under what conditions?". If you are unsure, use your best judgment based on the label descriptions and the triage findings.
+Select exactly ONE priority label from the `priorityLabels` arg. Use the label descriptions to guide your decision, combined with the triage report's root cause and impact analysis. Render it in bold, with the `- ` prefix removed, like this: `**Priority P2: Has Workaround.**` Then, follow it with 1-2 sentences explaining _why_ you chose that priority. Answer: "who is likely to be affected and under what conditions?". If you are unsure, use your best judgment based on the label descriptions and the triage findings.
 
 ### Template
 
 ```markdown
 **[I was able to reproduce this issue. / I was unable to reproduce this issue.]** [2-3 sentences describing the root cause, result, and key observations.]
 
-**Fix:** **[See "Fix" Instructions above.]** [1-2 sentences describing the solution, where/when it was already fixed, or guidance on where a fix might be.] [If `branchName` is non-null: [View Suggested Fix](https://github.com/withastro/astro/compare/{branchName}?expand=1)]
+**[See "Fix" Instructions above.]** [1-2 sentences describing the solution, where/when it was already fixed, or guidance on where a fix might be.] [If `branchName` is non-null: [View Suggested Fix](https://github.com/withastro/astro/compare/{branchName}?expand=1)]
 
-**Priority:** **[See "Priority" Instructions above.]** [1-2 sentences explaining why this priority was chosen, who is likely to be affected, and under what conditions (this section should answer the question: "how bad is it?")]
+**[See "Priority" Instructions above.]** [1-2 sentences explaining why this priority was chosen, who is likely to be affected, and under what conditions (this section should answer the question: "how bad is it?")]
 
 <details>
 <summary><em>Full Triage Report</em></summary>

.agents/skills/triage/diagnose.md

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@ Find the root cause of a reproduced bug in the Astro source code.
 
 **CRITICAL: You MUST always read `report.md` and append to `report.md` before finishing, regardless of outcome. Even if you cannot identify the root cause, hit errors, or the investigation is inconclusive — always update `report.md` with your findings. The orchestrator and downstream skills depend on this file to determine what happened.**
 
+**SCOPE: Your job is diagnosis only. Finish your work once you've completed this workflow. Do NOT go further than this (no larger verification of the issue, no fixing of the issue, etc.).**
+
 ## Prerequisites
 
 These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.

.agents/skills/triage/reproduce.md

Lines changed: 3 additions & 1 deletion
@@ -4,6 +4,8 @@ Reproduce a GitHub issue to determine if a bug is valid and reproducible.
 
 **CRITICAL: You MUST always read `report.md` and write `report.md` to the triage directory before finishing, regardless of outcome. Even if you encounter errors, cannot reproduce the bug, hit unexpected problems, or need to skip — always write `report.md`. The orchestrator and downstream skills depend on this file to determine what happened. If you finish without writing it, the entire pipeline fails silently.**
 
+**SCOPE: Your job is reproduction only. Finish your work once you've completed this workflow. Do NOT go further than this (no larger diagnosis of the issue, no fixing of the issue, etc.).**
+
 ## Prerequisites
 
 These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
@@ -67,7 +69,7 @@ Skip if the bug is specific to Bun or Deno. Our sandbox only supports Node.js.
 
 ### Maintainer Override (`maintainer-override`)
 
-Skip if a repository maintainer has commented that this issue should not be reproduced here. To determine if a commenter is a maintainer, check the `author_association` field on their comment in `issueDetails` — values of `MEMBER`, `COLLABORATOR`, or `OWNER` indicate a maintainer.
+Skip if a repository maintainer has commented that this issue should not be reproduced here. To determine if a commenter is a maintainer, check the `authorAssociation` field on their comment in `issueDetails` — values of `MEMBER`, `COLLABORATOR`, or `OWNER` indicate a maintainer.
 
 ## Step 3: Set Up Reproduction Project
 

.agents/skills/triage/verify.md

Lines changed: 12 additions & 9 deletions
@@ -4,6 +4,8 @@ Verify whether a GitHub issue describes an actual bug or a misunderstanding of i
 
 **CRITICAL: You MUST always read `report.md` and append to `report.md` before finishing, regardless of outcome. Even if you cannot reach a conclusion — always update `report.md` with your findings. The orchestrator and downstream skills depend on this file to determine what happened.**
 
+**SCOPE: Your job is verification only. Finish your work once you've completed this workflow. Do NOT go further than this (no fixing of the issue, etc.).**
+
 ## Prerequisites
 
 These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
@@ -46,28 +48,29 @@ Look at the relevant source code in `packages/`. Pay close attention to:
 - **Comments explaining "why"** — If a developer left a comment explaining why the code works a certain way, that is strong evidence of intentional design. Treat these comments as authoritative unless they are clearly outdated.
 - **Explicit conditionals and early returns** — Code that explicitly checks for the reported scenario and handles it differently than the reporter expects is likely intentional.
 - **Named constants and configuration** — Behavior controlled by a named config option or constant was probably a deliberate choice.
-- **Git blame on key lines** — If `report.md` identifies specific files and line numbers, run `git blame` on the relevant lines to find the commit that introduced the behavior. Then read the full commit message with `git show --no-patch <commit>` and review the associated PR if referenced. You can fetch PR details with `curl -s "https://api.github.com/repos/withastro/astro/pulls/<number>"`. A commit message or PR description that explains the rationale is strong evidence of intentional design.
 
-### 2c: Search prior GitHub issues and PRs
+### 2c: Git blame on key lines
+
+If `report.md` identifies specific files and line numbers, run `git blame` on the relevant lines to find the commit that introduced the behavior. Then read the full commit message with `git show --no-patch <commit>` and review the associated PR if referenced. You can fetch PR details with `gh pr view <number>`. A commit message, PR description, or PR comment from the author explaining the rationale is strong evidence of intentional design.
+
+### 2d: Search prior GitHub issues and PRs
 
 Search for prior issues and PRs that discuss the same behavior using the GitHub API. This can reveal whether the behavior was previously discussed, intentionally introduced, or already reported and closed as "not a bug."
 
 ```bash
 # Search issues for keywords related to the reported behavior
-curl -s "https://api.github.com/search/issues?q=<url-encoded-keywords>+repo:withastro/astro+is:issue&per_page=10"
+gh search issues "<keywords>"
 # Search PRs that may have introduced or discussed the behavior
-curl -s "https://api.github.com/search/issues?q=<url-encoded-keywords>+repo:withastro/astro+is:pr&per_page=10"
+gh search prs "<keywords>"
 # Read a specific issue for context
-curl -s "https://api.github.com/repos/withastro/astro/issues/<number>"
-# Read issue comments
-curl -s "https://api.github.com/repos/withastro/astro/issues/<number>/comments"
+gh issue view <number> --comments
 # Read a specific PR for context
-curl -s "https://api.github.com/repos/withastro/astro/pulls/<number>"
+gh pr view <number> --comments
 ```
 
 If you find a closed issue where a maintainer explained why the behavior is intentional, or a PR that deliberately introduced it, that is strong evidence of intended behavior.
 
-### 2d: Distinguish bugs from non-bugs
+### 2e: Distinguish bugs from non-bugs
 
 This is the most important and most error-prone step. For triage purposes, the definitions are:
 

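
The new "2c: Git blame on key lines" subsection in verify.md describes a two-step lookup: blame a line range, then read the message of each commit found. The following is a minimal sketch of that flow, assuming full git history is available in the checkout. The helper name `blame_commits` and the file path in the usage comment are hypothetical and do not come from the commit.

```bash
# Hypothetical helper (not part of the commit): list the unique commits that
# last touched lines <first>..<last> of <file>.
blame_commits() {
  file="$1"; first="$2"; last="$3"
  # --porcelain prints a header line "<hash> <orig-line> <final-line> ..." for
  # every blamed line; keep only those headers and take the hash column.
  git blame --porcelain -L "${first},${last}" -- "$file" |
    grep -E '^[0-9a-f]{40,64} [0-9]+ [0-9]+' |
    cut -d' ' -f1 |
    sort -u
}

# Usage sketch (the path is a placeholder): show each commit message without the patch.
# blame_commits path/to/file.ts 10 20 |
#   while read -r commit; do git show --no-patch "$commit"; done
```

Deduplicating with `sort -u` keeps the output short when one commit introduced the whole range, which is the common case for a small block of lines.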
.flue/sandbox/AGENTS.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# Sandbox Environment (CI)
+
+- You are running inside a Docker container in CI.
+- Always use `CI=true` with `pnpm install` (no TTY available)

Lines changed: 17 additions & 2 deletions
@@ -7,7 +7,7 @@ ENV DEBIAN_FRONTEND=noninteractive
 # The slim image includes Node.js and npm but not git, curl, or wget.
 RUN apt-get update \
 && apt-get install -y --no-install-recommends \
-ca-certificates curl wget git \
+ca-certificates curl wget git jq lsof procps \
 && rm -rf /var/lib/apt/lists/*
 
 # --- pnpm ---
@@ -44,14 +44,29 @@ RUN apt-get update \
 && chmod -R o+rx /opt/pw-browsers \
 && npm uninstall -g playwright
 
-# NOTE: gh CLI is intentionally NOT installed in the sandbox due to lack of tokens.
+# --- GitHub CLI (for read-only public repo operations without auth) ---
+RUN (type -p wget >/dev/null || (apt-get update && apt-get install wget -y)) \
+&& mkdir -p -m 755 /etc/apt/keyrings \
+&& out=$(mktemp) && wget -nv -O$out https://cli.github.com/packages/githubcli-archive-keyring.gpg \
+&& cat $out | tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
+&& chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
+&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \
+| tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
+&& apt-get update \
+&& apt-get install gh -y \
+&& rm -rf /var/lib/apt/lists/*
 
 # --- Compatibility fixes ---
 # Allow any directory as a git safe.directory. The host workspace is bind-mounted
 # at its original host path (e.g. /home/runner/work/astro/astro) and the container
 # runs as a non-root UID via --user, so git would otherwise refuse to operate.
 RUN git config --system --add safe.directory '*'
 
+# --- Global OpenCode rules for CI sessions ---
+# The flue CLI sets HOME=/tmp at runtime, so OpenCode reads global rules from
+# /tmp/.config/opencode/AGENTS.md. This injects CI-specific instructions.
+COPY .flue/sandbox/AGENTS.md /tmp/.config/opencode/AGENTS.md
+
 EXPOSE 48765
 
 # Default: start OpenCode server listening on all interfaces
