Commit 89ecf99

docs: document operator UI, PAT auth, and AI rule suggester
Bring the doc set up-to-date with what this PR ships so new devs and operators aren't figuring out a live feature set from source.

- README.md: new "Operator UI" section under Monitoring covering enable flags, role mapping (admin/maintain → operator, write/triage/read → writer), per-repo replay authorization, and AI-suggester providers. Enhanced Features list gains an "Operator UI" group. Tools list gains test-llm and test-pem entries.
- docs/CONFIG-REFERENCE.md: new "Operator UI" and "AI Rule Suggester (LLM)" env-var tables covering OPERATOR_UI_ENABLED, OPERATOR_AUTH_REPO, OPERATOR_REPO_SLUG, OPERATOR_RELEASE_*, LLM_PROVIDER, LLM_BASE_URL, LLM_MODEL, ANTHROPIC_API_KEY, ANTHROPIC_API_KEY_SECRET_NAME. Calls out the 30/hour/PAT rate limit on /suggest-rule.
- docs/DEPLOYMENT.md: Secret Manager step #4 for anthropic-api-key plus the IAM binding; pre-deploy checklist gains the operator-UI auth repo bullet; post-deploy smoke test for the operator UI + AI settings.
- docs/LOCAL-TESTING.md: "Optional (for Operator UI + AI rule suggester)" env-var block and a step-by-step "Testing the Operator UI Locally" section that points at cmd/test-llm for provider verification.
- docs/FAQ.md: new "Operator UI" section (what it is, who can access, how the AI suggester works, how to debug "not connected").
- AGENT.md: full rewrite. Expanded file map covers all operator_*.go, llm_*.go, web/operator/index.html embed, webhook_trace_buffer, and log_buffer. New sections on authorization model, security posture (auth fail-closed, PAT hashing, SSRF defense-in-depth, LLM cost cap), and edit patterns for operator UI / LLM provider work. Key doc table rebuilt with clickable links.
1 parent 14c9d29 commit 89ecf99

6 files changed

Lines changed: 337 additions & 92 deletions

AGENT.md

Lines changed: 146 additions & 90 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 50 additions & 0 deletions
@@ -29,6 +29,13 @@ A GitHub app that automatically copies code examples and files from source repos
- **Development Tools** - Dry-run mode, CLI validation, enhanced logging
- **Thread-Safe** - Concurrent webhook processing with proper state management

### Operator UI
- **Web dashboard at `/operator/`** - Five-tab UI (Overview, Webhooks, Audit, Workflows, System) with dark mode, keyboard shortcuts, and shareable URLs
- **GitHub PAT authentication** - Users sign in with their personal access token; role is derived from their permission on a configured auth repo (`admin`/`maintain` → operator, `write`/`triage`/`read` → writer)
- **Per-repo replay authorization** - Replay requires the caller's PAT to have read access to the source repo of the webhook being replayed
- **Writer-facing tools** - Workflow browser, PR lookup, recent copies feed, file match tester, audit drawer, per-delivery log viewer
- **AI rule suggester** - Paste a source/target pair; get a generated copier rule self-verified against the in-process pattern matcher. Two providers: [Anthropic](https://www.anthropic.com/) (hosted, default in prod via the Grove Foundry APIM gateway) or [Ollama](https://ollama.com) (local, for dev)

## 🚀 Quick Start

### Prerequisites
@@ -385,6 +392,47 @@ Get performance metrics:
curl http://localhost:8080/metrics
```

## Operator UI

The operator UI is a web dashboard served from `/operator/` for diagnosing webhook processing, replaying failed deliveries, browsing workflows, and generating copier rules with AI assistance.

### Enabling the UI

Set the following env vars (the first two are required):

```yaml
OPERATOR_UI_ENABLED: "true"
OPERATOR_AUTH_REPO: "your-org/some-repo" # user permissions here determine role
OPERATOR_REPO_SLUG: "your-org/some-repo" # optional; enables audit-row deep links
```

**Startup fails** if `OPERATOR_UI_ENABLED=true` without `OPERATOR_AUTH_REPO`; this prevents an accidentally open operator UI.
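
The fail-closed startup check described above could be sketched in Go roughly as follows. This is a minimal illustration, not the app's actual code: `operatorConfig` and `validateOperatorConfig` are hypothetical names, and the `owner/repo` shape check is an added assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// operatorConfig is a hypothetical holder for the two settings discussed above.
type operatorConfig struct {
	UIEnabled bool
	AuthRepo  string
}

// validateOperatorConfig fails closed: an enabled UI without an auth repo is an error.
func validateOperatorConfig(c operatorConfig) error {
	if !c.UIEnabled {
		return nil // UI is off, so nothing to validate
	}
	repo := strings.TrimSpace(c.AuthRepo)
	if repo == "" {
		return errors.New("OPERATOR_UI_ENABLED=true requires OPERATOR_AUTH_REPO")
	}
	// Assumption: the value must look like owner/repo with exactly one slash.
	owner, name, ok := strings.Cut(repo, "/")
	if !ok || owner == "" || name == "" || strings.Contains(name, "/") {
		return fmt.Errorf("OPERATOR_AUTH_REPO must be owner/repo, got %q", repo)
	}
	return nil
}

func main() {
	cfg := operatorConfig{
		UIEnabled: os.Getenv("OPERATOR_UI_ENABLED") == "true",
		AuthRepo:  os.Getenv("OPERATOR_AUTH_REPO"),
	}
	if err := validateOperatorConfig(cfg); err != nil {
		fmt.Fprintln(os.Stderr, "startup aborted:", err)
		os.Exit(1)
	}
	fmt.Println("operator config OK")
}
```

Validating at startup rather than at request time means a misconfigured deploy never serves the dashboard at all.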

### Authentication and roles

Each user authenticates with their own **GitHub Personal Access Token**. Paste the PAT into the sign-in prompt; the server checks the user's permission on `OPERATOR_AUTH_REPO` and assigns a role:

| GitHub permission | Operator UI role | Can do |
|---|---|---|
| `admin` / `maintain` | **operator** | View everything; replay deliveries; cut release tags; change AI settings |
| `write` / `triage` / `read` | **writer** | View workflows, audit, recent copies, file match tester, AI rule suggester |
| None | **denied** | 401 Unauthorized |

`write` maps to writer (not operator) so typical docs contributors with repo write access can't replay deliveries or cut releases; those actions need an explicit `admin` / `maintain` grant.

On top of the role, **replay is repo-scoped**: the user's PAT must also have read access to the source repo of the webhook being replayed.
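
The role table and the repo-scoped replay rule can be expressed as a small Go sketch. The `Role` type, `roleForPermission`, and `canReplay` are hypothetical names chosen for illustration; only the permission-to-role mapping itself comes from the table above.

```go
package main

import "fmt"

// Role is a hypothetical type for the operator UI roles.
type Role string

const (
	RoleOperator Role = "operator"
	RoleWriter   Role = "writer"
	RoleDenied   Role = "denied"
)

// roleForPermission maps a GitHub repo permission string to a UI role,
// following the table above. Anything unrecognized is denied.
func roleForPermission(permission string) Role {
	switch permission {
	case "admin", "maintain":
		return RoleOperator
	case "write", "triage", "read":
		return RoleWriter
	default:
		return RoleDenied
	}
}

// canReplay applies both checks: the caller must be an operator AND
// their PAT must be able to read the webhook's source repo.
func canReplay(role Role, canReadSourceRepo bool) bool {
	return role == RoleOperator && canReadSourceRepo
}

func main() {
	for _, p := range []string{"admin", "maintain", "write", "triage", "read", "none"} {
		fmt.Printf("%-8s -> %s\n", p, roleForPermission(p))
	}
	fmt.Println("operator without source-repo read can replay:", canReplay(RoleOperator, false))
}
```

Defaulting unknown permission strings to `denied` keeps the mapping fail-closed if GitHub ever returns a value the switch does not list.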

### AI rule suggester

The operator UI includes an LLM-backed helper that takes a source/target file pair and returns a generated copier workflow rule, self-verified against the in-process pattern matcher before display.

Two providers are supported via `LLM_PROVIDER`:

- **`anthropic`** (default in Cloud Run): calls the Anthropic Messages API. For MongoDB deployments this routes through the Grove Foundry APIM gateway: set `LLM_BASE_URL=https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic` and load the gateway key from Secret Manager via `ANTHROPIC_API_KEY_SECRET_NAME`.
- **`ollama`** (default for local dev): runs against a local Ollama instance at `http://localhost:11434`. Connect, pull models, and switch the active model from the UI's System → AI settings panel without a redeploy.
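
As a rough illustration of how the provider settings could resolve, the Go sketch below fills in the per-provider defaults listed in docs/CONFIG-REFERENCE.md (`ollama` when `LLM_PROVIDER` is unset, plus each provider's default base URL and model). `llmSettings` and `resolveLLMSettings` are hypothetical names, and rejecting unknown providers is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"os"
)

// llmSettings is a hypothetical holder for the resolved provider settings.
type llmSettings struct {
	Provider string
	BaseURL  string
	Model    string
}

// resolveLLMSettings reads LLM_PROVIDER, LLM_BASE_URL, and LLM_MODEL through
// getenv and fills in the documented per-provider defaults for unset values.
func resolveLLMSettings(getenv func(string) string) (llmSettings, error) {
	s := llmSettings{
		Provider: getenv("LLM_PROVIDER"),
		BaseURL:  getenv("LLM_BASE_URL"),
		Model:    getenv("LLM_MODEL"),
	}
	if s.Provider == "" {
		s.Provider = "ollama"
	}

	var defaultURL, defaultModel string
	switch s.Provider {
	case "ollama":
		defaultURL, defaultModel = "http://localhost:11434", "qwen2.5-coder:7b"
	case "anthropic":
		defaultURL, defaultModel = "https://api.anthropic.com", "claude-haiku-4-5"
	default:
		// Assumption: unknown providers are rejected instead of silently ignored.
		return llmSettings{}, fmt.Errorf("unknown LLM_PROVIDER %q", s.Provider)
	}
	if s.BaseURL == "" {
		s.BaseURL = defaultURL
	}
	if s.Model == "" {
		s.Model = defaultModel
	}
	return s, nil
}

func main() {
	s, err := resolveLLMSettings(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("AI Provider: %s\nAI Model: %s\nAI URL: %s\n", s.Provider, s.Model, s.BaseURL)
}
```

Passing `getenv` as a parameter keeps the function testable without mutating the process environment.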

Smoke-test the LLM provider end-to-end with [`cmd/test-llm`](cmd/test-llm/README.md).

## Audit Logging

When enabled, all operations are logged to MongoDB:
@@ -598,4 +646,6 @@ See [DEPLOYMENT.md](./docs/DEPLOYMENT.md) for the complete deployment and rollba

- **[Config Validator](cmd/config-validator/README.md)** - CLI tool for validating configs
- **[Test Webhook](cmd/test-webhook/README.md)** - CLI tool for testing webhooks
- **[Test PEM](cmd/test-pem/README.md)** - CLI tool for verifying the GitHub App private key
- **[Test LLM](cmd/test-llm/README.md)** - CLI tool for smoke-testing the AI rule suggester's LLM provider
- **[Scripts](scripts/README.md)** - Helper scripts for deployment, testing, and releases

docs/CONFIG-REFERENCE.md

Lines changed: 28 additions & 0 deletions
@@ -15,6 +15,8 @@ Complete reference for all github-copier configuration options: environment vari
- [Audit Logging](#audit-logging)
- [GitHub API Tuning](#github-api-tuning)
- [Webhook Processing](#webhook-processing)
- [Operator UI](#operator-ui)
- [AI Rule Suggester (LLM)](#ai-rule-suggester-llm)
- [Google Cloud](#google-cloud)
- [Workflow YAML Schema](#workflow-yaml-schema)
- [Main Config](#main-config)
@@ -127,6 +129,32 @@ Set via `.env` files, `env-cloudrun.yaml`, or process environment.
| `WEBHOOK_MAX_RETRIES` | int | `2` | Max retry attempts for failed webhook processing (total attempts = retries + 1). |
| `WEBHOOK_RETRY_INITIAL_DELAY` | int | `5` | Initial delay between retries in **seconds** (doubles each attempt). |
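
To make the retry arithmetic concrete, here is a minimal Go sketch of the schedule implied by the two rows above, assuming the delay before each retry simply doubles from the initial value. `retryDelays` is a hypothetical helper, not the app's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays returns the wait before each retry: the initial delay,
// doubled for every subsequent attempt.
func retryDelays(maxRetries int, initial time.Duration) []time.Duration {
	if maxRetries <= 0 {
		return nil
	}
	delays := make([]time.Duration, 0, maxRetries)
	delay := initial
	for i := 0; i < maxRetries; i++ {
		delays = append(delays, delay)
		delay *= 2
	}
	return delays
}

func main() {
	// Documented defaults: WEBHOOK_MAX_RETRIES=2, WEBHOOK_RETRY_INITIAL_DELAY=5.
	delays := retryDelays(2, 5*time.Second)
	fmt.Println("total attempts:", len(delays)+1)
	fmt.Println("waits before each retry:", delays)
}
```

With the defaults this gives three total attempts, with waits of 5s and 10s before the two retries.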

### Operator UI

Mounts the web dashboard at `/operator/` (see the [Operator UI section of the README](../README.md#operator-ui) for the access model, roles, and feature overview). Disabled unless `OPERATOR_UI_ENABLED=true`.

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `OPERATOR_UI_ENABLED` | bool | `false` | Enable the operator UI routes (`/operator/*`). |
| `OPERATOR_AUTH_REPO` | string | | `owner/repo`; the user's permission on this repo determines their role (`admin`/`maintain` → operator, `write`/`triage`/`read` → writer). **Required** when the UI is enabled; startup fails otherwise. |
| `OPERATOR_REPO_SLUG` | string | | `owner/repo` used to build clickable GitHub links in audit/trace rows. Optional. |
| `OPERATOR_RELEASE_GITHUB_TOKEN` | string | | PAT with `contents:write` used by the UI to create version tag refs. Optional; without it the release button is hidden. |
| `OPERATOR_RELEASE_TARGET_BRANCH` | string | `main` | Branch whose HEAD SHA is tagged when cutting a release from the UI. |

### AI Rule Suggester (LLM)

Powers `/operator/api/suggest-rule`. The feature surface is always available when the operator UI is enabled; connectivity to the configured provider is checked at request time, and operators can switch model / base URL from the UI at runtime (process-global, reverts on restart).

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `LLM_PROVIDER` | string | `ollama` | Provider selector: `ollama` (local) or `anthropic` (hosted, default in Cloud Run). |
| `LLM_BASE_URL` | string | per-provider | Provider endpoint. Default `http://localhost:11434` for Ollama or `https://api.anthropic.com` for Anthropic. For MongoDB's Grove Foundry APIM gateway, use `https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic`. |
| `LLM_MODEL` | string | per-provider | Initial active model. Default `qwen2.5-coder:7b` for Ollama or `claude-haiku-4-5` for Anthropic. |
| `ANTHROPIC_API_KEY` | string | | Anthropic API / gateway key. Loaded directly from the env for local dev. Ignored when `LLM_PROVIDER=ollama`. |
| `ANTHROPIC_API_KEY_SECRET_NAME` | string | | GCP Secret Manager name for the Anthropic key; used in Cloud Run so no key material is ever in env vars or YAML. Short name (e.g. `anthropic-api-key`) is resolved to a full path via `SecretPath()`. |

The suggester is rate-limited to 30 requests/hour per authenticated user (keyed by hashed PAT) to cap provider cost. Denied requests return 429 with a `Retry-After` header.
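
One way to picture this limit is the Go sketch below: a fixed one-hour window per caller, keyed by a SHA-256 digest of the PAT. Both the fixed-window strategy and the choice of SHA-256 are assumptions for illustration (the doc above only states the limit, the hashed key, and the 429 response), and `suggestLimiter` is a hypothetical type.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math"
	"sync"
	"time"
)

const (
	maxPerWindow = 30        // requests allowed per window, per the docs above
	window       = time.Hour // fixed window length (assumed strategy)
)

// windowState tracks one caller's usage within the current window.
type windowState struct {
	start time.Time
	count int
}

// suggestLimiter is a hypothetical in-memory limiter keyed by hashed PAT.
type suggestLimiter struct {
	mu    sync.Mutex
	state map[string]*windowState
}

func newSuggestLimiter() *suggestLimiter {
	return &suggestLimiter{state: make(map[string]*windowState)}
}

// hashPAT returns a hex SHA-256 digest so the raw token is never used as a key.
func hashPAT(pat string) string {
	sum := sha256.Sum256([]byte(pat))
	return hex.EncodeToString(sum[:])
}

// allowAt reports whether a request at time now is permitted. When it is not,
// the second value is the time remaining until the caller's window resets.
func (l *suggestLimiter) allowAt(pat string, now time.Time) (bool, time.Duration) {
	key := hashPAT(pat)
	l.mu.Lock()
	defer l.mu.Unlock()

	st, ok := l.state[key]
	if !ok || now.Sub(st.start) >= window {
		l.state[key] = &windowState{start: now, count: 1}
		return true, 0
	}
	if st.count >= maxPerWindow {
		return false, st.start.Add(window).Sub(now)
	}
	st.count++
	return true, 0
}

// Allow is allowAt evaluated at the current wall-clock time.
func (l *suggestLimiter) Allow(pat string) (bool, time.Duration) {
	return l.allowAt(pat, time.Now())
}

// retryAfterSeconds rounds up to whole seconds, the unit used by Retry-After.
func retryAfterSeconds(d time.Duration) int {
	return int(math.Ceil(d.Seconds()))
}

func main() {
	l := newSuggestLimiter()
	for i := 1; i <= 31; i++ {
		if ok, wait := l.Allow("example-pat"); !ok {
			fmt.Printf("request %d denied: 429, Retry-After: %d\n", i, retryAfterSeconds(wait))
		}
	}
}
```

A handler would call `Allow` before invoking the provider and, on denial, respond with status 429 and the rounded-up seconds in `Retry-After`.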

### Google Cloud

| Variable | Type | Default | Description |

docs/DEPLOYMENT.md

Lines changed: 34 additions & 2 deletions
@@ -164,6 +164,22 @@ echo -n "mongodb+srv://user:pass@cluster.mongodb.net/dbname" | \
  --replication-policy="automatic"
```

#### 4. Anthropic API Key (Optional - for the AI rule suggester)

Required only when the operator UI is enabled and `LLM_PROVIDER=anthropic` (the default in the committed CI deploy). Skip if you're using Ollama or don't plan to use the AI rule suggester.

```bash
# For the Grove Foundry APIM gateway, the value is the gateway key you were
# issued, not a raw Anthropic sk-... key. The app sends it as both the
# x-api-key (Anthropic) and api-key (APIM) header, so one key works either way.
echo -n "$GATEWAY_KEY" | \
  gcloud secrets create anthropic-api-key \
  --data-file=- \
  --replication-policy="automatic"
```

The env var that points at this secret is `ANTHROPIC_API_KEY_SECRET_NAME=anthropic-api-key` (already set in `.github/workflows/ci.yml` and `env-cloudrun.yaml`). A missing key is non-fatal: the operator UI shows "not configured" and every other feature still works.
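
The dual-header behavior mentioned in the comment above can be sketched in Go as follows. This is an illustration under assumptions: `newMessagesRequest` is a hypothetical helper, the `/v1/messages` path and `anthropic-version: 2023-06-01` header are those of the public Anthropic Messages API, and whether the gateway expects the same path suffix is not confirmed by this document.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

// newMessagesRequest builds a POST to the Messages endpoint and sends the same
// key under both header names, so it works against Anthropic directly
// (x-api-key) or through an APIM gateway (api-key).
func newMessagesRequest(baseURL, key string, body []byte) (*http.Request, error) {
	// Assumption: the gateway base URL takes the same /v1/messages suffix.
	url := strings.TrimRight(baseURL, "/") + "/v1/messages"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("content-type", "application/json")
	req.Header.Set("anthropic-version", "2023-06-01")
	req.Header.Set("x-api-key", key) // read by the Anthropic API
	req.Header.Set("api-key", key)   // read by the APIM gateway
	return req, nil
}

func main() {
	req, err := newMessagesRequest("https://api.anthropic.com", "example-key", []byte(`{}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println("x-api-key set:", req.Header.Get("x-api-key") != "")
	fmt.Println("api-key set:", req.Header.Get("api-key") != "")
}
```

Sending both headers avoids a provider-specific branch in the client: whichever endpoint receives the request ignores the header it does not recognize.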

### Grant Cloud Run Access

```bash
@@ -185,6 +201,11 @@ gcloud secrets add-iam-policy-binding webhook-secret \
gcloud secrets add-iam-policy-binding mongo-uri \
  --member="serviceAccount:${SERVICE_ACCOUNT}" \
  --role="roles/secretmanager.secretAccessor"

# Only if using the AI rule suggester with LLM_PROVIDER=anthropic
gcloud secrets add-iam-policy-binding anthropic-api-key \
  --member="serviceAccount:${SERVICE_ACCOUNT}" \
  --role="roles/secretmanager.secretAccessor"
```

**Note:** Cloud Run uses the default compute service account unless configured otherwise. You can also create a dedicated service account for better security isolation.
@@ -322,11 +343,12 @@ services.LoadMongoURI(config) // Loads from Secret Manager

### Pre-Deployment Checklist

-- [ ] Secrets created in Secret Manager
-- [ ] IAM permissions granted to Cloud Run service account
+- [ ] Secrets created in Secret Manager (`CODE_COPIER_PEM`, `webhook-secret`, `mongo-uri`, and `anthropic-api-key` if using the AI rule suggester)
+- [ ] IAM permissions granted to Cloud Run service account on each secret
 - [ ] `env-cloudrun.yaml` created and configured
 - [ ] `env-cloudrun.yaml` in `.gitignore`
 - [ ] `Dockerfile` exists in project root
+- [ ] (Operator UI) `OPERATOR_AUTH_REPO` points at a repo you own and can manage collaborators on; its permission list decides who gets operator vs writer access

### Deploy to Cloud Run

@@ -480,6 +502,16 @@ gcloud run services logs read github-copier --limit=50
# ❌ "webhook signature verification failed"
```

### Smoke-Test the Operator UI (if enabled)

Only applicable when `OPERATOR_UI_ENABLED=true`:

1. Open `https://<service-url>/operator/` in a browser.
2. Generate a GitHub PAT with `repo` scope and paste it into the sign-in prompt.
3. Confirm the user chip in the header shows your GitHub avatar and the correct role (`operator` if you're `admin`/`maintain` on `OPERATOR_AUTH_REPO`, `writer` if you're `write`/`triage`/`read`).
4. Click the **System** tab → **AI settings** → **Refresh status**. You should see the provider connected (e.g. "Anthropic connected at https://grove-gateway-prod.azure-api.net/…").
5. If AI settings shows "unreachable", either the Cloud Run service account wasn't granted access to the `anthropic-api-key` secret or the deploy is pointing at a URL the gateway doesn't accept. Check the Cloud Run revision logs for `Anthropic API key not loaded` or a 401/403 from the gateway.

## Monitoring

### View Logs

docs/FAQ.md

Lines changed: 42 additions & 0 deletions
@@ -30,6 +30,48 @@ The GitHub copier is a GitHub app that automatically copies code examples and fi
- Health and metrics endpoints
- Slack notifications
- Dry-run mode for testing
- **[Operator UI](../README.md#operator-ui)**: Web dashboard at `/operator/` for replay, audit browsing, workflow inspection, and AI-assisted rule generation

## Operator UI

### What is the operator UI?

A web dashboard served from `/operator/` when `OPERATOR_UI_ENABLED=true`. It has five tabs:

- **Overview**: live metrics, recent activity, health of dependent services
- **Webhooks**: recent webhook traces with filter/search and one-click replay
- **Audit**: searchable audit event history with a per-event drawer (trace + logs + replay)
- **Workflows**: browse the loaded copier config; test path matches with the built-in file match tester
- **System**: deployment metadata, AI settings, release tagging

### Who can access the operator UI?

Anyone with a GitHub PAT that has access to `OPERATOR_AUTH_REPO`. The user's permission on that repo determines their UI role:

- `admin` / `maintain` → **operator**: full access including replay, release, AI settings
- `write` / `triage` / `read` → **writer**: view audit, workflows, recent copies, run the AI rule suggester and file match tester, but no replay / release
- No access → 401 Unauthorized

`write` is deliberately mapped to **writer** (not operator) so typical docs contributors can't replay deliveries or cut releases just by having repo write access. Operator capability requires an explicit `admin` / `maintain` grant.

### How does the AI rule suggester work?

Paste a source file path and the target file path you want; optionally name the target repo. The server sends the pair plus a structured prompt to the configured LLM, parses the returned JSON, and runs the generated rule through the in-process pattern matcher to verify it actually produces your target from your source. If it doesn't match, the UI shows a "not verified" warning next to the YAML so you can review before copying it into your config.
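
The parse-then-verify step can be illustrated with a deliberately simplified Go sketch. The rule shape here (`suggestedRule` with a source prefix and a target prefix) is hypothetical and far simpler than a real copier workflow rule; the point is only the flow: decode the model's JSON, apply the rule to the source path, and compare against the requested target.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// suggestedRule is a hypothetical, simplified rule: move files from one
// path prefix to another. The real rule schema is not shown in this document.
type suggestedRule struct {
	SourcePrefix string `json:"source_prefix"`
	TargetPrefix string `json:"target_prefix"`
}

// apply returns the target path the rule produces for source, or false if
// the rule does not match the source path at all.
func (r suggestedRule) apply(source string) (string, bool) {
	if !strings.HasPrefix(source, r.SourcePrefix) {
		return "", false
	}
	return r.TargetPrefix + strings.TrimPrefix(source, r.SourcePrefix), true
}

// verifySuggestion parses the model's JSON output and reports whether the
// rule maps source to exactly wantTarget. An unverified rule is still
// returned so a caller can display it with a warning.
func verifySuggestion(raw []byte, source, wantTarget string) (suggestedRule, bool, error) {
	var r suggestedRule
	if err := json.Unmarshal(raw, &r); err != nil {
		return suggestedRule{}, false, fmt.Errorf("parse LLM output: %w", err)
	}
	got, matched := r.apply(source)
	return r, matched && got == wantTarget, nil
}

func main() {
	raw := []byte(`{"source_prefix":"examples/","target_prefix":"docs/code/"}`)
	rule, verified, err := verifySuggestion(raw, "examples/go/main.go", "docs/code/go/main.go")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("rule=%+v verified=%v\n", rule, verified)
}
```

Running the check in-process means a hallucinated rule is flagged before a user copies it into a config, without a second round trip to the model.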

Two providers are supported:

- **Anthropic** (default in Cloud Run): calls the hosted Messages API. In this repo's deploy it routes through the Grove Foundry APIM gateway so no infrastructure needs to be stood up.
- **Ollama** (local dev): runs against a local model server. The UI can pull models, switch the active one, and delete models without a redeploy.

To cap cost, the suggester is rate-limited to 30 requests/hour per authenticated user.

### The AI settings panel says "not connected": how do I fix it?

Check the startup banner, which prints the active `AI Provider`, `AI Model`, and `AI URL`. Then:

- **Anthropic**: make sure `ANTHROPIC_API_KEY` (local) or `ANTHROPIC_API_KEY_SECRET_NAME` (Cloud Run) is set. In Cloud Run, the runtime service account also needs `roles/secretmanager.secretAccessor` on the secret.
- **Ollama**: confirm `ollama serve` is running on the host at `LLM_BASE_URL` (default `http://localhost:11434`) and that you've pulled a model.
- Use [`cmd/test-llm`](../cmd/test-llm/README.md) to exercise the full path outside the UI; it reports Ping, ListModels, and a real GenerateJSON call.

## Configuration

docs/LOCAL-TESTING.md

Lines changed: 37 additions & 0 deletions
@@ -343,6 +343,43 @@ AUDIT_DATABASE=code_copier_dev
AUDIT_COLLECTION=audit_events
```

### Optional (for Operator UI + AI rule suggester)

```bash
# Mount the operator dashboard at http://localhost:8080/operator/
OPERATOR_UI_ENABLED=true
OPERATOR_AUTH_REPO=your-org/some-repo # your GitHub permission here decides your UI role
OPERATOR_REPO_SLUG=your-org/some-repo # optional; enables clickable audit-row deep links

# AI rule suggester: pick ONE provider
#
# Option A: Ollama (local, no cloud calls, no API key needed)
# 1. Install Ollama: https://ollama.com/download
# 2. Leave LLM_PROVIDER unset; it defaults to ollama with http://localhost:11434
# 3. From the UI's System → AI settings panel, pull a model (e.g. qwen2.5-coder:7b)
#
# Option B: Anthropic via Grove Foundry APIM gateway
LLM_PROVIDER=anthropic
LLM_BASE_URL=https://grove-gateway-prod.azure-api.net/grove-foundry-prod/anthropic
LLM_MODEL=claude-haiku-4-5
ANTHROPIC_API_KEY=<your-gateway-key> # never commit this; use a local-only env file
```

### Testing the Operator UI Locally

1. Start the app with the env vars above. The startup banner will confirm `Operator UI: true` and show the configured auth repo, AI provider, model, and base URL.
2. Open `http://localhost:8080/operator/` in a browser.
3. Generate a [GitHub Personal Access Token](https://github.com/settings/tokens) with `repo` scope. Paste it into the sign-in prompt. The UI caches it in `localStorage` so you only paste once.
4. If you own `OPERATOR_AUTH_REPO`, grant yourself `admin` for the operator role, or `read`/`write` for the writer role; the header chip shows which one you got.
5. Smoke-test the LLM connection end-to-end with `cmd/test-llm` before hitting the UI:

```bash
go build -o test-llm ./cmd/test-llm
./test-llm -env .env.test
```

A successful run pings the provider, lists models, and issues a real rule-suggester prompt. See [cmd/test-llm/README.md](../cmd/test-llm/README.md) for details.

## Troubleshooting

### Error: "A JSON web token could not be decoded" / "Failed to configure GitHub permissions"
