Merged
41 commits

- `8107571` harness(phase-13): A1 add OTel + Traceloop deps for telemetry foundation (odtorres, May 12, 2026)
- `bfdac6a` harness(phase-13): A2 observability package scaffold (odtorres, May 12, 2026)
- `2e5fa25` harness(phase-13): A3 wire telemetry callbacks into Nexo graph nodes (odtorres, May 12, 2026)
- `64ab72a` harness(phase-13): A5 add OTel + Langfuse settings to HarnessSettings (odtorres, May 12, 2026)
- `3c6c92b` harness(phase-13): A4 init OTel + Traceloop in FastAPI lifespan (odtorres, May 12, 2026)
- `f3d8998` harness(phase-13): A6 add LangGraph propagation gotcha test (odtorres, May 12, 2026)
- `2103776` harness(phase-13): act on Phase A code-review feedback (odtorres, May 12, 2026)
- `fe9f036` harness(phase-13): E1 LLM intent router + mode resolver (odtorres, May 12, 2026)
- `9f53356` harness(phase-13): E2 meta-question agent (odtorres, May 12, 2026)
- `7b14c05` harness(phase-13): E3 composable DB primitives + sqlglot safety gate (odtorres, May 12, 2026)
- `684fbd0` harness(phase-13): close E3 safety-gate bypasses found in review (odtorres, May 12, 2026)
- `2a928ad` harness(phase-13): E4 provenance log + curate script (odtorres, May 12, 2026)
- `bbb51ef` harness(phase-13): E5 conversational memory + ConversationStore seam (odtorres, May 12, 2026)
- `6955bba` harness(phase-13): E6 agentic graph topology + E7 tenancy-gate refactor (odtorres, May 12, 2026)
- `d404fdf` harness(phase-13): E9 thread `modular.mode` through root + per-agent … (odtorres, May 12, 2026)
- `302a61e` harness(phase-13): F1 pytest verification gate satisfied (odtorres, May 12, 2026)
- `501a545` harness(phase-13): act on Phase E review + document supervisor wire-u… (odtorres, May 12, 2026)
- `a63fd94` harness(phase-13): 1A close Phase E supervisor wire-up (odtorres, May 12, 2026)
- `2bdbdef` harness(phase-13): 1B Langfuse + OTel Collector stack scaffolding (odtorres, May 12, 2026)
- `a2f5e8a` harness(phase-13): 1C cost report aggregator + CLI scaffold (odtorres, May 12, 2026)
- `b3dee40` harness(phase-13): fix Langfuse stack boot — clickhouse healthcheck, … (odtorres, May 12, 2026)
- `183cb3f` harness(phase-13): F2/F3 live verification + tests/conftest.py isolat… (odtorres, May 12, 2026)
- `416864b` harness(phase-13): F5 done — PR #462 opened (odtorres, May 12, 2026)
- `085595f` harness(phase-13): E10 Langfuse first-class trace fields for per-clie… (odtorres, May 12, 2026)
- `7208b3d` harness(phase-13): close E3 row-locking bypass in composable primitiv… (odtorres, May 14, 2026)
- `c03c380` harness(phase-13): post-review small cleanups — agentic edge + redis … (odtorres, May 14, 2026)
- `5330dc8` harness(phase-13): LLMIntentRouter._parse_decision non-dict JSON guard (odtorres, May 14, 2026)
- `7274d17` harness(phase-13): fix redis healthcheck false-positive — authenticat… (odtorres, May 15, 2026)
- `e2ec897` harness(phase-13): ruff cleanup in nexo composable primitives (odtorres, May 15, 2026)
- `93d984a` harness(phase-13): fix nexo_explain row handling for asyncpg.Record (odtorres, May 15, 2026)
- `e0ab2a5` harness(phase-13): ruff cleanup in langfuse attribution tests (odtorres, May 15, 2026)
- `d1a7b78` harness(phase-13): ruff I001 in test_pricing — drop extra blank line (odtorres, May 15, 2026)
- `c9ce1cf` harness(phase-13): ruff E501 in test_intent_router fixture tuples (odtorres, May 15, 2026)
- `fbbb9e6` harness(phase-13): drop unused HarnessRoute import in supervisor phas… (odtorres, May 15, 2026)
- `2d21523` harness(phase-13): clear remaining CI ruff errors (odtorres, May 15, 2026)
- `3542273` harness(phase-13): add minio healthcheck, gate langfuse-web on it (odtorres, May 15, 2026)
- `3c59bf0` harness(phase-13): fix mypy on agent_span attributes dict (odtorres, May 15, 2026)
- `df62d97` ci(harness): pass GH_TOKEN to attestation verify step (odtorres, May 15, 2026)
- `7dc5bcd` ci(harness): register provenance + SBOM with GitHub attestation API (odtorres, May 15, 2026)
- `4f454b3` ci(harness): 07-attestations-present — verify SBOM predicate explicitly (odtorres, May 15, 2026)
- `d7d4f36` chore: stop tracking .ralph/ session-state scratch files (odtorres, May 16, 2026)

47 changes: 47 additions & 0 deletions .github/workflows/harness.yaml
```diff
@@ -225,6 +225,48 @@ jobs:
           provenance: true
           sbom: true
 
+      # `docker/build-push-action`'s `provenance: true / sbom: true` embed
+      # attestations into the OCI manifest only. The deploy-eval script uses
+      # `gh attestation verify --owner` which queries the GitHub attestation
+      # API; for that lookup to resolve, the attestation has to be registered
+      # via the `actions/attest-*` family. We do BOTH (OCI manifest + GitHub
+      # API) so verification works against either source.
+      - name: Skip attestation push on fork PRs
+        id: attest-gate
+        run: |
+          if [[ "${{ github.event_name }}" == "pull_request" && "${{ github.event.pull_request.head.repo.full_name }}" != "${{ github.repository }}" ]]; then
+            echo "should-attest=false" >> "$GITHUB_OUTPUT"
+          else
+            echo "should-attest=true" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Generate SBOM (SPDX JSON)
+        if: steps.attest-gate.outputs.should-attest == 'true'
+        uses: anchore/sbom-action@v0
+        with:
+          image: ${{ env.GHCR_REGISTRY }}/${{ env.IMAGE_OWNER }}/${{ env.IMAGE_NAME }}@${{ steps.build-push.outputs.digest }}
+          format: spdx-json
+          output-file: sbom.spdx.json
+          upload-artifact: false
+          upload-release-assets: false
+
+      - name: Attest build provenance (GH attestation API)
+        if: steps.attest-gate.outputs.should-attest == 'true'
+        uses: actions/attest-build-provenance@v2
+        with:
+          subject-name: ${{ env.GHCR_REGISTRY }}/${{ env.IMAGE_OWNER }}/${{ env.IMAGE_NAME }}
+          subject-digest: ${{ steps.build-push.outputs.digest }}
+          push-to-registry: true
+
+      - name: Attest SBOM (GH attestation API)
+        if: steps.attest-gate.outputs.should-attest == 'true'
+        uses: actions/attest-sbom@v2
+        with:
+          subject-name: ${{ env.GHCR_REGISTRY }}/${{ env.IMAGE_OWNER }}/${{ env.IMAGE_NAME }}
+          subject-digest: ${{ steps.build-push.outputs.digest }}
+          sbom-path: sbom.spdx.json
+          push-to-registry: true
+
       - name: Image summary
         run: |
           {
@@ -269,6 +311,11 @@ jobs:
         run: bash miot-harness/evals/deploy/05-pulls-from-ghcr.sh "$DIGEST"
 
       - name: Verify provenance + SBOM attestations
+        env:
+          # `gh attestation verify` calls the GitHub API; without GH_TOKEN it
+          # exits with "set the GH_TOKEN environment variable" and the script
+          # surfaces it as a (misleading) FAIL "no attestation verifies".
+          GH_TOKEN: ${{ github.token }}
         run: bash miot-harness/evals/deploy/07-attestations-present.sh "$DIGEST"
 
       - name: Pull from Docker Hub (mirror, non-PR only)
```
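The two CI decisions in these hunks (skip attestation on fork PRs; refuse to verify without `GH_TOKEN`) can be sketched in Python as below. This is not the content of `07-attestations-present.sh`, which the PR does not show: `should_attest`, `verify_argv` and `verify_all` are hypothetical helpers, and the two predicate-type URIs (SLSA provenance v1 and SPDX 2.3) are assumptions about what the `actions/attest-*` steps record, to be checked against real attestations.

```python
"""Sketch of the fork gate and attestation verification (hypothetical helpers)."""
from __future__ import annotations

import os
import shutil
import subprocess
from collections.abc import Mapping

# Assumed predicate types: SLSA provenance v1 (build provenance) and
# SPDX 2.3 (SBOM). Verify these against your own attestations.
PROVENANCE_PREDICATE = "https://slsa.dev/provenance/v1"
SBOM_PREDICATE = "https://spdx.dev/Document/v2.3"


def should_attest(event_name: str, head_repo: str | None, repository: str) -> bool:
    """Mirror of the `attest-gate` step: only PRs from forks skip attestation."""
    is_fork_pr = event_name == "pull_request" and head_repo != repository
    return not is_fork_pr


def verify_argv(image_ref: str, digest: str, owner: str, predicate_type: str) -> list[str]:
    """Build one `gh attestation verify` invocation for a single predicate type."""
    return [
        "gh", "attestation", "verify",
        f"oci://{image_ref}@{digest}",
        "--owner", owner,
        "--predicate-type", predicate_type,
    ]


def verify_all(image_ref: str, digest: str, owner: str,
               env: Mapping[str, str] | None = None) -> bool:
    """Verify provenance and SBOM; True only if both attestations verify."""
    run_env = dict(os.environ if env is None else env)
    # Fail fast with a clear message instead of the misleading
    # "no attestation verifies" that a missing token would otherwise produce.
    if not run_env.get("GH_TOKEN"):
        raise RuntimeError("GH_TOKEN is not set; gh attestation verify needs it")
    if shutil.which("gh") is None:
        raise RuntimeError("gh CLI not found on PATH")
    for predicate in (PROVENANCE_PREDICATE, SBOM_PREDICATE):
        argv = verify_argv(image_ref, digest, owner, predicate)
        if subprocess.run(argv, env=run_env, check=False).returncode != 0:
            return False
    return True
```

Checking `GH_TOKEN` before calling `gh` turns the failure mode described in the workflow comment into an explicit error, rather than a verification failure that looks like a missing attestation.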
74 changes: 74 additions & 0 deletions .ralph/RALPH_PROMPT.md
# Ralph Loop prompt for Plan 13

Copy-paste the block below into `/ralph-loop:ralph-loop` when you're ready to start. Recommended invocation:

```
/ralph-loop:ralph-loop "<paste the block below>" --max-iterations 70 --completion-promise "PLAN_13_DONE"
```

70 iterations: the checklist has 31 tasks, some of which need 1–2 retries, plus 6 review checkpoints. Adjust after the first session.

---

## The prompt

```
You are implementing plan 13 (.cursor/plans/ai-first/13-post-nexo-roadmap.md).
Source of truth for progress: .ralph/state.md.
Source of truth for current blockers: .ralph/blockers.md.

EACH ITERATION:
1. Invoke skill: superpowers:using-superpowers
2. Read .ralph/blockers.md. If any OPEN blocker prevents the next task in document order, skip to the first task that is not blocked. If ALL remaining tasks are blocked, output "BLOCKER: <one-line>" and STOP.
3. Read .ralph/state.md. Pick the FIRST unchecked [ ] task in document order whose dependencies are met.
4. For the task type:
- Design / contract task (A2 callback shape, E1 ModeResolver contract, E6 agentic_graph topology): invoke superpowers:writing-plans first.
- Code task (most A*, B*, C*, E*): invoke superpowers:test-driven-development. Write a failing test, then minimal code to pass.
- Verification task (D*, F*): invoke superpowers:verification-before-completion. Run real commands, paste output.
- Bug encountered mid-task: invoke superpowers:systematic-debugging.
5. Implement. Keep changes scoped to the single task.
6. Run `uv run pytest` (or the targeted subset). All previously-passing tests MUST still pass.
7. If the task is complete and verified:
- Update .ralph/state.md: change [ ] to [x], bump "Iteration", update timestamp.
- Append to .ralph/log.md: iteration number, task ID, commit SHA, one-line summary.
- Commit on branch feat/harness-phase-13-telemetry-agentic with message "harness(phase-13): <task ID> <short>".
8. If the task cannot be completed (genuine blocker, not a bug): write the question to .ralph/blockers.md as a new OPEN entry with task IDs it blocks, output "BLOCKER: <summary>" and STOP.
9. Check .ralph/state.md: if every box in Phase F is [x], output "PLAN_13_DONE" and STOP. Otherwise continue iterating.

NEVER:
- Skip TDD on code tasks.
- Mark a task [x] without running its verification command and confirming output.
- Modify the plan file (.cursor/plans/ai-first/13-post-nexo-roadmap.md) — that is the spec.
- Modify .cursor/plans/ai-first/{09,10,11,12}-*.md (upstream sources).
- Touch files outside miot-harness/, infra/observability/, .ralph/, scripts/, or the test tree.
- Use --no-verify on commits.
- Echo, log, or commit any secret. .env files stay gitignored.
- Push to trunk. Only commit on feat/harness-phase-13-telemetry-agentic.
- Move past E3 (composable primitives) without the safety tests green — the primitives are the highest-risk surface in this plan.

REVIEW CHECKPOINTS (invoke superpowers:requesting-code-review when these complete):
- After all of Phase A is [x] (telemetry foundation).
- After all of Phase B is [x] (backend deployed).
- After all of Phase D is [x] (telemetry verified end-to-end).
- After E3 is [x] (composable primitives + safety gate — high-risk surface).
- After all of Phase E is [x] (agentic search complete).
- Before opening the PR in F5.

COMPLETION PROMISE: PLAN_13_DONE
BLOCKER FORMAT: BLOCKER: <one-line>
```
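Steps 2, 3 and 9 of each iteration amount to a small selection rule over `.ralph/state.md`. The sketch below assumes a hypothetical file shape (`## Phase X` headings followed by `- [ ] ID …` checklist lines) and a set of blocked task IDs already read from `.ralph/blockers.md`; it ignores task dependencies, which the prompt also requires Ralph to respect. `parse_tasks` and `next_step` are illustrative names, not part of the plan.

```python
import re
from typing import List, Set, Tuple

# Assumed state.md shape (hypothetical): "## Phase X" headings followed by
# checklist lines such as "- [ ] A2 observability package scaffold".
PHASE_RE = re.compile(r"^#+\s*Phase\s+([A-Z])\b")
TASK_RE = re.compile(r"^\s*[-*]\s*\[([ xX])\]\s+([A-Za-z0-9]+)")


def parse_tasks(state_md: str) -> List[Tuple[str, str, bool]]:
    """Return (phase, task_id, done) tuples in document order."""
    phase = ""
    tasks: List[Tuple[str, str, bool]] = []
    for line in state_md.splitlines():
        phase_match = PHASE_RE.match(line)
        if phase_match:
            phase = phase_match.group(1)
            continue
        task_match = TASK_RE.match(line)
        if task_match:
            done = task_match.group(1).lower() == "x"
            tasks.append((phase, task_match.group(2), done))
    return tasks


def next_step(state_md: str, blocked: Set[str]) -> str:
    """Completion check (step 9), then the first unblocked unchecked task (steps 2-3)."""
    tasks = parse_tasks(state_md)
    phase_f = [done for phase, _, done in tasks if phase == "F"]
    if phase_f and all(phase_f):
        return "PLAN_13_DONE"
    for _, task_id, done in tasks:
        if not done and task_id not in blocked:
            return task_id
    return "BLOCKER: all remaining tasks are blocked"
```

Keeping the rule this mechanical is what makes the completion promise safe: `PLAN_13_DONE` is emitted only when every Phase F box is checked, exactly as step 9 states.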

---

## How Plan 13 differs from Plan 12

- **Two big tracks in one PR**: telemetry (A–D) + agentic search (E). Telemetry MUST be green before E starts so per-agent cost visibility exists when the agentic surface lights up.
- **Infra changes**: docker-compose stack at `infra/observability/`. Ralph touches this for the first time in plan 13.
- **Higher-risk surface**: composable DB primitives (E3) need an airtight safety gate. The review checkpoint after E3 is non-negotiable.
- **Mode-selection feature**: `RunRequest.mode` lets callers bypass the LLM router. Test the bypass paths explicitly.
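The bypass behaviour that the mode-selection bullet asks to test explicitly can be sketched as follows. The harness's real `resolve_mode` and `RunRequest` are not shown in this PR; the `Route` values, the `"auto"` sentinel and this signature are assumptions. The property worth asserting is that an explicit mode never invokes the router.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Route(str, Enum):
    # Hypothetical route names, modelled on the supervisor dispatch targets.
    NEXO = "nexo"
    AGENTIC = "agentic"
    META = "meta"
    STORYTELLING = "storytelling"


@dataclass
class RunRequest:
    query: str
    mode: str = "auto"  # "auto" means: let the LLM router decide


def resolve_mode(request: RunRequest, router: Callable[[str], Route]) -> Route:
    """Explicit modes bypass the router; only "auto" pays for an LLM call."""
    if request.mode == "auto":
        return router(request.query)
    try:
        return Route(request.mode)
    except ValueError:
        raise ValueError(f"unknown mode: {request.mode!r}") from None
```

A test for the bypass path can pass a fake router that records its calls and assert that the record stays empty for any non-`auto` mode.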

## Phase-13 worktree

Path: `.claude/worktrees/harness-phase-13/`
Branch: `feat/harness-phase-13-telemetry-agentic`
Base: `trunk` at SHA `baed70e8` (Merge PR #456) as of worktree creation.

47 changes: 47 additions & 0 deletions .ralph/blockers.md
# Open Blockers

Ralph writes here when it cannot proceed. The user clears an entry by removing it from this file once the underlying issue is resolved.

---

## RESOLVED 2026-05-12 — Operator preconditions for Phase D + Phase F live verification

Originally OPEN: needed `ANTHROPIC_API_KEY`, an SSH tunnel to coordinador-prod, and a Docker daemon for the Langfuse stack.

Cleared via a live bring-up session (commit `b3dee407` plus the in-session `infra/observability/` debugging):

- Docker Desktop installed; `docker compose up -d` from `infra/observability/`
brings up all 7 services (postgres / clickhouse / redis / minio / langfuse-web /
langfuse-worker / otel-collector) cleanly. ClickHouse healthcheck switched
to `127.0.0.1` to avoid the macOS IPv6-loopback trap; Langfuse
`ENCRYPTION_KEY` replaced with a real `openssl rand -hex 32` value;
`CLICKHOUSE_CLUSTER_ENABLED=false` added so single-node ClickHouse skips
ZooKeeper; `LANGFUSE_INIT_PROJECT_{PUBLIC,SECRET}_KEY` baked in so the
API keys are deterministic and re-runnable.
- SSH tunnel up via `./bin/tunnel.sh open coordinador-prod` (local
port `6434`, mapping to Citus pgbouncer remote port `6432`).
- `ANTHROPIC_API_KEY` present in `miot-harness/.env`; harness boots
with `nexo.enabled=true` and the full curated tool list registered.
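Two of the fixes above can be checked from a short preflight script: generating a Langfuse `ENCRYPTION_KEY` with the same shape as `openssl rand -hex 32` (32 random bytes, 64 hex characters), and probing local ports on `127.0.0.1` explicitly rather than `localhost`. The second follows the reasoning behind the ClickHouse healthcheck change: `localhost` may resolve to the IPv6 loopback first, and a service listening only on IPv4 then looks down. The helper names and the `6434` usage are illustrative assumptions, not harness code.

```python
import secrets
import socket
import string


def make_encryption_key() -> str:
    # Same shape as `openssl rand -hex 32`: 32 random bytes -> 64 hex chars.
    return secrets.token_hex(32)


def is_valid_hex_key(key: str) -> bool:
    """True if `key` is exactly 64 hexadecimal characters."""
    return len(key) == 64 and all(c in string.hexdigits for c in key)


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    # Probe the IPv4 loopback explicitly so name resolution cannot send the
    # connection to ::1 when the service only listens on IPv4.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open(6434)` would confirm that the `coordinador-prod` tunnel's local end is listening before the harness boots.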

---

## RESOLVED 2026-05-12 — Phase E supervisor wire-up

Originally OPEN: `HarnessSupervisor` did not consume `resolve_mode`,
`LLMIntentRouter`, `agentic_graph`, `meta_agent_node`, or
`InMemoryConversationStore`.

Cleared in commit `a63fd949` (1A). `HarnessSupervisor.__init__` now
accepts the Phase-E modules as optional kwargs; `run()` resolves the
route via `resolve_mode(...)` and dispatches to nexo_graph /
agentic_graph / meta_agent_node / storytelling per the route.
`conversation_id` round-trips via `ConversationStore`. The FastAPI
lifespan in `api/server.py` injects all five Phase-E modules once
the Nexo boot path succeeds; the keyword-router path stays intact
when Nexo is disabled. 7 new tests under
`tests/runtime/test_supervisor_phase_e.py` cover the dispatch tree,
the agentic-non-Mintral refusal, the auto-mode delegation, the
conversation round-trip, and the backward-compat keyword-only
construction.
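The dispatch-plus-round-trip behaviour described above can be sketched as a route-to-handler table in front of a conversation store. Everything here is hypothetical: `DictConversationStore` stands in for the `ConversationStore` seam, handlers stand in for `nexo_graph` / `agentic_graph` / `meta_agent_node` / storytelling, and the tenancy refusal and keyword-router fallback are deliberately left out because their rules are not shown in this PR.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A handler receives the query plus prior turns and returns an answer.
Handler = Callable[[str, List[str]], str]


@dataclass
class DictConversationStore:
    """Hypothetical in-memory stand-in for the ConversationStore seam."""
    _turns: Dict[str, List[str]] = field(default_factory=dict)

    def load(self, conversation_id: str) -> List[str]:
        return list(self._turns.get(conversation_id, []))

    def append(self, conversation_id: str, *turns: str) -> None:
        self._turns.setdefault(conversation_id, []).extend(turns)


@dataclass
class Supervisor:
    handlers: Dict[str, Handler]  # route name -> graph/node entry point
    store: DictConversationStore

    def run(self, route: str, query: str,
            conversation_id: Optional[str] = None) -> Tuple[str, str]:
        if route not in self.handlers:
            raise ValueError(f"no handler registered for route {route!r}")
        # Start a new conversation when the caller does not pass one.
        cid = conversation_id or str(uuid.uuid4())
        history = self.store.load(cid)
        answer = self.handlers[route](query, history)
        self.store.append(cid, query, answer)
        return cid, answer
```

Returning the conversation ID from every call is what lets a second request with the same ID see the first exchange as history, which is the round-trip property the new supervisor tests cover.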

---