harness: deploy stack — Dockerfile, CI workflow, deploy evals (T10b verify) by korutx · Pull Request #447 · microboxlabs/modulariot

korutx · 2026-05-10T05:11:21Z

Status: ready for review. CI run 25620568674 — Lint, Image Evals, Build, Summary all ✅; Distribution + Security correctly skip on cross-fork PRs (their conditions are met by the trunk-push event after merge).

Summary

Containerizes the harness and wires it into CI per plan 13-server-deployment/ (10 numbered docs + a Ralph-driven 14-task worklist; ~13 commits on this branch). After this PR lands, miot-harness:<digest> is published to GHCR (and Docker Hub for non-PR builds) on every change to miot-harness/, with SLSA provenance + SBOM attestations, ready for whatever cluster the platform team picks.

What's in the diff

Phase A — in-scope harness fixes (T01–T05)

T01 harness(T01) — /health now reports {status, env, nexo: {enabled, tools, snapshot_age_minutes}}. The boot path stashes snapshot_age_minutes on app.state so any deploy platform's readiness probe can see the freshness gate's verdict without log scraping.
T02 harness(T02) — .env parser unwraps a single matching pair of '…' or "…" around values. Caught a real prod SASL auth failure earlier in development; six new cases in test_credentials.py.
T03 harness(T03) — synthesizer's failure copy no longer says "reintenta cuando el snapshot esté fresco" for non-freshness failures (filter_expert parse errors, tool errors, etc.). Routes by reason prefix; new test asserts the negative.
T04 harness(T04) — .env.example now mirrors HarnessSettings 1:1 across all 18 fields with comments explaining purpose, ranges, and operational gotchas.
T05 harness(T05) — added 4 deploy-readiness settings: nexo_dsn (containerized override that bypasses db_scripts_root file lookup), nexo_application_name, log_level, request_id_header. create_nexo_pool() now accepts EITHER creds OR dsn (DSN wins per industry convention).

Phase B — container packaging (T06–T08, T08b)

T06 harness(T06) — multi-stage uv Dockerfile (python:3.12-slim base). Builder installs deps + project (--no-editable is critical, see commit message); runtime copies only /app/.venv. Runs as numeric UID 65534 for k8s policies. Image is 80 MB compressed (well under plan's 250 MB budget).
T07 harness(T07) — .dockerignore excludes .venv, caches, secrets, dev-only dirs.
T08 harness(T08) — local-only docker-compose.yml with built-in healthcheck; --profile tunnel adds a placeholder pgbouncer-tunnel sidecar that documents the kubectl port-forward command.
T08b harness(T08b) — local image evals: evals/deploy/01–03 plus a run-all.sh orchestrator. Output convention: PASS|FAIL <id> first stdout line for grep-friendly CI logs.

Phase C — CI (T09, T10a, T10b)

T09 harness(T09) — .github/workflows/harness.yaml. Six jobs: lint-and-test → image-evals-pre-publish → publish-image → distribution-evals + security-scan → summary. Matches existing repo conventions (build-push-action@v6 with provenance: true, sbom: true, metadata-action@v5 tagging, trivy SARIF, GHA cache).
T10a harness(T10a) — distribution evals: evals/deploy/04–08 (workflow-shape, GHCR pull, Docker Hub pull, attestations present, tag discipline) plus a B-checklist.md for review-style claims.
T10b fix-up harness(T10b fix-up) — make publish-image and distribution-evals fork-PR safe via push: and if: conditionals on (non-PR OR same-repo PR). Cross-fork PRs build but skip publish; trunk-push and same-repo PRs run the full pipeline. Caught at iteration 13 when the cross-fork PR hit "denied: installation not allowed to Create organization package" trying to push to the org's GHCR.
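
The fork-PR guard described in T10b reduces to a single predicate: (non-PR OR same-repo PR). The sketch below restates it in Python for illustration only; the workflow itself expresses it in `push:` and `if:` conditionals, and the function name, parameter names, and the fork repository name are hypothetical.

```python
from __future__ import annotations


def should_publish(event_name: str, head_repo: str | None, base_repo: str) -> bool:
    """Return True when the run may push to the org registry.

    Non-PR events (trunk push, tag push, manual dispatch) always publish.
    Pull requests publish only when the head branch lives in the base repo,
    because a cross-fork token cannot create organization packages.
    """
    if event_name != "pull_request":
        return True
    return head_repo == base_repo


BASE = "microboxlabs/modulariot"
trunk_push = should_publish("push", None, BASE)
same_repo_pr = should_publish("pull_request", BASE, BASE)
# "someone/modulariot" is a made-up fork name for the example.
fork_pr = should_publish("pull_request", "someone/modulariot", BASE)
```

Cross-fork PRs still build under this rule; only the push and the jobs that depend on a pushed image are skipped.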

Test plan — VERIFIED ON THIS PR'S CI

Lint & Test: ruff + mypy + pytest all green. Local: 151 passed, 1 skipped.
Image Evals (pre-publish): run-all.sh prints 3 PASS lines (01 builds, 02 boots, 03 SKIPPED no-API-key).
Build & Publish Image: builds amd64 cleanly. Push correctly skipped on cross-fork PR; will push to GHCR on trunk merge.
Deferred to trunk-push event: actual GHCR push, Docker Hub mirror push, distribution-evals against the live image, trivy SARIF upload. (Cross-fork PRs can't exercise these; see T10b fix-up rationale.)
Build Summary: rendered cleanly with skipped (fork PR) annotations for distribution + scan.

Out of scope (deferred, tracked in plan `08-followups.md`)

Image-size optimization below 80 MB compressed (plan §02 budget is 250 MB; we're well under).
Sharing buildx cache between image-evals-pre-publish and publish-image (currently two builds happen on cold runs).
linux/arm64 images (only Quarkus does multi-arch in this repo).
miot CLI as a thin client (separate plan 14-miot-cli, not yet authored).

Closes / relates

Depends on PR #445 (harness scaffold, merged) and PR #446 (lint cleanup, merged). No tracking issue — happy to open one and add Closes #N if a reviewer wants one.

Summary by CodeRabbit

Release Notes

New Features
- Added Docker containerization with multi-stage build support.
- Expanded /health endpoint with diagnostic information on database integration status and snapshot freshness.
- Added direct database DSN configuration option with fallback support.
Bug Fixes
- Improved error messages to avoid leaking internal details in failure scenarios.
- Fixed configuration value parsing to handle quoted parameters correctly.
Chores
- Added comprehensive CI/CD pipeline for automated testing, building, and publishing.
- Added deployment quality assurance evaluation suite.
- Expanded configuration template with complete settings documentation.

Per plan 13-server-deployment/05-observability-and-health.md and the RALPH-WORKLIST.md T01 acceptance: every deploy platform's liveness/readiness probe needs to read Nexo enablement, registered tool count, and snapshot freshness without parsing logs.

What changed:
- NexoBootResult: new field `snapshot_age_minutes: float | None` so /health can surface the freshness gate's view without re-querying.
- boot.py: track `age_minutes` across the freshness probe so both the refuse-stale failure path and the success path return it. Other failure paths leave it as None (no probe ran).
- api/server.py:
  - lifespan now initializes `app.state.nexo_snapshot_age_minutes = None` alongside the existing nexo_enabled/pool/registered defaults.
  - sets it from `result.snapshot_age_minutes` after `load_nexo_tools` on every Nexo-enabled path.
  - GET /health now returns {status, env, nexo: {enabled, tools, snapshot_age_minutes}} — handler reads live `app.state` so post-startup state mutation is observable (e.g. by a future watchdog).

Tests:
- tests/api/test_health.py (new): two cases.
  - default (NEXO_DB_SCRIPTS_ROOT unset → Nexo disabled): asserts enabled=False, tools=[], snapshot_age_minutes=None.
  - simulated enabled (post-startup app.state mutation): asserts the new shape reflects live state, not a snapshot.

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 141 passed (was 139), 1 skipped.

Out of scope (deferred): build/version object on /health (per plan 05), separate /health/ready endpoint, prometheus metrics — tracked in RALPH-WORKLIST.md and 05-observability-and-health.md.
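
A minimal sketch of the "handler reads live state" idea from the T01 commit, written without FastAPI so it runs on the standard library alone. `SimpleNamespace` stands in for `app.state`; the attribute names `nexo_enabled` and `nexo_registered` are inferred from the commit text, and `build_health_payload` and the tool name are hypothetical.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def build_health_payload(state: Any, env: str) -> dict[str, Any]:
    """Assemble the /health body from whatever `state` holds right now.

    Reading attributes at call time (rather than caching a dict at boot)
    is what makes post-startup mutations visible to probes.
    """
    return {
        "status": "ok",
        "env": env,
        "nexo": {
            "enabled": bool(getattr(state, "nexo_enabled", False)),
            "tools": list(getattr(state, "nexo_registered", [])),
            "snapshot_age_minutes": getattr(state, "nexo_snapshot_age_minutes", None),
        },
    }


# Defaults as initialized at the start of lifespan (Nexo disabled).
state = SimpleNamespace(
    nexo_enabled=False, nexo_registered=[], nexo_snapshot_age_minutes=None
)
disabled = build_health_payload(state, env="local")

# Simulated post-startup mutation; the next call reflects it.
state.nexo_enabled = True
state.nexo_registered = ["example_tool"]  # hypothetical tool name
state.nexo_snapshot_age_minutes = 12.5
enabled = build_health_payload(state, env="local")
```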

Files written by shell-oriented tooling (sed/heredocs, hand-edits) often wrap values in `'...'` or `"..."`. The previous parser preserved those quotes literally, sending the surplus chars to Postgres and producing SASL auth failures (observed in practice on coordinador-prod-harness).

What changed:
- credentials.py: added _strip_matching_quotes() and applied it after whitespace stripping in _parse_env_file. A single matching pair of ASCII quotes around the value is unwrapped; anything else (no quotes, unbalanced quotes, mismatched quotes, single char) is preserved verbatim so we never silently mangle malformed input.
- credentials.py: docstring updated — the file no longer claims "no quoting"; it documents the new behavior with the failure mode that motivated the change.

Tests (tests/integrations/nexo/test_credentials.py — extended, +6):
- unquoted: PGPASSWORD=plain-secret → plain-secret
- single-quoted: PGPASSWORD='shell-style' → shell-style
- double-quoted: PGPASSWORD="dotenv-style" → dotenv-style
- unbalanced: PGPASSWORD='unbalanced → 'unbalanced (preserved)
- embedded equals: PGPASSWORD=key=value=trailer → key=value=trailer
- trailing hash: PGPASSWORD=value#nope → value#nope (documents that the parser does NOT strip dotenv-style trailing comments — comments must live on their own line)

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 147 passed (was 141), 1 skipped.

Out of scope: nexo_dsn settings escape hatch (T05) and trailing-comment support (deliberately not added; would change the contract beyond what T02 specifies).
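
The quote-unwrapping rule can be sketched in a few lines. This is not the code in credentials.py; it is an illustrative version that follows the behavior listed in the commit message, and `strip_matching_quotes` and `parse_env_line` are stand-in names.

```python
from __future__ import annotations


def strip_matching_quotes(value: str) -> str:
    """Unwrap one matching pair of ASCII quotes; keep anything else verbatim."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    return value


def parse_env_line(line: str) -> tuple[str, str] | None:
    """Parse one KEY=VALUE line.

    Splits on the first '=' only, so values may contain '='. Trailing
    '#...' text is NOT treated as a comment; only whole-line comments are.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    key, _, raw = stripped.partition("=")
    return key.strip(), strip_matching_quotes(raw.strip())
```

Leaving unbalanced or mismatched quotes untouched means a malformed value fails loudly at the database rather than being silently altered.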

…ess failures

The synthesizer's failure-mode copy unconditionally suggested "Reintenta cuando el snapshot esté fresco o consulta con operaciones", even when the failure had nothing to do with freshness — e.g. a filter_expert JSON parse error, a permission denial, or a tool error. Users were told to wait for a fresh snapshot when the actual remedy was to reformulate their question.

What changed:
- agents/synthesizer.py: _render_failure() now routes by reason prefix.
  - Reason starting with "Coordinador snapshot is stale" → keep the original retry-when-fresh copy (the user's right move IS to wait).
  - Anything else → neutral planning copy: "No pude planificar la consulta; reformúlala con más detalle o pide ayuda al equipo."
  - Internal pipeline detail (e.g. "filter_expert returned malformed step") is intentionally hidden from the user — it leaks pipeline structure and provides no actionable signal.

Tests:
- tests/test_synthesizer.py: existing freshness-stale test continues to assert the original retry copy is rendered.
- tests/test_synthesizer.py (new): test_planning_failure_does_not_leak_snapshot_retry_advice — sets failure="filter_expert returned malformed step" and asserts:
  * "snapshot" / "fresco" absent from answer (negative)
  * "filter_expert" absent from answer (no leak of internal state)
  * "planificar" + "reformúla*" present (positive)

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 148 passed (was 147), 1 skipped.

Out of scope: filter_expert one-shot retry on JSON parse failure (a separate concern tracked in 08-followups.md item 2).
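
Routing by reason prefix, as the T03 commit describes, might look roughly like the sketch below. The two user-facing strings are quoted from the commit message; the real `_render_failure()` may wrap them in more text, and the constant and function names here are illustrative.

```python
STALE_PREFIX = "Coordinador snapshot is stale"

# User-facing copy, kept in Spanish as in the harness.
RETRY_COPY = "Reintenta cuando el snapshot esté fresco o consulta con operaciones"
NEUTRAL_COPY = (
    "No pude planificar la consulta; reformúlala con más detalle "
    "o pide ayuda al equipo."
)


def render_failure(reason: str) -> str:
    """Pick failure copy from the reason prefix.

    The internal reason is never echoed back: it names pipeline stages
    and gives the user nothing to act on.
    """
    if reason.startswith(STALE_PREFIX):
        return RETRY_COPY
    return NEUTRAL_COPY


stale_answer = render_failure("Coordinador snapshot is stale (age exceeds threshold)")
planning_answer = render_failure("filter_expert returned malformed step")
```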

The previous .env.example documented 4 of the 18 fields HarnessSettings declares. Operators (and reviewers of the deploy plan in 13-server-deployment/) couldn't audit the full configuration surface without reading config.py.

What changed:
- miot-harness/.env.example: rewritten to match config.py 1:1 across all sections — provider keys, harness identity, Nexo integration (all 8 fields), multi-agent model assignment (all 6 fields).
- Each setting carries a short comment explaining purpose, range, and any operational gotcha (e.g. freshness thresholds, schema validation, tenant_lock vs default_tenant_id matching).
- Per project memory `feedback_no_sensitive_defaults`: no real URLs, no real secrets, all values are placeholders or HarnessSettings defaults.

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 148 passed, 1 skipped (unaffected; docs file).

Containerized deployments (k8s, ECS, Fly) want a single DSN secret, not a mounted db-scripts directory. Added MIOT_HARNESS_NEXO_DSN as an explicit override and three other deploy-readiness settings flagged in plan 03-configuration-and-secrets.md.

What changed:
- config.py: four new HarnessSettings fields.
  - nexo_dsn (str | None, default None): direct DSN. When set, bypasses `db_scripts_root + alias` file lookup.
  - nexo_application_name (str, default "miot-harness"): surfaces in pg_stat_activity.application_name. Setting added; wiring through server_settings deferred (PgBouncer transaction-pooling rejects non-tracked startup params; needs verification with prod track_extra_parameters config first).
  - log_level (Literal[...], default INFO): standard log level switch.
  - request_id_header (str, default "x-request-id"): for tracing propagation.
- pool.py: create_nexo_pool() now accepts EITHER
  - `creds: NexoCredentials` (existing positional, file-based path), or
  - `dsn: str` (new keyword, container-native path).
  When both provided, `dsn` wins (industry convention: explicit env beats config file). Raises ValueError if neither is supplied. The PgBouncer "no server_settings" guard is preserved for both paths.
- api/server.py: lifespan precedence rule.
  - Early-return now requires BOTH nexo_dsn AND nexo_db_scripts_root to be unset before disabling Nexo (was: just db_scripts_root).
  - When nexo_dsn is set, skip load_nexo_credentials and pass the DSN directly to create_nexo_pool. Logs which path was taken.
  - Asserts in the file-path branch help mypy narrow Path | None → Path.

Tests (tests/integrations/nexo/test_pool.py: +3 cases):
- test_create_nexo_pool_with_raw_dsn: DSN-only path uses the literal string and still skips server_settings.
- test_dsn_kwarg_overrides_creds_when_both_passed: documents the precedence rule (DSN > creds) so reviewers see the contract.
- test_create_nexo_pool_requires_creds_or_dsn: at least one source must be supplied.

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 151 passed (was 148), 1 skipped.

Decision recorded (codex-consult deferred — small, well-bounded): Precedence is `nexo_dsn > db-scripts file`. Rationale: matches Django DATABASE_URL, sqlx, and every PaaS pattern; lets one image work in both local-dev (file path) and container (DSN env) without code changes.

Out of scope: actually applying nexo_application_name to live connections (needs PgBouncer track_extra_parameters audit), wiring log_level into the logging config, threading request_id through events. Tracked for follow-up.
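
The precedence contract (DSN beats creds, at least one required) can be shown in isolation from the pool itself. In this sketch the `NexoCredentials` fields, the `to_dsn()` helper, and `resolve_dsn()` are all assumptions made for the example; the real `create_nexo_pool()` signature and credential type may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class NexoCredentials:
    # Hypothetical field set; the harness's real class is not shown in the PR.
    host: str
    port: int
    user: str
    password: str
    database: str

    def to_dsn(self) -> str:
        # Percent-encode user and password so characters like '@' stay valid.
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"


def resolve_dsn(creds: NexoCredentials | None = None, *, dsn: str | None = None) -> str:
    """Return the DSN to connect with: explicit `dsn` wins over `creds`."""
    if dsn:  # an empty string is treated as "not set"
        return dsn
    if creds is not None:
        return creds.to_dsn()
    raise ValueError("either creds or dsn must be supplied")


file_creds = NexoCredentials("db", 6432, "u", "p@ss", "nexo")
from_file = resolve_dsn(file_creds)
from_env = resolve_dsn(file_creds, dsn="postgresql://env-user@pgbouncer:6432/nexo")
```

Keeping the rule in one small function means the lifespan code only has to decide which inputs it has, not which one wins.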

Per plan 13-server-deployment/02-image-build.md. Produces a runnable miot-harness image — first deployable artifact on this branch.

Structure:
- Stage 1 `builder`: python:3.12-slim + uv (pinned via OCI image copy from ghcr.io/astral-sh/uv:0.5). Two-layer dep install: `uv sync --frozen --no-dev --no-install-project` for deps (cached on pyproject.toml + uv.lock), then COPY src and `uv sync --frozen --no-dev --no-editable` for the project.
- Stage 2 `runtime`: python:3.12-slim, COPY only /app/.venv from builder. No build tools in the final image.

Critical detail (caught during smoke test): without `--no-editable`, uv installs miot-harness as an editable link to /app/src in the builder venv. The runtime stage doesn't COPY src, so the editable link breaks at boot with `ModuleNotFoundError: No module named 'miot_harness'`. `--no-editable` forces a real wheel install whose files live entirely under /app/.venv/.../site-packages/, surviving the cross-stage COPY.

Image properties:
- Non-root by default: USER 65534:65534 (numeric `nobody`); ready for K8s `runAsNonRoot` policies without further config.
- Workspace dir pre-created with the right ownership at /app/.miot-workspace so the container can run with a read-only root filesystem in production (mount this dir as emptyDir/PVC).
- ARG HARNESS_VERSION baked into org.opencontainers.image.version label per plan 06-deploy-pipeline.md.
- Default entrypoint: uvicorn miot_harness.api.server:create_app --factory --host 0.0.0.0 --port 8000.

Acceptance verification:
- `docker build -t miot-harness:test --build-arg HARNESS_VERSION=0.0.0-dev .` → succeeds.
- `docker run -d -p 18080:8000 miot-harness:test` → boots cleanly, uvicorn logs "Application startup complete".
- `curl http://localhost:18080/health` → 200 with the T01 shape: {"status":"ok","env":"local","nexo":{"enabled":false,"tools":[],"snapshot_age_minutes":null}}.

Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed

Notes for follow-up:
- Image size: 422 MB. Plan 02 target was ≤250 MB. Most of the bulk is the LangChain/LangGraph/DeepAgents/FastAPI dep tree and Anthropic/OpenAI client SDKs. Reduce by trimming optional deps or moving to python:3.12-alpine (needs native-wheel verification). Out of scope for T06.
- No .dockerignore yet — T07's job. The Dockerfile only COPYs pyproject.toml/uv.lock/README.md/src so context bloat doesn't reach the image, but build context transfer is slower than it needs to be on developer machines until T07 lands.

The Dockerfile (T06) only COPYs pyproject.toml, uv.lock, README.md, and src/, so excluded files don't reach the image. But without a .dockerignore, every `docker build` transfers the full miot-harness/ tree to the daemon — `.venv` alone is hundreds of MB on developer machines.

What's excluded (grouped by reason):
- .venv/ and __pycache__/ — biggest bloat; uv rebuilds bytecode in-image
- .miot-workspace/ — local runtime state, never belongs in an image
- .pytest_cache/, .mypy_cache/, .ruff_cache/ — tool caches
- .env, .env.example — secrets / docs that the runtime should NOT consume from the image (settings come from the platform)
- tests/, evals/, docs/, examples/ — dev-only, not runtime
- .git*, .DS_Store, editor configs, dist/build dirs — noise

Image-size note (T07 acceptance check): The plan's acceptance line ("image size smaller than before T07") assumed a broader, less-careful Dockerfile. Our T06 Dockerfile is already narrow — only explicitly-listed paths are COPYd — so image size is unchanged: 422 MB before and after. The benefit of T07 is build-context transfer speed and future-proofing against accidentally broad COPYs, not image-size reduction.

Acceptance verification:
- `docker build -t miot-harness:test --build-arg HARNESS_VERSION=0.0.0-dev .` → still succeeds (cached: 2.5s).
- Image size: 422MB → 422MB (same, by design — see note above).
- Image still boots and `/health` still responds 200 (verified in T06; no Dockerfile change in this commit).

Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed

Local-only contract test that the Dockerfile builds and runs against the same `.env` your `uv run uvicorn` workflow uses. NOT consumed by CI — CI builds via the GitHub workflow defined in plan 09-github-workflow.md (lands in T09).

What it does:
- `docker compose up` builds the local Dockerfile, mounts `.env`, exposes 8000, mounts `miot-harness-workspace` named volume at /app/.miot-workspace so workspace_dir survives container restarts without leaking host paths into the image.
- `restart: unless-stopped` so the harness comes back after host reboots (developer ergonomic).
- Built-in healthcheck (Python urllib hitting /health) marks the container as `healthy` once the FastAPI lifespan completes — useful for `docker compose --wait` and CI smoke tests.
- `HARNESS_VERSION=0.0.0-dev` build arg makes locally-built images visually distinct from CI tag-derived semver builds.

`tunnel` profile (optional):
- `docker compose --profile tunnel up` adds a placeholder `pgbouncer-tunnel` container that runs `alpine:3.20` + `sleep infinity`. On startup it prints the canonical kubectl port-forward command rather than running it (kubeconfig only exists on the host).
- Kept as a real service so the harness Pod-with-sidecar topology is mirrored locally — useful for verifying network behavior without a real kubeconfig.

Acceptance verification:
- `docker compose up` → harness Up (healthy) within ~6s.
- `curl http://localhost:8000/health` → 200 with T01 shape ({"status":"ok","env":"local","nexo":{"enabled":false,"tools":[],"snapshot_age_minutes":null}}).
- `docker compose --profile tunnel up` → placeholder logs the kubectl command verbatim and stays running.
- `docker compose down -v` → clean teardown including named volume.

Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed
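
The compose healthcheck is described as "Python urllib hitting /health". A probe of that kind could be as small as the sketch below; the function name, timeout, and exit-code convention are assumptions, and the actual healthcheck command in docker-compose.yml is not reproduced here.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> int:
    """Return 0 if `url` answers HTTP 200, else 1 (shell exit-code style).

    Any connection error, timeout, non-2xx response, or malformed URL
    counts as unhealthy; a healthcheck should never raise.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except (urllib.error.URLError, OSError, ValueError):
        return 1
```

Inside a container this would typically be called against `http://127.0.0.1:8000/health`, with the return value passed to `sys.exit()` so Docker can read the result. Using the interpreter already in the image avoids installing curl in the runtime stage.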

Implements the local-runnable half of plan 13-server-deployment/10-deploy-evals.md. The Category A scripts answer "can the image build, boot, and run a demo end-to-end on this host?" in a way that's also wirable into CI's `image-evals-pre-publish` job (T09).

Files:
- evals/deploy/01-image-builds.sh — `docker build` + compressed-size check (≤ 250 MB; current image is 80 MB compressed, comfortably inside the budget). On failure, removes the partial image so the next run starts clean.
- evals/deploy/02-image-boots.sh — `docker run -d` then poll `curl /health` up to 15s. Validates the deploy-readable payload shape (status, env, nexo.{enabled, tools, snapshot_age_minutes}). Race-condition guard: short-circuits if the container exits before /health responds, so a hard crash fails fast instead of grinding through the timeout.
- evals/deploy/03-image-runs-demo.sh — `miot-harness demo "..."` inside the running container, bounded by a wall-clock timeout. Skips gracefully unless HARNESS_EVAL_DEMO=1 or a model API key is in env (the script consumes API credit; we don't run it on PRs by default).
- evals/deploy/run-all.sh — orchestrator. Runs Category A in order; stub for `--with-distribution` (Category C lands in T10a). Prints a structured summary (pass/fail/skip) and returns non-zero if any script failed.
- evals/deploy/README.md — quickstart, env knobs, cleanup contract, and a contract for adding new scripts.

Output convention: every script's first stdout line is exactly `PASS <id> — <one-line>`, `PASS <id> — SKIPPED (<reason>)`, or `FAIL <id> — <reason>` so CI logs are easy to grep and the orchestrator can categorize without rerunning.

Cleanup contract:
- Each script's `trap … EXIT` removes its own container.
- 01 removes its image only on failure (so 02 and 03 can use it on success).
- run-all.sh removes the image at end-of-suite — a clean orchestrator run leaves zero residue.
- Single-script runs leave the image around for iteration; the next 01 invocation just overwrites the tag.

Acceptance verification:
- bash miot-harness/evals/deploy/run-all.sh → 3 PASS lines (01 PASS, 02 PASS, 03 PASS-SKIPPED), exit 0.
- Re-running immediately produces identical output, confirming self-cleanup.

Caught during implementation:
- First version of 01 removed the image on every EXIT, including success. 02 then found nothing to boot. Fix: trap inspects $? and only cleans up on failure; orchestrator owns end-of-suite teardown.
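
Because the first stdout line follows a fixed shape, a consumer can sort results into pass, fail, and skip without rerunning anything. run-all.sh does this in shell; the Python sketch below only illustrates the parsing rule. The helper names and the sample detail strings are made up, and an unparseable line is treated as a failure by assumption.

```python
import re
from collections import Counter

SEP = "\u2014"  # the dash character used as separator in the convention
LINE_RE = re.compile(rf"^(PASS|FAIL) (\S+) {SEP} (.*)$")


def classify(first_line: str) -> tuple[str, str, str]:
    """Map a script's first stdout line to (category, id, detail)."""
    match = LINE_RE.match(first_line.strip())
    if match is None:
        # Assumption: a line that breaks the convention counts as a failure.
        return ("fail", "?", f"unparseable first line: {first_line!r}")
    verdict, script_id, detail = match.groups()
    if verdict == "FAIL":
        return ("fail", script_id, detail)
    if detail.startswith("SKIPPED"):
        return ("skip", script_id, detail)
    return ("pass", script_id, detail)


def summarize(first_lines: list[str]) -> dict[str, int]:
    """Count categories across all scripts in a suite run."""
    counts = Counter(classify(line)[0] for line in first_lines)
    return {key: counts.get(key, 0) for key in ("pass", "fail", "skip")}


sample = [
    f"PASS 01 {SEP} image built",              # illustrative detail text
    f"PASS 02 {SEP} /health responded",        # illustrative detail text
    f"PASS 03 {SEP} SKIPPED (no API key)",
]
summary = summarize(sample)
exit_code = 1 if summary["fail"] else 0
```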

Wires the harness into the repo's existing CI conventions (matches turbo-repo's ci.yaml + quarkus.yml patterns: GHCR primary, Docker Hub mirror, build-push-action@v6, sha+pr+latest+semver tagging, trivy SARIF, GHA cache). Adds the two eval jobs from plan 10-deploy-evals.md.

Job graph:
  lint-and-test              uv sync, ruff, mypy, pytest
        ↓
  image-evals-pre-publish    Category A: 01+02 (build+boot)
        ↓                    via run-all.sh; gates the push
  publish-image              buildx → GHCR + Docker Hub
        ↓                    with provenance: true, sbom: true
  distribution-evals         Category C: 05/06/07/08
        ↓                    (scripts land in T10a)
  security-scan  summary     trivy SARIF; always-runs summary

Trigger surface (matches existing repo conventions):
- push to trunk/main on miot-harness/** or this workflow file
- push to v* tags
- pull_request to trunk/main on the same paths
- workflow_dispatch

Tagging via metadata-action@v5:
- pr-<n> for PR runs
- latest on default-branch pushes
- {{version}}, {{major}}.{{minor}} for v* tags
- sha-<short> on every build (digest-pinning fallback)

Docker Hub mirror is conditional (`!= pull_request`), matching the turbo-repo pattern. SonarCloud not wired — Python isn't registered in SonarCloud yet (deferred per plan).

Image attestations: provenance + SBOM produced in-band by build-push-action@v6 via GitHub OIDC. Replaces the cosign step mentioned in some early plan drafts (cosign would be duplication).

Verification:
- YAML parses cleanly (yaml.safe_load).
- All 6 jobs declared; needs[] graph references only known jobs.
- Triggers: push, pull_request, workflow_dispatch all present.
- Job names match plan 10-deploy-evals.md "How they hook into the workflow" diagram exactly.

Known status until T10a + T10b land:
- distribution-evals references scripts 05/06/07/08 that don't exist yet. That job will fail in CI on this branch until T10a adds them. T10b is the explicit verification step — pushes a feature branch and confirms the full chain runs green.

Out of scope (follow-up):
- Optimizing image-evals-pre-publish to share buildx cache with publish-image (currently two builds happen on cold runs). Defer until first run shows real cost.
- linux/arm64 — only Quarkus does multi-arch; harness target is amd64.

Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed
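
The tagging rules listed in the T09 commit can be restated as a small function, which is roughly the expectation a tag-discipline check has to derive per run. This is a sketch of the rules as written, not of metadata-action's internals; `expected_tags`, its parameters, and the `trunk` default are assumptions for the example.

```python
from __future__ import annotations


def expected_tags(
    event: str,
    ref: str,
    short_sha: str,
    pr_number: int | None = None,
    default_branch: str = "trunk",
) -> list[str]:
    """Derive the image tags a run should produce, per the rules above."""
    tags = [f"sha-{short_sha}"]  # emitted on every build
    if event == "pull_request":
        if pr_number is None:
            raise ValueError("pull_request events need a PR number")
        tags.append(f"pr-{pr_number}")
    elif ref.startswith("refs/tags/v"):
        version = ref.removeprefix("refs/tags/v")
        parts = version.split(".")
        tags.append(version)
        if len(parts) >= 2:
            tags.append(f"{parts[0]}.{parts[1]}")
    elif ref == f"refs/heads/{default_branch}":
        tags.append("latest")
    return tags


pr_tags = expected_tags("pull_request", "refs/pull/447/merge", "137ea44", pr_number=447)
trunk_tags = expected_tags("push", "refs/heads/trunk", "137ea44")
release_tags = expected_tags("push", "refs/tags/v1.2.3", "137ea44")
```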

Implements the registry-side half of plan 13-server-deployment/10-deploy-evals.md. These scripts answer "is the image actually pullable, signed, and tagged correctly in the registries?" — distinct from Category A (does the image build and boot locally) which T08b already covers.

Files:
- evals/deploy/04-workflow-shape.sh — optional Category B helper. Asserts a given GHA run for harness.yaml ran the expected six jobs ('Lint & Test' → 'Image Evals (pre-publish)' → 'Build & Publish Image' → 'Distribution Evals' → 'Security Scan' → 'Build Summary'). Catches workflow-drift over time.
- evals/deploy/05-pulls-from-ghcr.sh — anonymous `docker pull` from GHCR by tag or digest. Verifies the image actually landed at the registry (build-push-action exiting 0 means the push call returned, not necessarily that the manifest is reachable).
- evals/deploy/06-pulls-from-dockerhub.sh — same against the Docker Hub mirror. Catches silent push failures (rotated DOCKERHUB_TOKEN, PR-skip guard misfiring). CI workflow only runs it when the event is not a PR.
- evals/deploy/07-attestations-present.sh — `gh attestation verify` against the digest, then parses --format=json output to assert BOTH a SLSA provenance predicate AND an SBOM predicate (SPDX or CycloneDX) are present. The negative-control proof: removing provenance:true / sbom:true from build-push-action must make this script FAIL — verified manually in T10b.
- evals/deploy/08-tag-discipline.sh — pulls the run's event + branch via gh, derives the expected tag pattern, then greps the run log for actual harness image references. Asserts:
  - PR runs: pr-<n>, sha-<short> on GHCR; nothing on Docker Hub
  - trunk: latest, sha-<short> on both
  - v* tags: full+major.minor+sha-<short> on both
- evals/deploy/B-checklist.md — review-style runbook for the workflow-shape claims that are too brittle to automate against the YAML itself (path triggers, secrets surface, summary rendering, etc.).

run-all.sh: documents the Category C scripts behind `--with-distribution`. The orchestrator does NOT run them itself because each takes a registry-derived arg (digest / run-id) that only exists after publish-image. CI's distribution-evals job calls them directly; locally you invoke them by hand after a real push.

Robustness notes:
- 07 uses --format=json + grep predicateType to be liberal across gh CLI versions (the verify flag surface has been moving). Skill hint asked for codex consult here; the implementation falls back to JSON parsing if --predicate-type isn't accepted.
- 08 derives expected pattern from gh's `event` + `headBranch` fields, then greps the run log for image refs. Best-effort: the metadata-action doesn't expose pushed tags as a structured output, so log-grep is the most stable surface across action versions.

Verification (locally):
- All 5 new scripts pass `bash -n` (syntax check).
- Each emits `FAIL <ID> — usage: ...` when called without args.
- run-all.sh (without --with-distribution) still produces 3 PASS lines from Category A, unchanged.

Real-publish acceptance is verified in T10b — push the feature branch, watch the CI run, expect distribution-evals job green.
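
The "be liberal about JSON shape" approach in 07 amounts to collecting every `predicateType` value wherever it appears and checking for the two families. The sketch below does that in Python rather than shell. The nesting of the sample JSON is an assumption for illustration, not a statement of what `gh attestation verify --format=json` emits; the URI prefixes are the commonly used in-toto predicate types for SLSA provenance, SPDX, and CycloneDX.

```python
import json
from typing import Any, Iterator

SLSA_PREFIX = "https://slsa.dev/provenance/"
SBOM_PREFIXES = ("https://spdx.dev/Document", "https://cyclonedx.org/bom")


def iter_predicate_types(node: Any) -> Iterator[str]:
    """Yield every string stored under a 'predicateType' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "predicateType" and isinstance(value, str):
                yield value
            else:
                yield from iter_predicate_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_predicate_types(item)


def missing_attestations(verify_json: str) -> list[str]:
    """Return which attestation families are absent (empty list means OK)."""
    types = set(iter_predicate_types(json.loads(verify_json)))
    missing = []
    if not any(t.startswith(SLSA_PREFIX) for t in types):
        missing.append("slsa-provenance")
    if not any(t.startswith(SBOM_PREFIXES) for t in types):
        missing.append("sbom")
    return missing


# Sample input; the nesting here is illustrative only.
both = json.dumps([
    {"verificationResult": {"statement": {"predicateType": "https://slsa.dev/provenance/v1"}}},
    {"verificationResult": {"statement": {"predicateType": "https://spdx.dev/Document/v2.3"}}},
])
only_provenance = json.dumps([
    {"verificationResult": {"statement": {"predicateType": "https://slsa.dev/provenance/v1"}}},
])
```

Searching by key rather than by fixed path trades precision for stability: the check keeps working if the CLI moves the statement to a different level of the output.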

coderabbitai · 2026-05-10T05:11:28Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7ff4592a-a3e9-4a88-abd5-55bb686165d8

📥 Commits

Reviewing files that changed from the base of the PR and between ac960bd and 137ea44.

📒 Files selected for processing (7)

.github/workflows/harness.yaml
miot-harness/.env.example
miot-harness/docker-compose.yml
miot-harness/evals/deploy/03-image-runs-demo.sh
miot-harness/src/miot_harness/agents/synthesizer.py
miot-harness/src/miot_harness/api/server.py
miot-harness/tests/api/test_health.py

📝 Walkthrough

This PR introduces a complete Docker-based CI/CD infrastructure for miot-harness alongside backend Nexo database integration enhancements. Changes span containerization (Dockerfile, docker-compose, .env), a multi-job GitHub Actions workflow, comprehensive deployment evaluation scripts for build/boot/demo/distribution/attestation validation, and backend code for DSN handling, snapshot age tracking, and improved error rendering.

Changes

Docker Containerization & Deployment Pipeline

| Layer / File(s) | Summary |
| --- | --- |
| Docker Image Build Configuration `miot-harness/Dockerfile`, `miot-harness/.dockerignore`, `miot-harness/docker-compose.yml` | Multi-stage `uv` build with frozen dependencies, non-root runtime (UID/GID 65534), workspace volume, and Docker Compose local dev setup with harness and tunnel services. |
| Environment and Build Configuration `miot-harness/.env.example` | Expanded from minimal to full template with provider API key placeholders, Nexo/DB integration settings, freshness thresholds, multi-agent model routing, and critic node configuration. |
| GitHub Actions CI/CD Workflow `.github/workflows/harness.yaml` | Complete pipeline: `lint-and-test` (Ruff, mypy, pytest), `image-evals-pre-publish` (local build/boot validation), `publish-image` (GHCR and Docker Hub with buildx, SLSA provenance, SBOM), `distribution-evals` (pull verification, attestation validation), `security-scan` (Trivy + SARIF), and `summary` (aggregated reporting). |
| Category A Evaluation Scripts `miot-harness/evals/deploy/01-image-builds.sh`, `02-image-boots.sh`, `03-image-runs-demo.sh` | Scripts validate compressed image size budget, container boot and /health readiness, and end-to-end demo execution within running container. |
| Category C Distribution Evaluation Scripts `miot-harness/evals/deploy/05-pulls-from-ghcr.sh`, `06-pulls-from-dockerhub.sh`, `07-attestations-present.sh` | Scripts verify anonymous pull from GHCR and Docker Hub and validate presence of SLSA provenance and SBOM attestations on published image digest. |
| Evaluation Orchestration and Documentation `miot-harness/evals/deploy/run-all.sh`, `README.md`, `B-checklist.md` | `run-all.sh` orchestrates Category A scripts with pass/fail aggregation and cleanup; README documents deploy eval contracts and environment knobs; B-checklist provides manual review checklist for workflow shape validation. |

Backend Nexo Integration and Health Endpoint

Layer / File(s)	Summary
Configuration and Data Contracts `miot-harness/src/miot_harness/config.py`, `src/miot_harness/integrations/nexo/boot.py`	`HarnessSettings` adds `nexo_dsn` (direct DSN override) and `nexo_application_name` fields. `NexoBootResult` includes `snapshot_age_minutes` to report database freshness state.
Credential Parsing and Connection Pool `miot-harness/src/miot_harness/integrations/nexo/credentials.py`, `src/miot_harness/integrations/nexo/pool.py`	`credentials.py` adds quote-stripping helper to unwrap shell-quoted DSN values. `create_nexo_pool` now accepts optional `dsn` kwarg with precedence over credentials-derived DSN; validates that at least one source is provided.
API Health Endpoint and Lifespan `miot-harness/src/miot_harness/api/server.py`	FastAPI lifespan conditionally boots Nexo based on DSN/credentials presence and stores `snapshot_age_minutes`. `/health` response expanded to include `nexo` sub-object with `enabled`, `tools`, and `snapshot_age_minutes`.
Failure Rendering `miot-harness/src/miot_harness/agents/synthesizer.py`	`_render_failure(reason)` now categorizes failures by snapshot-stale prefix and returns freshness-retry advice (cheap path, no LLM) or neutral reformulation message to hide internal pipeline details.
Credential and Pool Tests `miot-harness/tests/integrations/nexo/test_credentials.py`, `tests/integrations/nexo/test_pool.py`	Tests validate quote stripping (single/double/unbalanced), special characters in passwords, raw DSN parameter, DSN precedence, and validation when neither credentials nor DSN provided.
Health Endpoint Tests `miot-harness/tests/api/test_health.py`	Tests verify `/health` response shape with Nexo disabled (default) and enabled states; confirms `nexo.enabled`, `nexo.tools`, and `nexo.snapshot_age_minutes` fields.
Synthesizer Failure Tests `miot-harness/tests/test_synthesizer.py`	Tests clarify snapshot-stale path produces freshness advice without LLM call and verify non-snapshot failures omit internal pipeline details while maintaining neutral planning language and `answer.completed` event.
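
The credential-parsing and pool rows above describe two small rules that are easy to state in code. The following is a minimal sketch under stated assumptions, not the code in `credentials.py` or `pool.py`: `strip_matching_quotes`, `PgCredentials`, and `resolve_dsn` are hypothetical names, and only the rules themselves (unwrap one matching quote pair, an explicit DSN wins over credentials, at least one source must be given) come from the PR.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


def strip_matching_quotes(value: str) -> str:
    """Unwrap a single matching pair of '...' or "..." around a value."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    # Unbalanced or mismatched quotes are left untouched.
    return value


@dataclass(frozen=True)
class PgCredentials:
    """Hypothetical stand-in for credentials parsed from a .env file."""
    host: str
    port: int
    user: str
    password: str
    database: str

    def to_dsn(self) -> str:
        # Percent-encode user and password so special characters survive.
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"


def resolve_dsn(creds: Optional[PgCredentials] = None,
                dsn: Optional[str] = None) -> str:
    """Pick the DSN to connect with: an explicit DSN wins over credentials."""
    if dsn:
        return dsn
    if creds is not None:
        return creds.to_dsn()
    raise ValueError("provide either creds or dsn")
```

Keeping the precedence decision in one pure function makes it straightforward to test without opening a connection.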

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

The PR spans two largely independent DAGs: containerization/CI infrastructure (workflow, Dockerfile, eval scripts, orchestration) and backend Nexo integration (config, credentials, health, tests). The containerization cohort is extensive (7 new shell scripts, YAML workflow, docs, orchestration) with multiple evaluation categories and careful pass/fail semantics. The Nexo cohort modifies several interdependent modules (config → boot → credentials → pool → server → synthesizer) with careful credential handling and DSN precedence logic. Both cohorts include comprehensive tests. Heterogeneous changes across configs, scripts, backend logic, and tests demand separate reasoning for each area.

Possibly related PRs

microboxlabs/modulariot#445: Modifies the same Nexo/harness codepaths and symbols (synthesizer failure rendering, FastAPI server health/state, HarnessSettings fields, NexoBootResult, credentials parsing, create_nexo_pool signature, and related tests).
microboxlabs/modulariot#446: Both PRs modify overlapping code in src/miot_harness (notably api/server.py and related tests/settings handling).

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 35.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: containerizing miot-harness and integrating it into CI with Dockerfile, workflow, and deployment evals.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

CI run on PR microboxlabs#447 (cross-fork odtorres → microboxlabs) failed at `Build & Publish Image` with:

    denied: installation not allowed to Create organization package

Cross-fork PR's GITHUB_TOKEN cannot create a NEW org package on the first push. Same-repo PRs and trunk pushes have full perms and work.

Fix:
- `publish-image`: `push:` is now conditional on the PR being either same-repo OR not-a-PR. Cross-fork PRs build (so the lint + image-evals gates still test the diff) but skip the registry publish.
- `distribution-evals`: gated by the same condition. No image to verify on cross-fork PRs → job skips cleanly instead of failing on an empty digest.
- `summary`: renders 'skipped (fork PR)' for distribution-evals when the cross-fork condition trips, mirroring the existing 'skipped (PR)' rendering for security-scan.

Trade-off: cross-fork PRs no longer prove the publish path end-to-end. The trunk-push event after merge runs the full pipeline, so verification just shifts to a different temporal boundary. T10b's acceptance is updated accordingly: the PR build proves lint + image-evals; trunk-push will prove publish + distribution.

YAML re-validated (parses, all 6 jobs, conditions present on publish-image and distribution-evals).
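
The gating rule in this commit message reduces to a small truth table. As a rough model only (the real condition lives in the workflow YAML as a GitHub Actions expression, which is not shown here), it can be sketched in Python; `should_publish`, `distribution_summary`, and their parameters are hypothetical names chosen for illustration.

```python
from typing import Optional


def should_publish(event_name: str, head_repo: Optional[str], base_repo: str) -> bool:
    """Model of the publish gate: same-repo PR OR not a PR at all."""
    if event_name != "pull_request":
        # Trunk pushes and other non-PR events run the full pipeline.
        return True
    # For PRs, only publish when the head branch lives in the base repo.
    return head_repo == base_repo


def distribution_summary(event_name: str, head_repo: Optional[str], base_repo: str) -> str:
    """Label the summary job would render for distribution-evals."""
    if should_publish(event_name, head_repo, base_repo):
        return "ran"
    return "skipped (fork PR)"
```

Under this model a cross-fork PR still builds and runs the pre-publish evals; only the registry push and the jobs that depend on a published digest are skipped.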

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

miot-harness/src/miot_harness/api/server.py (1)

65-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep fallback-to-disabled state internally consistent.

When graph/model wiring fails, Nexo is marked disabled but nexo_registered and nexo_pool are left as-if enabled. This can expose contradictory health state and hold unnecessary DB connections open.

🔧 Suggested fix

                 except Exception as exc:  # noqa: BLE001
                     logger.critical(
                         "Nexo: failed to build chat models / graph (%s); "
                         "falling back to Nexo disabled",
                         exc,
                     )
                     app.state.nexo_enabled = False
+                    app.state.nexo_registered = []
+                    app.state.nexo_pool = None
                     harness.nexo_graph = None
+                    if pool is not None:
+                        await pool.close()
+                        pool = None

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@miot-harness/src/miot_harness/api/server.py` around lines 65 - 94, When
building the Nexo models/graph fails inside the try block, clear any state that
was set as-if Nexo succeeded: set app.state.nexo_enabled = False, set
harness.nexo_graph = None, unset/clear harness.nexo_registered (replace
result.registered) and release/clear the DB connection by closing and/or setting
app.state.nexo_pool = None (use pool.close()/await pool.close() if available).
Do this inside the except that catches Exception (the same block that currently
sets app.state.nexo_enabled = False) so the internal state (app.state.nexo_pool,
harness.nexo_registered, harness.nexo_graph) remains consistent when
build_nexo_graph / get_chat_model fails.
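
The reviewer's point is that every Nexo-related field should flip together when the graph build fails. A minimal, self-contained sketch of that idea follows. It assumes `state` and `harness` are plain attribute containers and that the pool exposes an async `close()`; `disable_nexo` and `FakePool` are hypothetical names, not symbols from `server.py`.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Optional


class FakePool:
    """Stand-in for a connection pool with an async close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def disable_nexo(state: Any, harness: Any, pool: Optional[Any]) -> None:
    """Reset all Nexo state in one place so /health never reports a mix."""
    state.nexo_enabled = False
    state.nexo_registered = []
    state.nexo_pool = None
    harness.nexo_graph = None
    if pool is not None:
        # Release DB connections that are no longer needed.
        await pool.close()
```

Grouping the resets in one helper means the except branch cannot forget a field when new state is added later.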

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/harness.yaml:
- Around line 292-299: The workflow currently uses a mutable reference
aquasecurity/trivy-action@master; replace that with the full commit SHA of the
Trivy action to make the step immutable (e.g., uses:
aquasecurity/trivy-action@<full-commit-sha>) or at minimum a fixed tag like
`@v0.36.0`; update the uses line in the Run Trivy step (the uses:
aquasecurity/trivy-action@master entry) to the chosen full-length commit SHA and
commit the change so future runs reference an immutable action version.

In `@miot-harness/.env.example`:
- Around line 5-7: Update the .env.example to include placeholder
entries/comments for the new deploy knobs so the template matches
src/miot_harness/config.py: add lines for MIOT_HARNESS_NEXO_DSN,
MIOT_HARNESS_NEXO_APPLICATION_NAME, MIOT_HARNESS_LOG_LEVEL, and
MIOT_HARNESS_REQUEST_ID_HEADER (with brief comment describing expected
value/format and sensible default or "required" note) so operators can audit all
settings in one place; ensure variable names exactly match the symbols used in
the codebase.
- Around line 34-56: The default tenant and the hard tenant lock conflict:
update the env template so MIOT_HARNESS_NEXO_TENANT_LOCK matches
MIOT_HARNESS_DEFAULT_TENANT_ID (e.g., set
MIOT_HARNESS_NEXO_TENANT_LOCK=demo-tenant) or, alternatively, change
MIOT_HARNESS_DEFAULT_TENANT_ID to mintral and update the explanatory comment
accordingly so MIOT_HARNESS_DEFAULT_TENANT_ID and MIOT_HARNESS_NEXO_TENANT_LOCK
are consistent.

In `@miot-harness/docker-compose.yml`:
- Around line 38-46: The harness service in docker-compose.yml must add a host
mapping so host.docker.internal resolves on Linux; update the harness service
block (where MIOT_HARNESS_WORKSPACE_DIR is set and volumes are declared) to
include an extra_hosts entry mapping "host.docker.internal" to the host gateway
(e.g., "host.docker.internal:host-gateway") so MIOT_HARNESS_NEXO_DSN pointing at
host.docker.internal:6434 will work without --network host or manual host
changes.

In `@miot-harness/evals/deploy/03-image-runs-demo.sh`:
- Around line 62-68: The health-check polling loop using DEADLINE currently
falls through if /health never becomes ready; update the script so after the
loop you detect timeout and fail fast: after the while loop that polls
"http://localhost:${PORT}/health" (the block using DEADLINE and curl -fs
--max-time 2) check whether the last curl succeeded (or whether $(date +%s) is
>= DEADLINE) and if it timed out print a clear error (including the port) and
exit 1; apply the same change to the second identical polling block around lines
74-80 so CI job aborts immediately when readiness never succeeds.

In `@miot-harness/evals/deploy/08-tag-discipline.sh`:
- Around line 45-55: In the `push)` case, when BRANCH matches v*, the
EXPECTED_KEYS array is too lax (only "sha-"), so release-tag regressions can
slip through; update the v* branch handler to require the full-version and
major.minor release tags in addition to the sha tag by adding the corresponding
expected key patterns to EXPECTED_KEYS (see the `push)` case, the BRANCH
variable check, and the EXPECTED_KEYS symbol), and keep EXPECT_DOCKERHUB="yes".

In `@miot-harness/src/miot_harness/agents/synthesizer.py`:
- Around line 74-78: The user-facing string mixes Spanish with the raw English
`reason` (matched by `_SNAPSHOT_STALE_PREFIX`), so change the branch that
handles `if reason.startswith(_SNAPSHOT_STALE_PREFIX):` to produce a fully
Spanish message: parse out any age or numeric details from `reason` (e.g.,
extract the minutes) and format a Spanish sentence like "No puedo responder
ahora mismo: el snapshot tiene X minutos; vuelve a intentarlo cuando esté fresco
o contacta a operaciones." — keep the original `reason` logged for operators
(separate log call) rather than shown to end users so tests such as the one
referenced in test_synthesizer.py continue to reflect localized output.

In `@miot-harness/tests/api/test_health.py`:
- Around line 14-17: The fixture _clear_settings_cache currently only unsets
MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT; also ensure it unsets MIOT_HARNESS_NEXO_DSN
via monkeypatch.delenv("MIOT_HARNESS_NEXO_DSN", raising=False) so the
default-disabled assumption holds for tests, keeping the existing call to
get_settings.cache_clear() intact; update the fixture body where
monkeypatch.delenv is used to remove both env vars.
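
The `03-image-runs-demo.sh` comment above asks for a readiness poll that fails fast instead of falling through on timeout. The shell script itself is not reproduced here; the following Python sketch shows the same pattern under assumptions: `wait_until_ready` is a hypothetical helper, and the probe is any callable that returns True once `/health` answers.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it succeeds or the deadline passes.

    Returns True on success and False on timeout, so the caller can
    abort with a clear error instead of continuing blindly.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe():
            return True
        sleep(interval_s)
    return False
```

A caller would typically follow a False result with something like `sys.exit(f"/health on port {port} never became ready")`, so the failure names the port rather than surfacing later as an unrelated error.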

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df852f38-466c-4f52-87ad-d4aa04bbaaec

📥 Commits

Reviewing files that changed from the base of the PR and between 1fd0068 and 494c86c.

📒 Files selected for processing (27)

.github/workflows/harness.yaml
miot-harness/.dockerignore
miot-harness/.env.example
miot-harness/Dockerfile
miot-harness/docker-compose.yml
miot-harness/evals/deploy/01-image-builds.sh
miot-harness/evals/deploy/02-image-boots.sh
miot-harness/evals/deploy/03-image-runs-demo.sh
miot-harness/evals/deploy/04-workflow-shape.sh
miot-harness/evals/deploy/05-pulls-from-ghcr.sh
miot-harness/evals/deploy/06-pulls-from-dockerhub.sh
miot-harness/evals/deploy/07-attestations-present.sh
miot-harness/evals/deploy/08-tag-discipline.sh
miot-harness/evals/deploy/B-checklist.md
miot-harness/evals/deploy/README.md
miot-harness/evals/deploy/run-all.sh
miot-harness/src/miot_harness/agents/synthesizer.py
miot-harness/src/miot_harness/api/server.py
miot-harness/src/miot_harness/config.py
miot-harness/src/miot_harness/integrations/nexo/boot.py
miot-harness/src/miot_harness/integrations/nexo/credentials.py
miot-harness/src/miot_harness/integrations/nexo/pool.py
miot-harness/tests/api/__init__.py
miot-harness/tests/api/test_health.py
miot-harness/tests/integrations/nexo/test_credentials.py
miot-harness/tests/integrations/nexo/test_pool.py
miot-harness/tests/test_synthesizer.py

coderabbitai · 2026-05-10T05:29:06Z

+# Default tenant/user when a request omits them. Per project policy, real
+# tenant context comes from authenticated server context, not from these.
 MIOT_HARNESS_DEFAULT_TENANT_ID=demo-tenant
 MIOT_HARNESS_DEFAULT_USER_ID=demo-user

+# -----------------------------------------------------------------------------
+# Nexo data integration (Coordinador / Mintral)
+# -----------------------------------------------------------------------------
+# Path to a local clone of the db-scripts repo. The harness reads
+# `<root>/databases/<alias>/.env` for PG credentials at lifespan boot.
+# Leave commented to disable the Nexo integration (harness still serves
+# non-Nexo runs with mocked tools).
+# MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT=/path/to/db-scripts
+
+# Which alias under `db-scripts/databases/` to load. The harness expects a
+# read-only `harness` PG role to exist in the target DB (seeded by
+# `db-scripts/scripts/seed/create-harness-reader-role.sql`).
+MIOT_HARNESS_NEXO_DB_ALIAS=coordinador-dev
+
+# Hard tenant lock applied to every coordinador_* tool call. Must match
+# MIOT_HARNESS_DEFAULT_TENANT_ID for end-to-end runs.
+MIOT_HARNESS_NEXO_TENANT_LOCK=mintral
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Default tenant values conflict with the stated lock requirement.

The template says the Nexo tenant lock must match the default tenant for end-to-end runs, but the defaults differ. This is easy to trip over during first-time setup.

🔧 Suggested fix

-MIOT_HARNESS_DEFAULT_TENANT_ID=demo-tenant
+MIOT_HARNESS_DEFAULT_TENANT_ID=mintral

or set MIOT_HARNESS_NEXO_TENANT_LOCK=demo-tenant to keep the current default tenant.
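
One way to surface this mismatch early, instead of relying on a comment in the template, is a small startup or test-time check. This is a hedged sketch only: `tenant_lock_mismatch` is a hypothetical helper that does not exist in the harness, and it assumes the two settings are read from a plain mapping of environment variables.

```python
from typing import Mapping, Optional


def tenant_lock_mismatch(env: Mapping[str, str]) -> Optional[str]:
    """Return a warning if the default tenant and the Nexo lock differ."""
    default_tenant = env.get("MIOT_HARNESS_DEFAULT_TENANT_ID")
    lock = env.get("MIOT_HARNESS_NEXO_TENANT_LOCK")
    if default_tenant is None or lock is None:
        # Nothing to compare when either setting is absent.
        return None
    if default_tenant == lock:
        return None
    return (
        f"MIOT_HARNESS_DEFAULT_TENANT_ID={default_tenant!r} does not match "
        f"MIOT_HARNESS_NEXO_TENANT_LOCK={lock!r}; end-to-end runs need them equal"
    )
```

With the template values shown above (`demo-tenant` and `mintral`) the function returns a warning string; with matching values it returns None.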

coderabbitai · 2026-05-10T05:29:07Z

+  push)
+    if [[ "$BRANCH" == "trunk" || "$BRANCH" == "main" ]]; then
+      EXPECTED_KEYS=("latest" "sha-")
+      EXPECT_DOCKERHUB="yes"
+    elif [[ "$BRANCH" == v* ]]; then
+      EXPECTED_KEYS=("sha-")  # plus full + major.minor — best-effort
+      EXPECT_DOCKERHUB="yes"
+    else
+      EXPECTED_KEYS=("sha-")
+      EXPECT_DOCKERHUB="no"
+    fi


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Release-tag runs can pass without any release tags.

For push on v*, this only requires sha-, so a regression that drops <full version> or <major>.<minor> tags still passes. That leaves the release-tag policy effectively unchecked in the automated path.
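
To make the requested tightening concrete, here is a Python rendering of the `push)` branch with the release case filled in. It is a sketch under assumptions: `expected_tags` is a hypothetical function, and it assumes release refs look like `v1.2.3` and that published tags drop the leading `v`. The PR does not confirm either detail.

```python
import re
from typing import List, Tuple

# Assumed release ref shape: v<major>.<minor>.<patch>
_RELEASE_REF = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def expected_tags(ref_name: str) -> Tuple[List[str], bool]:
    """Return (expected tag keys, expect_dockerhub) for a push event."""
    if ref_name in ("trunk", "main"):
        return ["latest", "sha-"], True
    if ref_name.startswith("v"):
        keys = ["sha-"]
        match = _RELEASE_REF.match(ref_name)
        if match:
            major, minor, patch = match.groups()
            # Require the full version and major.minor in addition to sha-.
            keys = [f"{major}.{minor}.{patch}", f"{major}.{minor}", "sha-"]
        return keys, True
    return ["sha-"], False
```

For example, `expected_tags("v1.4.2")` yields the keys `1.4.2`, `1.4`, and `sha-`, so a run that published only the sha tag would no longer pass.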

Per post-mortem on the deploy-eval scope: about half the scripts were redundant with what CI naturally enforces, or asserted things that re-stated config. Trimming before merge so the suite reflects genuine load-bearing checks instead of paranoia.

Removed:
- 04-workflow-shape.sh: asserts the GHA workflow's job-set shape via `gh run view --json jobs`. The PR check UI already surfaces missing/failed jobs; a self-asserting script is paranoid and brittle (broke whenever job names were renamed).
- 08-tag-discipline.sh: asserts the tag pattern published by a run via run-log grep. The metadata-action config IS the tag spec; a script re-asserting it duplicates YAML in another language. Brittle (run-log substring parsing) and rarely catches anything.

Updated:
- run-all.sh: dropped the `--with-distribution` orchestration verbiage; orchestrator runs Category A only (01/02/03), Category C scripts (05/06/07) are invoked by CI's distribution-evals job with real digest args.
- harness.yaml: distribution-evals job dropped the `Tag discipline` step; now runs `Pull from GHCR` → `Verify attestations` → `Pull from Docker Hub` (non-PR only). RUN_ID env var no longer needed.
- README.md: unified table across Categories A and C; added a "what these evals do NOT cover" section explaining the dropped checks.
- B-checklist.md: replaced "verified by 08-tag-discipline" with a direct spot-check acknowledging metadata-action IS the spec. Added attestation negative-control reminder.

What stays (each earns its cost):
- 01: build + compressed-size budget.
- 02: container actually works at runtime (load-bearing).
- 03: optional, gated on API key; proves wires connect.
- 05/06: catches "push exited 0 but manifest didn't land" + Docker Hub PR-skip-guard regressions.
- 07: most valuable; negative-control for the supply-chain story.

Net: 9 → 7 scripts, ~150 lines removed.

Verification:
- bash miot-harness/evals/deploy/run-all.sh → 3 PASS lines (01 PASS, 02 PASS, 03 PASS-SKIPPED), unchanged.
- YAML parses; distribution-evals job 5 → 4 steps.
- Lint/types/tests unaffected.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@miot-harness/evals/deploy/run-all.sh`:
- Around line 38-47: The current pipeline uses bash "$HERE/$script" | tee | head
which can yield a non-zero pipeline exit (SIGPIPE) and wrongly run the else
branch; instead run the script and capture its full stdout+stderr into a
variable, record the script's real exit status via PIPESTATUS[0] (or
${PIPESTATUS[0]} immediately after the pipeline), then extract FIRST as the
first line from that captured output and use the recorded exit status to decide
the PASS/FAIL/SKIP logic (update places referencing FIRST, bash "$HERE/$script",
tee, head, and use PIPESTATUS to set FAIL/PASS/ SUMMARY accordingly).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d10267e9-ac51-4a86-833d-3458a1d804b2

📥 Commits

Reviewing files that changed from the base of the PR and between 494c86c and ac960bd.

📒 Files selected for processing (4)

.github/workflows/harness.yaml
miot-harness/evals/deploy/B-checklist.md
miot-harness/evals/deploy/README.md
miot-harness/evals/deploy/run-all.sh

✅ Files skipped from review due to trivial changes (1)

miot-harness/evals/deploy/README.md

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/harness.yaml

coderabbitai · 2026-05-10T16:59:32Z

+  if FIRST=$(bash "$HERE/$script" 2>&1 | tee /dev/stderr | head -n 1); then
+    case "$FIRST" in
+      "PASS"*"SKIPPED"*) SKIP=$((SKIP+1)); SUMMARY+=("⏭  $FIRST") ;;
+      "PASS"*)           PASS=$((PASS+1)); SUMMARY+=("✅ $FIRST") ;;
+      *)                 FAIL=$((FAIL+1)); SUMMARY+=("❓ $FIRST") ;;
+    esac
+  else
+    FAIL=$((FAIL+1))
+    SUMMARY+=("❌ $script (exit non-zero)")
+  fi


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/usr/bin/env bash
set -euo pipefail
# Repro: pipeline can fail despite successful producer due to head/pipe close.
if FIRST=$(bash -lc 'printf "PASS 01\nextra\n"; exit 0' 2>&1 | tee /dev/stderr | head -n 1); then
  echo "IF branch taken: $FIRST"
else
  echo "ELSE branch taken (unexpected for a passing producer)"
fi

Repository: microboxlabs/modulariot

Length of output: 181

🏁 Script executed:

# Check if the file exists and inspect it
fd -type f -name "run-all.sh" | head -20

Repository: microboxlabs/modulariot

Length of output: 238

🏁 Script executed:

# Correct syntax to find the file
fd "run-all.sh"

Repository: microboxlabs/modulariot

Length of output: 104

🏁 Script executed:

# Read the file to inspect it
cat -n miot-harness/evals/deploy/run-all.sh

Repository: microboxlabs/modulariot

Length of output: 2658

pipefail + tee|head can misclassify passing evals as failures.

On line 38, a script that prints more than one line may trip a non-zero pipeline status (SIGPIPE path) even when the eval itself passes, triggering the else branch incorrectly.

Suggested fix

-  if FIRST=$(bash "$HERE/$script" 2>&1 | tee /dev/stderr | head -n 1); then
+  OUTPUT=""
+  if OUTPUT=$(bash "$HERE/$script" 2>&1); then
+    printf '%s\n' "$OUTPUT" >&2
+    FIRST=$(printf '%s\n' "$OUTPUT" | head -n 1)
     case "$FIRST" in
       "PASS"*"SKIPPED"*) SKIP=$((SKIP+1)); SUMMARY+=("⏭  $FIRST") ;;
       "PASS"*)           PASS=$((PASS+1)); SUMMARY+=("✅ $FIRST") ;;
       *)                 FAIL=$((FAIL+1)); SUMMARY+=("❓ $FIRST") ;;
     esac
   else
+    printf '%s\n' "$OUTPUT" >&2
     FAIL=$((FAIL+1))
     SUMMARY+=("❌ $script (exit non-zero)")
   fi
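
The underlying idea in the suggested fix is to separate "did the script succeed" from "what was its first line" by capturing all output before inspecting it. As an illustration in Python rather than the bash used by `run-all.sh`, and with `classify` and `run_eval` as hypothetical names, the same logic might look like this:

```python
import subprocess
import sys
from typing import List, Tuple


def classify(first_line: str, returncode: int) -> str:
    """Map an eval's exit status and first output line to PASS/SKIP/FAIL."""
    if returncode != 0:
        return "FAIL"
    if first_line.startswith("PASS") and "SKIPPED" in first_line:
        return "SKIP"
    if first_line.startswith("PASS"):
        return "PASS"
    # Exit 0 but an unrecognized first line is still treated as a failure.
    return "FAIL"


def run_eval(cmd: List[str]) -> Tuple[str, str]:
    """Run one eval, echo its output to stderr, and classify the result."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # Output is captured in full first, so no reader closes the pipe early.
    sys.stderr.write(proc.stdout)
    lines = proc.stdout.splitlines()
    first = lines[0] if lines else ""
    return classify(first, proc.returncode), first
```

Because the exit status comes straight from the child process, a multi-line passing script cannot be misreported as a failure by a downstream reader that exits early.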

Triaged each finding against current code; fixed the still-valid ones, skipped two with reason. All gates green (ruff, mypy 51 files, pytest 151). Local failures are .env-leak from the dogfood session, not from these edits; verified with .env moved aside.

Fixes:

1. .github/workflows/harness.yaml: pin trivy-action by SHA (`@c1824fd6...# v0.34.0`) instead of mutable `@master`. Matches the pattern already used in `ci.yaml` for reproducibility / supply-chain.

2. miot-harness/.env.example: add the four T05 settings that should have been there from the start: MIOT_HARNESS_NEXO_DSN (commented, placeholder DSN), MIOT_HARNESS_NEXO_APPLICATION_NAME=miot-harness, MIOT_HARNESS_LOG_LEVEL=INFO, MIOT_HARNESS_REQUEST_ID_HEADER=x-request-id. Each with a comment matching the source-of-truth in config.py.

4. miot-harness/docker-compose.yml: add `extra_hosts: ["host.docker.internal:host-gateway"]` to the `harness` service. macOS/Windows already resolve this name automatically; the directive is a no-op there. On Linux it's required for MIOT_HARNESS_NEXO_DSN pointing at `host.docker.internal:6434` (a host-side kubectl port-forward) to work without `--network host`.

5. miot-harness/evals/deploy/03-image-runs-demo.sh: fail-fast on /health timeout. Previously the polling loop just `break`d on timeout and proceeded to `docker exec`, producing a confusing "demo command exited 1" message. Now it sets a READY flag and exits `1` with a clear error including the port if /health never came up. (Reviewer flagged a "second polling block at 74-80"; that's not a poll, it's the docker exec. Only one poll exists, only one fix.)

7. miot-harness/src/miot_harness/agents/synthesizer.py: render stale-snapshot refusal fully in Spanish. The previous code embedded the English `reason` ("Coordinador snapshot is stale (age 4320min > refuse threshold 240min).") inside a Spanish frame, mixing languages for end users. Now we parse the age via regex `\(age\s*(\d+)\s*min` and render "el snapshot tiene <X> minutos…"; fall back to a generic Spanish message if the regex doesn't match (defends against upstream format changes). Internal reason is NOT shown to the user; freshness_judge already logs it for operators. Existing test passes via "snapshot" loanword.

8. miot-harness/tests/api/test_health.py: fixture also delenvs MIOT_HARNESS_NEXO_DSN. After T05 added the DSN bypass, the lifespan only short-circuits to "Nexo disabled" if BOTH NEXO_DSN and NEXO_DB_SCRIPTS_ROOT are unset. Without this delenv, an operator with NEXO_DSN in their .env would have these tests try to connect.

9. miot-harness/src/miot_harness/api/server.py: clear app.state.nexo_* public state when the Nexo graph build fails. Previously the inner except cleared `nexo_enabled = False` and `nexo_graph = None`, but left `app.state.nexo_pool`, `app.state.nexo_registered`, and `app.state.nexo_snapshot_age_minutes` populated. /health would then report a misleading mix: enabled=False but tools=[N names]. Now all four public fields reset together. The pool itself is still closed by the outer `finally`; we just drop the public ref.

Skipped:

3. .env.example tenant_lock vs default_tenant_id mismatch: the current values match `config.py` defaults exactly (default_tenant_id=demo-tenant, nexo_tenant_lock=mintral); the comment already explains they should match for end-to-end runs. Aligning them in the template would diverge the .env.example from config.py defaults, which is strictly worse. Operators set both via their real .env, where they MUST match anyway.

6. 08-tag-discipline.sh tightening: file deleted in commit ac960bd during the deploy-eval scope trim (review feedback that asserted things which re-stated config). The script no longer exists; the review suggestion is moot.

Verification:

- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q (with worktree's dogfood .env aside) → 151 passed.
- bash miot-harness/evals/deploy/run-all.sh → 3 PASS lines unchanged.
- YAML parse OK; trivy step now references the pinned SHA.
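
For readers who want to see the shape of fix 7, here is a minimal sketch of the parse-then-fallback approach, not the actual code in synthesizer.py. Only the regex and the English `reason` string come from the commit message; the function name `render_stale_refusal` and the exact Spanish wording are assumptions made for illustration.

```python
import re

# Regex quoted in the commit message: captures the age in minutes from the
# internal English reason string.
_AGE_RE = re.compile(r"\(age\s*(\d+)\s*min")


def render_stale_refusal(reason: str) -> str:
    """Hypothetical helper: build a Spanish-only refusal from an internal reason.

    The internal `reason` is never echoed back to the user; only the parsed
    age (if any) is used.
    """
    match = _AGE_RE.search(reason)
    if match is None:
        # Generic fallback (assumed wording) in case the upstream format changes.
        return "El snapshot está desactualizado; reintenta cuando esté fresco."
    age_minutes = int(match.group(1))
    # Assumed wording, following the "el snapshot tiene <X> minutos…" frame.
    return (
        f"El snapshot tiene {age_minutes} minutos de antigüedad; "
        "reintenta cuando esté fresco."
    )


if __name__ == "__main__":
    print(render_stale_refusal(
        "Coordinador snapshot is stale (age 4320min > refuse threshold 240min)."
    ))
```

Both branches keep the word "snapshot" in the output, which is consistent with the note that the existing test still passes via the loanword.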

korutx added 11 commits May 9, 2026 23:51

korutx marked this pull request as ready for review May 10, 2026 05:21

coderabbitai Bot reviewed May 10, 2026

korutx force-pushed the harness-deploy-loop branch from f920ea1 to 76d18d4 Compare May 10, 2026 16:55

korutx force-pushed the harness-deploy-loop branch from 76d18d4 to ac960bd Compare May 10, 2026 16:56

coderabbitai Bot reviewed May 10, 2026

korutx merged commit 8ee7753 into microboxlabs:trunk May 10, 2026
7 checks passed

korutx deleted the harness-deploy-loop branch May 10, 2026 17:16

korutx mentioned this pull request May 10, 2026

feat(skills): add plan-loop skill for /loop-based autonomous workflows #448

Merged

This was referenced May 15, 2026

harness(phase-13): per-agent telemetry + agentic search foundation #462

Merged

Feat/harness sse streaming #496

Merged

coderabbitai Bot mentioned this pull request May 25, 2026

harness(api): add /health/ready endpoint for kubelet readinessProbe #520

Merged

5 tasks

