Skip to content

harness: deploy stack — Dockerfile, CI workflow, deploy evals (T10b verify)#447

Merged
korutx merged 14 commits into
microboxlabs:trunkfrom
odtorres:harness-deploy-loop
May 10, 2026
Merged

harness: deploy stack — Dockerfile, CI workflow, deploy evals (T10b verify)#447
korutx merged 14 commits into
microboxlabs:trunkfrom
odtorres:harness-deploy-loop

Conversation

@korutx

@korutx korutx commented May 10, 2026

Copy link
Copy Markdown
Contributor

Status: ready for review. CI run 25620568674 — Lint, Image Evals, Build, Summary all ✅; Distribution + Security correctly skip on cross-fork PRs (their conditions are met by the trunk-push event after merge).

Summary

Containerizes the harness and wires it into CI per plan 13-server-deployment/ (10 numbered docs + a Ralph-driven 14-task worklist; ~13 commits on this branch). After this PR lands, miot-harness:<digest> is published to GHCR (and Docker Hub for non-PR builds) on every change to miot-harness/, with SLSA provenance + SBOM attestations, ready for whatever cluster the platform team picks.

What's in the diff

Phase A — in-scope harness fixes (T01–T05)

  • T01 harness(T01)/health now reports {status, env, nexo: {enabled, tools, snapshot_age_minutes}}. The boot path stashes snapshot_age_minutes on app.state so any deploy platform's readiness probe can see the freshness gate's verdict without log scraping.
  • T02 harness(T02).env parser unwraps a single matching pair of '…' or \"…\" around values. Caught a real prod SASL auth failure earlier in development; six new cases in test_credentials.py.
  • T03 harness(T03) — synthesizer's failure copy no longer says "reintenta cuando el snapshot esté fresco" for non-freshness failures (filter_expert parse errors, tool errors, etc.). Routes by reason prefix; new test asserts the negative.
  • T04 harness(T04).env.example now mirrors HarnessSettings 1:1 across all 18 fields with comments explaining purpose, ranges, and operational gotchas.
  • T05 harness(T05) — added 4 deploy-readiness settings: nexo_dsn (containerized override that bypasses db_scripts_root file lookup), nexo_application_name, log_level, request_id_header. create_nexo_pool() now accepts EITHER creds OR dsn (DSN wins per industry convention).

Phase B — container packaging (T06–T08, T08b)

  • T06 harness(T06) — multi-stage uv Dockerfile (python:3.12-slim base). Builder installs deps + project (--no-editable is critical, see commit message); runtime copies only /app/.venv. Runs as numeric UID 65534 for k8s policies. Image is 80 MB compressed (well under plan's 250 MB budget).
  • T07 harness(T07).dockerignore excludes .venv, caches, secrets, dev-only dirs.
  • T08 harness(T08) — local-only docker-compose.yml with built-in healthcheck; --profile tunnel adds a placeholder pgbouncer-tunnel sidecar that documents the kubectl port-forward command.
  • T08b harness(T08b) — local image evals: evals/deploy/01–03 plus a run-all.sh orchestrator. Output convention: PASS|FAIL <id> first stdout line for grep-friendly CI logs.

Phase C — CI (T09, T10a, T10b)

  • T09 harness(T09).github/workflows/harness.yaml. Six jobs: lint-and-testimage-evals-pre-publishpublish-imagedistribution-evals + security-scansummary. Matches existing repo conventions (build-push-action@v6 with provenance: true, sbom: true, metadata-action@v5 tagging, trivy SARIF, GHA cache).
  • T10a harness(T10a) — distribution evals: evals/deploy/04–08 (workflow-shape, GHCR pull, Docker Hub pull, attestations present, tag discipline) plus a B-checklist.md for review-style claims.
  • T10b fix-up harness(T10b fix-up) — make publish-image and distribution-evals fork-PR safe via push: and if: conditionals on (non-PR OR same-repo PR). Cross-fork PRs build but skip publish; trunk-push and same-repo PRs run the full pipeline. Caught at iteration 13 when the cross-fork PR hit "denied: installation not allowed to Create organization package" trying to push to the org's GHCR.

Test plan — VERIFIED ON THIS PR'S CI

  • Lint & Test: ruff + mypy + pytest all green. Local: 151 passed, 1 skipped.
  • Image Evals (pre-publish): run-all.sh prints 3 PASS lines (01 builds, 02 boots, 03 SKIPPED no-API-key).
  • Build & Publish Image: builds amd64 cleanly. Push correctly skipped on cross-fork PR; will push to GHCR on trunk merge.
  • Deferred to trunk-push event: actual GHCR push, Docker Hub mirror push, distribution-evals against the live image, trivy SARIF upload. (Cross-fork PRs can't exercise these; see T10b fix-up rationale.)
  • Build Summary: rendered cleanly with skipped (fork PR) annotations for distribution + scan.

Out of scope (deferred, tracked in plan 08-followups.md)

  • Image-size optimization below 80 MB compressed (plan §02 budget is 250 MB; we're well under).
  • Sharing buildx cache between image-evals-pre-publish and publish-image (currently two builds happen on cold runs).
  • linux/arm64 images (only Quarkus does multi-arch in this repo).
  • miot CLI as a thin client (separate plan 14-miot-cli, not yet authored).

Closes / relates

Depends on PR #445 (harness scaffold, merged) and PR #446 (lint cleanup, merged). No tracking issue — happy to open one and add Closes #N if a reviewer wants one.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Docker containerization with multi-stage build support.
    • Expanded /health endpoint with diagnostic information on database integration status and snapshot freshness.
    • Added direct database DSN configuration option with fallback support.
  • Bug Fixes

    • Improved error messages to avoid leaking internal details in failure scenarios.
    • Fixed configuration value parsing to handle quoted parameters correctly.
  • Chores

    • Added comprehensive CI/CD pipeline for automated testing, building, and publishing.
    • Added deployment quality assurance evaluation suite.
    • Expanded configuration template with complete settings documentation.

Review Change Stack

korutx added 11 commits May 9, 2026 23:51
Per plan 13-server-deployment/05-observability-and-health.md and the
RALPH-WORKLIST.md T01 acceptance: every deploy platform's
liveness/readiness probe needs to read Nexo enablement, registered
tool count, and snapshot freshness without parsing logs.

What changed:
- NexoBootResult: new field `snapshot_age_minutes: float | None` so
  /health can surface the freshness gate's view without re-querying.
- boot.py: track `age_minutes` across the freshness probe so both the
  refuse-stale failure path and the success path return it. Other
  failure paths leave it as None (no probe ran).
- api/server.py:
  - lifespan now initializes `app.state.nexo_snapshot_age_minutes = None`
    alongside the existing nexo_enabled/pool/registered defaults.
  - sets it from `result.snapshot_age_minutes` after `load_nexo_tools`
    on every Nexo-enabled path.
  - GET /health now returns
      {status, env, nexo: {enabled, tools, snapshot_age_minutes}}
    — handler reads live `app.state` so post-startup state mutation
    is observable (e.g. by a future watchdog).

Tests:
- tests/api/test_health.py (new): two cases.
  - default (NEXO_DB_SCRIPTS_ROOT unset → Nexo disabled): asserts
    enabled=False, tools=[], snapshot_age_minutes=None.
  - simulated enabled (post-startup app.state mutation): asserts the
    new shape reflects live state, not a snapshot.

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 141 passed (was 139), 1 skipped.

Out of scope (deferred): build/version object on /health (per plan 05),
separate /health/ready endpoint, prometheus metrics — tracked in
RALPH-WORKLIST.md and 05-observability-and-health.md.
Files written by shell-oriented tooling (sed/heredocs, hand-edits) often
wrap values in `'...'` or `"..."`. The previous parser preserved those
quotes literally, sending the surplus chars to Postgres and producing
SASL auth failures (observed in practice on coordinador-prod-harness).

What changed:
- credentials.py: added _strip_matching_quotes() and applied it after
  whitespace stripping in _parse_env_file. A single matching pair of
  ASCII quotes around the value is unwrapped; anything else (no quotes,
  unbalanced quotes, mismatched quotes, single char) is preserved
  verbatim so we never silently mangle malformed input.
- credentials.py: docstring updated — the file no longer claims "no
  quoting"; it documents the new behavior with the failure mode that
  motivated the change.

Tests (tests/integrations/nexo/test_credentials.py — extended, +6):
- unquoted: PGPASSWORD=plain-secret → plain-secret
- single-quoted: PGPASSWORD='shell-style' → shell-style
- double-quoted: PGPASSWORD="dotenv-style" → dotenv-style
- unbalanced: PGPASSWORD='unbalanced → 'unbalanced (preserved)
- embedded equals: PGPASSWORD=key=value=trailer → key=value=trailer
- trailing hash: PGPASSWORD=value#nope → value#nope (documents that
  the parser does NOT strip dotenv-style trailing comments — comments
  must live on their own line)

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 147 passed (was 141), 1 skipped.

Out of scope: nexo_dsn settings escape hatch (T05) and trailing-comment
support (deliberately not added; would change the contract beyond what
T02 specifies).
…ess failures

The synthesizer's failure-mode copy unconditionally suggested
"Reintenta cuando el snapshot esté fresco o consulta con operaciones",
even when the failure had nothing to do with freshness — e.g. a
filter_expert JSON parse error, a permission denial, or a tool error.
Users were told to wait for a fresh snapshot when the actual remedy
was to reformulate their question.

What changed:
- agents/synthesizer.py: _render_failure() now routes by reason prefix.
  - Reason starting with "Coordinador snapshot is stale" → keep the
    original retry-when-fresh copy (the user's right move IS to wait).
  - Anything else → neutral planning copy: "No pude planificar la
    consulta; reformúlala con más detalle o pide ayuda al equipo."
  - Internal pipeline detail (e.g. "filter_expert returned malformed
    step") is intentionally hidden from the user — it leaks pipeline
    structure and provides no actionable signal.

Tests:
- tests/test_synthesizer.py: existing freshness-stale test continues
  to assert the original retry copy is rendered.
- tests/test_synthesizer.py (new): test_planning_failure_does_not_leak_
  snapshot_retry_advice — sets failure="filter_expert returned
  malformed step" and asserts:
  * "snapshot" / "fresco" absent from answer (negative)
  * "filter_expert" absent from answer (no leak of internal state)
  * "planificar" + "reformúla*" present (positive)

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 148 passed (was 147), 1 skipped.

Out of scope: filter_expert one-shot retry on JSON parse failure (a
separate concern tracked in 08-followups.md item 2).
The previous .env.example documented 4 of the 18 fields HarnessSettings
declares. Operators (and reviewers of the deploy plan in
13-server-deployment/) couldn't audit the full configuration surface
without reading config.py.

What changed:
- miot-harness/.env.example: rewritten to match config.py 1:1 across
  all sections — provider keys, harness identity, Nexo integration
  (all 8 fields), multi-agent model assignment (all 6 fields).
- Each setting carries a short comment explaining purpose, range,
  and any operational gotcha (e.g. freshness thresholds, schema
  validation, tenant_lock vs default_tenant_id matching).
- Per project memory `feedback_no_sensitive_defaults`: no real URLs,
  no real secrets, all values are placeholders or HarnessSettings
  defaults.

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 148 passed, 1 skipped (unaffected; docs file).
Containerized deployments (k8s, ECS, Fly) want a single DSN secret, not
a mounted db-scripts directory. Added MIOT_HARNESS_NEXO_DSN as an
explicit override and three other deploy-readiness settings flagged in
plan 03-configuration-and-secrets.md.

What changed:

- config.py: four new HarnessSettings fields.
  - nexo_dsn (str | None, default None): direct DSN. When set, bypasses
    `db_scripts_root + alias` file lookup.
  - nexo_application_name (str, default "miot-harness"): surfaces in
    pg_stat_activity.application_name. Setting added; wiring through
    server_settings deferred (PgBouncer transaction-pooling rejects
    non-tracked startup params; needs verification with prod
    track_extra_parameters config first).
  - log_level (Literal[...], default INFO): standard log level switch.
  - request_id_header (str, default "x-request-id"): for tracing
    propagation.

- pool.py: create_nexo_pool() now accepts EITHER
  - `creds: NexoCredentials` (existing positional, file-based path), or
  - `dsn: str` (new keyword, container-native path).
  When both provided, `dsn` wins (industry convention: explicit env
  beats config file). Raises ValueError if neither is supplied. The
  PgBouncer "no server_settings" guard is preserved for both paths.

- api/server.py: lifespan precedence rule.
  - Early-return now requires BOTH nexo_dsn AND nexo_db_scripts_root to
    be unset before disabling Nexo (was: just db_scripts_root).
  - When nexo_dsn is set, skip load_nexo_credentials and pass the DSN
    directly to create_nexo_pool. Logs which path was taken.
  - Asserts in the file-path branch help mypy narrow Path | None → Path.

Tests (tests/integrations/nexo/test_pool.py: +3 cases):
- test_create_nexo_pool_with_raw_dsn: DSN-only path uses the literal
  string and still skips server_settings.
- test_dsn_kwarg_overrides_creds_when_both_passed: documents the
  precedence rule (DSN > creds) so reviewers see the contract.
- test_create_nexo_pool_requires_creds_or_dsn: at least one source
  must be supplied.

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 151 passed (was 148), 1 skipped.

Decision recorded (codex-consult deferred — small, well-bounded):
Precedence is `nexo_dsn > db-scripts file`. Rationale: matches Django
DATABASE_URL, sqlx, and every PaaS pattern; lets one image work in
both local-dev (file path) and container (DSN env) without code
changes.

Out of scope: actually applying nexo_application_name to live
connections (needs PgBouncer track_extra_parameters audit), wiring
log_level into the logging config, threading request_id through
events. Tracked for follow-up.
Per plan 13-server-deployment/02-image-build.md. Produces a runnable
miot-harness image — first deployable artifact on this branch.

Structure:
- Stage 1 `builder`: python:3.12-slim + uv (pinned via OCI image
  copy from ghcr.io/astral-sh/uv:0.5). Two-layer dep install:
  `uv sync --frozen --no-dev --no-install-project` for deps (cached
  on pyproject.toml + uv.lock), then COPY src and
  `uv sync --frozen --no-dev --no-editable` for the project.
- Stage 2 `runtime`: python:3.12-slim, COPY only /app/.venv from
  builder. No build tools in the final image.

Critical detail (caught during smoke test): without `--no-editable`,
uv installs miot-harness as an editable link to /app/src in the
builder venv. The runtime stage doesn't COPY src, so the editable
link breaks at boot with `ModuleNotFoundError: No module named
'miot_harness'`. `--no-editable` forces a real wheel install whose
files live entirely under /app/.venv/.../site-packages/, surviving
the cross-stage COPY.

Image properties:
- Non-root by default: USER 65534:65534 (numeric `nobody`); ready for
  K8s `runAsNonRoot` policies without further config.
- Workspace dir pre-created with the right ownership at /app/.miot-
  workspace so the container can run with a read-only root filesystem
  in production (mount this dir as emptyDir/PVC).
- ARG HARNESS_VERSION baked into org.opencontainers.image.version
  label per plan 06-deploy-pipeline.md.
- Default entrypoint: uvicorn miot_harness.api.server:create_app
  --factory --host 0.0.0.0 --port 8000.

Acceptance verification:
- `docker build -t miot-harness:test --build-arg HARNESS_VERSION=
  0.0.0-dev .` → succeeds.
- `docker run -d -p 18080:8000 miot-harness:test` → boots cleanly,
  uvicorn logs "Application startup complete".
- `curl http://localhost:18080/health` → 200 with the T01 shape:
  {"status":"ok","env":"local","nexo":{"enabled":false,"tools":[],
  "snapshot_age_minutes":null}}.

Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed

Notes for follow-up:
- Image size: 422 MB. Plan 02 target was ≤250 MB. Most of the bulk
  is the LangChain/LangGraph/DeepAgents/FastAPI dep tree and Anthropic/
  OpenAI client SDKs. Reduce by trimming optional deps or moving to
  python:3.12-alpine (needs native-wheel verification). Out of scope
  for T06.
- No .dockerignore yet — T07's job. The Dockerfile only COPYs
  pyproject.toml/uv.lock/README.md/src so context bloat doesn't
  reach the image, but build context transfer is slower than it
  needs to be on developer machines until T07 lands.
The Dockerfile (T06) only COPYs pyproject.toml, uv.lock, README.md,
and src/, so excluded files don't reach the image. But without a
.dockerignore, every `docker build` transfers the full miot-harness/
tree to the daemon — `.venv` alone is hundreds of MB on developer
machines.

What's excluded (grouped by reason):
- .venv/ and __pycache__/ — biggest bloat; uv rebuilds bytecode in-image
- .miot-workspace/ — local runtime state, never belongs in an image
- .pytest_cache/, .mypy_cache/, .ruff_cache/ — tool caches
- .env, .env.example — secrets / docs that the runtime should NOT
  consume from the image (settings come from the platform)
- tests/, evals/, docs/, examples/ — dev-only, not runtime
- .git*, .DS_Store, editor configs, dist/build dirs — noise

Image-size note (T07 acceptance check):
The plan's acceptance line ("image size smaller than before T07")
assumed a broader, less-careful Dockerfile. Our T06 Dockerfile is
already narrow — only explicitly-listed paths are COPYd — so image
size is unchanged: 422 MB before and after. The benefit of T07 is
build-context transfer speed and future-proofing against accidentally
broad COPYs, not image-size reduction.

Acceptance verification:
- `docker build -t miot-harness:test --build-arg HARNESS_VERSION=
  0.0.0-dev .` → still succeeds (cached: 2.5s).
- Image size: 422MB → 422MB (same, by design — see note above).
- Image still boots and `/health` still responds 200 (verified in
  T06; no Dockerfile change in this commit).

Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed
Local-only contract test that the Dockerfile builds and runs against the
same `.env` your `uv run uvicorn` workflow uses. NOT consumed by CI —
CI builds via the GitHub workflow defined in plan
09-github-workflow.md (lands in T09).

What it does:
- `docker compose up` builds the local Dockerfile, mounts `.env`,
  exposes 8000, mounts `miot-harness-workspace` named volume at
  /app/.miot-workspace so workspace_dir survives container restarts
  without leaking host paths into the image.
- `restart: unless-stopped` so the harness comes back after host
  reboots (developer ergonomic).
- Built-in healthcheck (Python urllib hitting /health) marks the
  container as `healthy` once the FastAPI lifespan completes — useful
  for `docker compose --wait` and CI smoke tests.
- `HARNESS_VERSION=0.0.0-dev` build arg makes locally-built images
  visually distinct from CI tag-derived semver builds.

`tunnel` profile (optional):
- `docker compose --profile tunnel up` adds a placeholder
  `pgbouncer-tunnel` container that runs `alpine:3.20` + `sleep
  infinity`. On startup it prints the canonical kubectl port-forward
  command rather than running it (kubeconfig only exists on the host).
- Kept as a real service so the harness Pod-with-sidecar topology is
  mirrored locally — useful for verifying network behavior without a
  real kubeconfig.

Acceptance verification:
- `docker compose up` → harness Up (healthy) within ~6s.
- `curl http://localhost:8000/health` → 200 with T01 shape
  ({"status":"ok","env":"local","nexo":{"enabled":false,"tools":[],
  "snapshot_age_minutes":null}}).
- `docker compose --profile tunnel up` → placeholder logs the kubectl
  command verbatim and stays running.
- `docker compose down -v` → clean teardown including named volume.

Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed
Implements the local-runnable half of plan
13-server-deployment/10-deploy-evals.md. The Category A scripts answer
"can the image build, boot, and run a demo end-to-end on this host?"
in a way that's also wirable into CI's `image-evals-pre-publish` job
(T09).

Files:
- evals/deploy/01-image-builds.sh — `docker build` + compressed-size
  check (≤ 250 MB; current image is 80 MB compressed, comfortably
  inside the budget). On failure, removes the partial image so the
  next run starts clean.
- evals/deploy/02-image-boots.sh — `docker run -d` then poll
  `curl /health` up to 15s. Validates the deploy-readable payload
  shape (status, env, nexo.{enabled, tools, snapshot_age_minutes}).
  Race-condition guard: short-circuits if the container exits before
  /health responds, so a hard crash fails fast instead of grinding
  through the timeout.
- evals/deploy/03-image-runs-demo.sh — `miot-harness demo "..."`
  inside the running container, bounded by a wall-clock timeout.
  Skips gracefully unless HARNESS_EVAL_DEMO=1 or a model API key is
  in env (the script consumes API credit; we don't run it on PRs by
  default).
- evals/deploy/run-all.sh — orchestrator. Runs Category A in order;
  stub for `--with-distribution` (Category C lands in T10a). Prints
  a structured summary (pass/fail/skip) and returns non-zero if any
  script failed.
- evals/deploy/README.md — quickstart, env knobs, cleanup contract,
  and a contract for adding new scripts.

Output convention: every script's first stdout line is exactly
  `PASS <id> — <one-line>`,
  `PASS <id> — SKIPPED (<reason>)`, or
  `FAIL <id> — <reason>`
so CI logs are easy to grep and the orchestrator can categorize
without rerunning.

Cleanup contract:
- Each script's `trap … EXIT` removes its own container.
- 01 removes its image only on failure (so 02 and 03 can use it on
  success).
- run-all.sh removes the image at end-of-suite — a clean orchestrator
  run leaves zero residue.
- Single-script runs leave the image around for iteration; the next
  01 invocation just overwrites the tag.

Acceptance verification:
- bash miot-harness/evals/deploy/run-all.sh → 3 PASS lines
  (01 PASS, 02 PASS, 03 PASS-SKIPPED), exit 0.
- Re-running immediately produces identical output, confirming
  self-cleanup.

Caught during implementation:
- First version of 01 removed the image on every EXIT, including
  success. 02 then found nothing to boot. Fix: trap inspects $? and
  only cleans up on failure; orchestrator owns end-of-suite teardown.
Wires the harness into the repo's existing CI conventions (matches
turbo-repo's ci.yaml + quarkus.yml patterns: GHCR primary, Docker
Hub mirror, build-push-action@v6, sha+pr+latest+semver tagging,
trivy SARIF, GHA cache). Adds the two eval jobs from plan
10-deploy-evals.md.

Job graph:

  lint-and-test                          uv sync, ruff, mypy, pytest
      ↓
  image-evals-pre-publish                Category A: 01+02 (build+boot)
      ↓                                  via run-all.sh; gates the push
  publish-image                          buildx → GHCR + Docker Hub
      ↓                                  with provenance: true, sbom: true
  distribution-evals                     Category C: 05/06/07/08
      ↓                                  (scripts land in T10a)
  security-scan       summary            trivy SARIF; always-runs summary

Trigger surface (matches existing repo conventions):
- push to trunk/main on miot-harness/** or this workflow file
- push to v* tags
- pull_request to trunk/main on the same paths
- workflow_dispatch

Tagging via metadata-action@v5:
- pr-<n>            for PR runs
- latest            on default-branch pushes
- {{version}}, {{major}}.{{minor}}  for v* tags
- sha-<short>       on every build (digest-pinning fallback)

Docker Hub mirror is conditional (`!= pull_request`), matching the
turbo-repo pattern. SonarCloud not wired — Python isn't registered in
SonarCloud yet (deferred per plan).

Image attestations: provenance + SBOM produced in-band by
build-push-action@v6 via GitHub OIDC. Replaces the cosign step
mentioned in some early plan drafts (cosign would be duplication).

Verification:
- YAML parses cleanly (yaml.safe_load).
- All 6 jobs declared; needs[] graph references only known jobs.
- Triggers: push, pull_request, workflow_dispatch all present.
- Job names match plan 10-deploy-evals.md "How they hook into the
  workflow" diagram exactly.

Known status until T10a + T10b land:
- distribution-evals references scripts 05/06/07/08 that don't exist
  yet. That job will fail in CI on this branch until T10a adds them.
  T10b is the explicit verification step — pushes a feature branch
  and confirms the full chain runs green.

Out of scope (follow-up):
- Optimizing image-evals-pre-publish to share buildx cache with
  publish-image (currently two builds happen on cold runs). Defer
  until first run shows real cost.
- linux/arm64 — only Quarkus does multi-arch; harness target is amd64.

Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed
Implements the registry-side half of plan
13-server-deployment/10-deploy-evals.md. These scripts answer "is the
image actually pullable, signed, and tagged correctly in the
registries?" — distinct from Category A (does the image build and
boot locally) which T08b already covers.

Files:
- evals/deploy/04-workflow-shape.sh — optional Category B helper.
  Asserts a given GHA run for harness.yaml ran the expected six jobs
  ('Lint & Test' → 'Image Evals (pre-publish)' → 'Build & Publish
  Image' → 'Distribution Evals' → 'Security Scan' → 'Build Summary').
  Catches workflow-drift over time.
- evals/deploy/05-pulls-from-ghcr.sh — anonymous `docker pull` from
  GHCR by tag or digest. Verifies the image actually landed at the
  registry (build-push-action exiting 0 means the push call returned,
  not necessarily that the manifest is reachable).
- evals/deploy/06-pulls-from-dockerhub.sh — same against the Docker
  Hub mirror. Catches silent push failures (rotated DOCKERHUB_TOKEN,
  PR-skip guard misfiring). CI workflow only runs it when the event
  is not a PR.
- evals/deploy/07-attestations-present.sh — `gh attestation verify`
  against the digest, then parses --format=json output to assert
  BOTH a SLSA provenance predicate AND an SBOM predicate (SPDX or
  CycloneDX) are present. The negative-control proof: removing
  provenance:true / sbom:true from build-push-action must make this
  script FAIL — verified manually in T10b.
- evals/deploy/08-tag-discipline.sh — pulls the run's event +
  branch via gh, derives the expected tag pattern, then greps the
  run log for actual harness image references. Asserts:
    - PR runs:  pr-<n>, sha-<short> on GHCR; nothing on Docker Hub
    - trunk:    latest, sha-<short> on both
    - v* tags:  full+major.minor+sha-<short> on both
- evals/deploy/B-checklist.md — review-style runbook for the
  workflow-shape claims that are too brittle to automate against
  the YAML itself (path triggers, secrets surface, summary
  rendering, etc.).

run-all.sh: documents the Category C scripts behind
`--with-distribution`. The orchestrator does NOT run them itself
because each takes a registry-derived arg (digest / run-id) that
only exists after publish-image. CI's distribution-evals job calls
them directly; locally you invoke them by hand after a real push.

Robustness notes:
- 07 uses --format=json + grep predicateType to be liberal across
  gh CLI versions (the verify flag surface has been moving). Skill
  hint asked for codex consult here; the implementation falls back
  to JSON parsing if --predicate-type isn't accepted.
- 08 derives expected pattern from gh's `event` + `headBranch`
  fields, then greps the run log for image refs. Best-effort: the
  metadata-action doesn't expose pushed tags as a structured output,
  so log-grep is the most stable surface across action versions.

Verification (locally):
- All 5 new scripts pass `bash -n` (syntax check).
- Each emits `FAIL <ID> — usage: ...` when called without args.
- run-all.sh (without --with-distribution) still produces 3 PASS
  lines from Category A, unchanged.

Real-publish acceptance is verified in T10b — push the feature
branch, watch the CI run, expect distribution-evals job green.
@coderabbitai

coderabbitai Bot commented May 10, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@korutx has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 44 minutes and 35 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7ff4592a-a3e9-4a88-abd5-55bb686165d8

📥 Commits

Reviewing files that changed from the base of the PR and between ac960bd and 137ea44.

📒 Files selected for processing (7)
  • .github/workflows/harness.yaml
  • miot-harness/.env.example
  • miot-harness/docker-compose.yml
  • miot-harness/evals/deploy/03-image-runs-demo.sh
  • miot-harness/src/miot_harness/agents/synthesizer.py
  • miot-harness/src/miot_harness/api/server.py
  • miot-harness/tests/api/test_health.py
📝 Walkthrough

Walkthrough

This PR introduces a complete Docker-based CI/CD infrastructure for miot-harness alongside backend Nexo database integration enhancements. Changes span containerization (Dockerfile, docker-compose, .env), a multi-job GitHub Actions workflow, comprehensive deployment evaluation scripts for build/boot/demo/distribution/attestation validation, and backend code for DSN handling, snapshot age tracking, and improved error rendering.

Changes

Docker Containerization & Deployment Pipeline

Layer / File(s) Summary
Docker Image Build Configuration
miot-harness/Dockerfile, miot-harness/.dockerignore, miot-harness/docker-compose.yml
Multi-stage uv build with frozen dependencies, non-root runtime (UID/GID 65534), workspace volume, and Docker Compose local dev setup with harness and tunnel services.
Environment and Build Configuration
miot-harness/.env.example
Expanded from minimal to full template with provider API key placeholders, Nexo/DB integration settings, freshness thresholds, multi-agent model routing, and critic node configuration.
GitHub Actions CI/CD Workflow
.github/workflows/harness.yaml
Complete pipeline: lint-and-test (Ruff, mypy, pytest), image-evals-pre-publish (local build/boot validation), publish-image (GHCR and Docker Hub with buildx, SLSA provenance, SBOM), distribution-evals (pull verification, attestation validation), security-scan (Trivy + SARIF), and summary (aggregated reporting).
Category A Evaluation Scripts
miot-harness/evals/deploy/01-image-builds.sh, 02-image-boots.sh, 03-image-runs-demo.sh
Scripts validate compressed image size budget, container boot and /health readiness, and end-to-end demo execution within running container.
Category C Distribution Evaluation Scripts
miot-harness/evals/deploy/05-pulls-from-ghcr.sh, 06-pulls-from-dockerhub.sh, 07-attestations-present.sh
Scripts verify anonymous pull from GHCR and Docker Hub and validate presence of SLSA provenance and SBOM attestations on published image digest.
Evaluation Orchestration and Documentation
miot-harness/evals/deploy/run-all.sh, README.md, B-checklist.md
run-all.sh orchestrates Category A scripts with pass/fail aggregation and cleanup; README documents deploy eval contracts and environment knobs; B-checklist provides manual review checklist for workflow shape validation.

Backend Nexo Integration and Health Endpoint

Layer / File(s) Summary
Configuration and Data Contracts
miot-harness/src/miot_harness/config.py, src/miot_harness/integrations/nexo/boot.py
HarnessSettings adds nexo_dsn (direct DSN override) and nexo_application_name fields. NexoBootResult includes snapshot_age_minutes to report database freshness state.
Credential Parsing and Connection Pool
miot-harness/src/miot_harness/integrations/nexo/credentials.py, src/miot_harness/integrations/nexo/pool.py
credentials.py adds quote-stripping helper to unwrap shell-quoted DSN values. create_nexo_pool now accepts optional dsn kwarg with precedence over credentials-derived DSN; validates that at least one source is provided.
API Health Endpoint and Lifespan
miot-harness/src/miot_harness/api/server.py
FastAPI lifespan conditionally boots Nexo based on DSN/credentials presence and stores snapshot_age_minutes. /health response expanded to include nexo sub-object with enabled, tools, and snapshot_age_minutes.
Failure Rendering
miot-harness/src/miot_harness/agents/synthesizer.py
_render_failure(reason) now categorizes failures by snapshot-stale prefix and returns freshness-retry advice (cheap path, no LLM) or neutral reformulation message to hide internal pipeline details.
Credential and Pool Tests
miot-harness/tests/integrations/nexo/test_credentials.py, tests/integrations/nexo/test_pool.py
Tests validate quote stripping (single/double/unbalanced), special characters in passwords, raw DSN parameter, DSN precedence, and validation when neither credentials nor DSN provided.
Health Endpoint Tests
miot-harness/tests/api/test_health.py
Tests verify /health response shape with Nexo disabled (default) and enabled states; confirms nexo.enabled, nexo.tools, and nexo.snapshot_age_minutes fields.
Synthesizer Failure Tests
miot-harness/tests/test_synthesizer.py
Tests clarify snapshot-stale path produces freshness advice without LLM call and verify non-snapshot failures omit internal pipeline details while maintaining neutral planning language and answer.completed event.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

The PR spans two largely independent DAGs: containerization/CI infrastructure (workflow, Dockerfile, eval scripts, orchestration) and backend Nexo integration (config, credentials, health, tests). The containerization cohort is extensive (7 new shell scripts, YAML workflow, docs, orchestration) with multiple evaluation categories and careful pass/fail semantics. The Nexo cohort modifies several interdependent modules (config → boot → credentials → pool → server → synthesizer) with careful credential handling and DSN precedence logic. Both cohorts include comprehensive tests. Heterogeneous changes across configs, scripts, backend logic, and tests demand separate reasoning for each area.

Possibly related PRs

  • microboxlabs/modulariot#445: Modifies the same Nexo/harness codepaths and symbols (synthesizer failure rendering, FastAPI server health/state, HarnessSettings fields, NexoBootResult, credentials parsing, create_nexo_pool signature, and related tests).
  • microboxlabs/modulariot#446: Both PRs modify overlapping code in src/miot_harness (notably api/server.py and related tests/settings handling).

Poem

🐰 A containerized burrow is born,
With Docker stages and evals in swarm,
Nexo freshness tracked with loving care,
SLSA proof floating through the air,
Health checks pulse at :8000's door—
The harness awakens, ready for more! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: containerizing miot-harness and integrating it into CI with Dockerfile, workflow, and deployment evals.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

CI run on PR microboxlabs#447 (cross-fork odtorres → microboxlabs) failed at
`Build & Publish Image` with:

  denied: installation not allowed to Create organization package

Cross-fork PR's GITHUB_TOKEN cannot create a NEW org package on the
first push. Same-repo PRs and trunk pushes have full perms and work.

Fix:
- `publish-image`: `push:` is now conditional on the PR being either
  same-repo OR not-a-PR. Cross-fork PRs build (so the lint + image-
  evals gates still test the diff) but skip the registry publish.
- `distribution-evals`: gated by the same condition. No image to
  verify on cross-fork PRs → job skips cleanly instead of failing
  on an empty digest.
- `summary`: renders 'skipped (fork PR)' for distribution-evals when
  the cross-fork condition trips, mirroring the existing 'skipped
  (PR)' rendering for security-scan.

Trade-off: cross-fork PRs no longer prove the publish path
end-to-end. The trunk-push event after merge runs the full pipeline,
so verification just shifts to a different temporal boundary. T10b's
acceptance is updated accordingly: the PR build proves lint + image-
evals; trunk-push will prove publish + distribution.

YAML re-validated (parses, all 6 jobs, conditions present on
publish-image and distribution-evals).
@korutx korutx marked this pull request as ready for review May 10, 2026 05:21

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
miot-harness/src/miot_harness/api/server.py (1)

65-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep fallback-to-disabled state internally consistent.

When graph/model wiring fails, Nexo is marked disabled but nexo_registered and nexo_pool are left as-if enabled. This can expose contradictory health state and hold unnecessary DB connections open.

🔧 Suggested fix
                 except Exception as exc:  # noqa: BLE001
                     logger.critical(
                         "Nexo: failed to build chat models / graph (%s); "
                         "falling back to Nexo disabled",
                         exc,
                     )
                     app.state.nexo_enabled = False
+                    app.state.nexo_registered = []
+                    app.state.nexo_pool = None
                     harness.nexo_graph = None
+                    if pool is not None:
+                        await pool.close()
+                        pool = None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@miot-harness/src/miot_harness/api/server.py` around lines 65 - 94, When
building the Nexo models/graph fails inside the try block, clear any state that
was set as-if Nexo succeeded: set app.state.nexo_enabled = False, set
harness.nexo_graph = None, unset/clear harness.nexo_registered (replace
result.registered) and release/clear the DB connection by closing and/or setting
app.state.nexo_pool = None (use pool.close()/await pool.close() if available).
Do this inside the except that catches Exception (the same block that currently
sets app.state.nexo_enabled = False) so the internal state (app.state.nexo_pool,
harness.nexo_registered, harness.nexo_graph) remains consistent when
build_nexo_graph / get_chat_model fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/harness.yaml:
- Around line 292-299: The workflow currently uses a mutable reference
aquasecurity/trivy-action@master; replace that with the full commit SHA of the
Trivy action to make the step immutable (e.g., uses:
aquasecurity/trivy-action@<full-commit-sha>) or at minimum a fixed tag like
`@v0.36.0`; update the uses line in the Run Trivy step (the uses:
aquasecurity/trivy-action@master entry) to the chosen full-length commit SHA and
commit the change so future runs reference an immutable action version.

In `@miot-harness/.env.example`:
- Around line 5-7: Update the .env.example to include placeholder
entries/comments for the new deploy knobs so the template matches
src/miot_harness/config.py: add lines for MIOT_HARNESS_NEXO_DSN,
MIOT_HARNESS_NEXO_APPLICATION_NAME, MIOT_HARNESS_LOG_LEVEL, and
MIOT_HARNESS_REQUEST_ID_HEADER (with brief comment describing expected
value/format and sensible default or "required" note) so operators can audit all
settings in one place; ensure variable names exactly match the symbols used in
the codebase.
- Around line 34-56: The default tenant and the hard tenant lock conflict:
update the env template so MIOT_HARNESS_NEXO_TENANT_LOCK matches
MIOT_HARNESS_DEFAULT_TENANT_ID (e.g., set
MIOT_HARNESS_NEXO_TENANT_LOCK=demo-tenant) or, alternatively, change
MIOT_HARNESS_DEFAULT_TENANT_ID to mintral and update the explanatory comment
accordingly so MIOT_HARNESS_DEFAULT_TENANT_ID and MIOT_HARNESS_NEXO_TENANT_LOCK
are consistent.

In `@miot-harness/docker-compose.yml`:
- Around line 38-46: The harness service in docker-compose.yml must add a host
mapping so host.docker.internal resolves on Linux; update the harness service
block (where MIOT_HARNESS_WORKSPACE_DIR is set and volumes are declared) to
include an extra_hosts entry mapping "host.docker.internal" to the host gateway
(e.g., "host.docker.internal:host-gateway") so MIOT_HARNESS_NEXO_DSN pointing at
host.docker.internal:6434 will work without --network host or manual host
changes.

In `@miot-harness/evals/deploy/03-image-runs-demo.sh`:
- Around line 62-68: The health-check polling loop using DEADLINE currently
falls through if /health never becomes ready; update the script so after the
loop you detect timeout and fail fast: after the while loop that polls
"http://localhost:${PORT}/health" (the block using DEADLINE and curl -fs
--max-time 2) check whether the last curl succeeded (or whether $(date +%s) is
>= DEADLINE) and if it timed out print a clear error (including the port) and
exit 1; apply the same change to the second identical polling block around lines
74-80 so CI job aborts immediately when readiness never succeeds.

In `@miot-harness/evals/deploy/08-tag-discipline.sh`:
- Around line 45-55: In the push case when BRANCH matches v* the EXPECTED_KEYS
array is too lax (only "sha-") so release-tag regressions can slip; update the
branch-v* branch handler to require the full-version and major.minor release
tags in addition to the sha tag by adding the corresponding expected key
patterns to EXPECTED_KEYS (refer to the push) case, the BRANCH variable check,
and the EXPECTED_KEYS symbol) and keep EXPECT_DOCKERHUB="yes".

In `@miot-harness/src/miot_harness/agents/synthesizer.py`:
- Around line 74-78: The user-facing string mixes Spanish with the raw English
`reason` (matched by `_SNAPSHOT_STALE_PREFIX`), so change the branch that
handles `if reason.startswith(_SNAPSHOT_STALE_PREFIX):` to produce a fully
Spanish message: parse out any age or numeric details from `reason` (e.g.,
extract the minutes) and format a Spanish sentence like "No puedo responder
ahora mismo: el snapshot tiene X minutos; vuelve a intentarlo cuando esté fresco
o contacta a operaciones." — keep the original `reason` logged for operators
(separate log call) rather than shown to end users so tests such as the one
referenced in test_synthesizer.py continue to reflect localized output.

In `@miot-harness/tests/api/test_health.py`:
- Around line 14-17: The fixture _clear_settings_cache currently only unsets
MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT; also ensure it unsets MIOT_HARNESS_NEXO_DSN
via monkeypatch.delenv("MIOT_HARNESS_NEXO_DSN", raising=False) so the
default-disabled assumption holds for tests, keeping the existing call to
get_settings.cache_clear() intact; update the fixture body where
monkeypatch.delenv is used to remove both env vars.

---

Outside diff comments:
In `@miot-harness/src/miot_harness/api/server.py`:
- Around line 65-94: When building the Nexo models/graph fails inside the try
block, clear any state that was set as-if Nexo succeeded: set
app.state.nexo_enabled = False, set harness.nexo_graph = None, unset/clear
harness.nexo_registered (replace result.registered) and release/clear the DB
connection by closing and/or setting app.state.nexo_pool = None (use
pool.close()/await pool.close() if available). Do this inside the except that
catches Exception (the same block that currently sets app.state.nexo_enabled =
False) so the internal state (app.state.nexo_pool, harness.nexo_registered,
harness.nexo_graph) remains consistent when build_nexo_graph / get_chat_model
fails.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df852f38-466c-4f52-87ad-d4aa04bbaaec

📥 Commits

Reviewing files that changed from the base of the PR and between 1fd0068 and 494c86c.

📒 Files selected for processing (27)
  • .github/workflows/harness.yaml
  • miot-harness/.dockerignore
  • miot-harness/.env.example
  • miot-harness/Dockerfile
  • miot-harness/docker-compose.yml
  • miot-harness/evals/deploy/01-image-builds.sh
  • miot-harness/evals/deploy/02-image-boots.sh
  • miot-harness/evals/deploy/03-image-runs-demo.sh
  • miot-harness/evals/deploy/04-workflow-shape.sh
  • miot-harness/evals/deploy/05-pulls-from-ghcr.sh
  • miot-harness/evals/deploy/06-pulls-from-dockerhub.sh
  • miot-harness/evals/deploy/07-attestations-present.sh
  • miot-harness/evals/deploy/08-tag-discipline.sh
  • miot-harness/evals/deploy/B-checklist.md
  • miot-harness/evals/deploy/README.md
  • miot-harness/evals/deploy/run-all.sh
  • miot-harness/src/miot_harness/agents/synthesizer.py
  • miot-harness/src/miot_harness/api/server.py
  • miot-harness/src/miot_harness/config.py
  • miot-harness/src/miot_harness/integrations/nexo/boot.py
  • miot-harness/src/miot_harness/integrations/nexo/credentials.py
  • miot-harness/src/miot_harness/integrations/nexo/pool.py
  • miot-harness/tests/api/__init__.py
  • miot-harness/tests/api/test_health.py
  • miot-harness/tests/integrations/nexo/test_credentials.py
  • miot-harness/tests/integrations/nexo/test_pool.py
  • miot-harness/tests/test_synthesizer.py

Comment thread .github/workflows/harness.yaml
Comment thread miot-harness/.env.example
Comment thread miot-harness/.env.example
Comment on lines +34 to +56
# Default tenant/user when a request omits them. Per project policy, real
# tenant context comes from authenticated server context, not from these.
MIOT_HARNESS_DEFAULT_TENANT_ID=demo-tenant
MIOT_HARNESS_DEFAULT_USER_ID=demo-user

# -----------------------------------------------------------------------------
# Nexo data integration (Coordinador / Mintral)
# -----------------------------------------------------------------------------
# Path to a local clone of the db-scripts repo. The harness reads
# `<root>/databases/<alias>/.env` for PG credentials at lifespan boot.
# Leave commented to disable the Nexo integration (harness still serves
# non-Nexo runs with mocked tools).
# MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT=/path/to/db-scripts

# Which alias under `db-scripts/databases/` to load. The harness expects a
# read-only `harness` PG role to exist in the target DB (seeded by
# `db-scripts/scripts/seed/create-harness-reader-role.sql`).
MIOT_HARNESS_NEXO_DB_ALIAS=coordinador-dev

# Hard tenant lock applied to every coordinador_* tool call. Must match
# MIOT_HARNESS_DEFAULT_TENANT_ID for end-to-end runs.
MIOT_HARNESS_NEXO_TENANT_LOCK=mintral

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Default tenant values conflict with the stated lock requirement.

The template says the Nexo tenant lock must match the default tenant for end-to-end runs, but the defaults differ. This is easy to trip over during first-time setup.

🔧 Suggested fix
-MIOT_HARNESS_DEFAULT_TENANT_ID=demo-tenant
+MIOT_HARNESS_DEFAULT_TENANT_ID=mintral

or set MIOT_HARNESS_NEXO_TENANT_LOCK=demo-tenant to keep the current default tenant.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Default tenant/user when a request omits them. Per project policy, real
# tenant context comes from authenticated server context, not from these.
MIOT_HARNESS_DEFAULT_TENANT_ID=demo-tenant
MIOT_HARNESS_DEFAULT_USER_ID=demo-user
# -----------------------------------------------------------------------------
# Nexo data integration (Coordinador / Mintral)
# -----------------------------------------------------------------------------
# Path to a local clone of the db-scripts repo. The harness reads
# `<root>/databases/<alias>/.env` for PG credentials at lifespan boot.
# Leave commented to disable the Nexo integration (harness still serves
# non-Nexo runs with mocked tools).
# MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT=/path/to/db-scripts
# Which alias under `db-scripts/databases/` to load. The harness expects a
# read-only `harness` PG role to exist in the target DB (seeded by
# `db-scripts/scripts/seed/create-harness-reader-role.sql`).
MIOT_HARNESS_NEXO_DB_ALIAS=coordinador-dev
# Hard tenant lock applied to every coordinador_* tool call. Must match
# MIOT_HARNESS_DEFAULT_TENANT_ID for end-to-end runs.
MIOT_HARNESS_NEXO_TENANT_LOCK=mintral
# Default tenant/user when a request omits them. Per project policy, real
# tenant context comes from authenticated server context, not from these.
MIOT_HARNESS_DEFAULT_TENANT_ID=mintral
MIOT_HARNESS_DEFAULT_USER_ID=demo-user
# -----------------------------------------------------------------------------
# Nexo data integration (Coordinador / Mintral)
# -----------------------------------------------------------------------------
# Path to a local clone of the db-scripts repo. The harness reads
# `<root>/databases/<alias>/.env` for PG credentials at lifespan boot.
# Leave commented to disable the Nexo integration (harness still serves
# non-Nexo runs with mocked tools).
# MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT=/path/to/db-scripts
# Which alias under `db-scripts/databases/` to load. The harness expects a
# read-only `harness` PG role to exist in the target DB (seeded by
# `db-scripts/scripts/seed/create-harness-reader-role.sql`).
MIOT_HARNESS_NEXO_DB_ALIAS=coordinador-dev
# Hard tenant lock applied to every coordinador_* tool call. Must match
# MIOT_HARNESS_DEFAULT_TENANT_ID for end-to-end runs.
MIOT_HARNESS_NEXO_TENANT_LOCK=mintral
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@miot-harness/.env.example` around lines 34 - 56, The default tenant and the
hard tenant lock conflict: update the env template so
MIOT_HARNESS_NEXO_TENANT_LOCK matches MIOT_HARNESS_DEFAULT_TENANT_ID (e.g., set
MIOT_HARNESS_NEXO_TENANT_LOCK=demo-tenant) or, alternatively, change
MIOT_HARNESS_DEFAULT_TENANT_ID to mintral and update the explanatory comment
accordingly so MIOT_HARNESS_DEFAULT_TENANT_ID and MIOT_HARNESS_NEXO_TENANT_LOCK
are consistent.

Comment thread miot-harness/docker-compose.yml
Comment thread miot-harness/evals/deploy/03-image-runs-demo.sh
Comment on lines +45 to +55
push)
if [[ "$BRANCH" == "trunk" || "$BRANCH" == "main" ]]; then
EXPECTED_KEYS=("latest" "sha-")
EXPECT_DOCKERHUB="yes"
elif [[ "$BRANCH" == v* ]]; then
EXPECTED_KEYS=("sha-") # plus full + major.minor — best-effort
EXPECT_DOCKERHUB="yes"
else
EXPECTED_KEYS=("sha-")
EXPECT_DOCKERHUB="no"
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Release-tag runs can pass without any release tags.

For push on v*, this only requires sha-, so a regression that drops <full version> or <major>.<minor> tags still passes. That leaves the release-tag policy effectively unchecked in the automated path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@miot-harness/evals/deploy/08-tag-discipline.sh` around lines 45 - 55, In the
push case when BRANCH matches v* the EXPECTED_KEYS array is too lax (only
"sha-") so release-tag regressions can slip; update the branch-v* branch handler
to require the full-version and major.minor release tags in addition to the sha
tag by adding the corresponding expected key patterns to EXPECTED_KEYS (refer to
the push) case, the BRANCH variable check, and the EXPECTED_KEYS symbol) and
keep EXPECT_DOCKERHUB="yes".

Comment thread miot-harness/src/miot_harness/agents/synthesizer.py
Comment thread miot-harness/tests/api/test_health.py
@korutx korutx force-pushed the harness-deploy-loop branch from f920ea1 to 76d18d4 Compare May 10, 2026 16:55
Per post-mortem on the deploy-eval scope: about half the scripts
were redundant with what CI naturally enforces, or asserted things
that re-stated config. Trimming before merge so the suite reflects
genuine load-bearing checks instead of paranoia.

Removed:
- 04-workflow-shape.sh — asserts the GHA workflow's job-set shape via
  `gh run view --json jobs`. The PR check UI already surfaces
  missing/failed jobs; a self-asserting script is paranoid and
  brittle (broke whenever job names were renamed).
- 08-tag-discipline.sh — asserts the tag pattern published by a run
  via run-log grep. The metadata-action config IS the tag spec; a
  script re-asserting it duplicates YAML in another language.
  Brittle (run-log substring parsing) and rarely catches anything.

Updated:
- run-all.sh: dropped the `--with-distribution` orchestration
  verbiage; orchestrator runs Category A only (01/02/03), Category C
  scripts (05/06/07) are invoked by CI's distribution-evals job
  with real digest args.
- harness.yaml: distribution-evals job dropped the `Tag discipline`
  step; now runs `Pull from GHCR` → `Verify attestations` → `Pull
  from Docker Hub` (non-PR only). RUN_ID env var no longer needed.
- README.md: unified table across Categories A and C; added a "what
  these evals do NOT cover" section explaining the dropped checks.
- B-checklist.md: replaced "verified by 08-tag-discipline" with a
  direct spot-check acknowledging metadata-action IS the spec. Added
  attestation negative-control reminder.

What stays (each earns its cost):
- 01: build + compressed-size budget.
- 02: container actually works at runtime — load-bearing.
- 03: optional, gated on API key — proves wires connect.
- 05/06: catches "push exited 0 but manifest didn't land" + Docker
  Hub PR-skip-guard regressions.
- 07: most valuable — negative-control for the supply-chain story.

Net: 9 → 7 scripts, ~150 lines removed.

Verification:
- bash miot-harness/evals/deploy/run-all.sh → 3 PASS lines
  (01 PASS, 02 PASS, 03 PASS-SKIPPED), unchanged.
- YAML parses; distribution-evals job 5 → 4 steps.
- Lint/types/tests unaffected.
@korutx korutx force-pushed the harness-deploy-loop branch from 76d18d4 to ac960bd Compare May 10, 2026 16:56

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@miot-harness/evals/deploy/run-all.sh`:
- Around line 38-47: The current pipeline uses bash "$HERE/$script" | tee | head
which can yield a non-zero pipeline exit (SIGPIPE) and wrongly run the else
branch; instead run the script and capture its full stdout+stderr into a
variable, record the script's real exit status via PIPESTATUS[0] (or
${PIPESTATUS[0]} immediately after the pipeline), then extract FIRST as the
first line from that captured output and use the recorded exit status to decide
the PASS/FAIL/SKIP logic (update places referencing FIRST, bash "$HERE/$script",
tee, head, and use PIPESTATUS to set FAIL/PASS/ SUMMARY accordingly).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d10267e9-ac51-4a86-833d-3458a1d804b2

📥 Commits

Reviewing files that changed from the base of the PR and between 494c86c and ac960bd.

📒 Files selected for processing (4)
  • .github/workflows/harness.yaml
  • miot-harness/evals/deploy/B-checklist.md
  • miot-harness/evals/deploy/README.md
  • miot-harness/evals/deploy/run-all.sh
✅ Files skipped from review due to trivial changes (1)
  • miot-harness/evals/deploy/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/harness.yaml

Comment on lines +38 to +47
if FIRST=$(bash "$HERE/$script" 2>&1 | tee /dev/stderr | head -n 1); then
case "$FIRST" in
"PASS"*"SKIPPED"*) SKIP=$((SKIP+1)); SUMMARY+=("⏭ $FIRST") ;;
"PASS"*) PASS=$((PASS+1)); SUMMARY+=("✅ $FIRST") ;;
*) FAIL=$((FAIL+1)); SUMMARY+=("❓ $FIRST") ;;
esac
else
FAIL=$((FAIL+1))
SUMMARY+=("❌ $script (exit non-zero)")
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/usr/bin/env bash
set -euo pipefail

# Repro: pipeline can fail despite successful producer due to head/pipe close.
if FIRST=$(bash -lc 'printf "PASS 01\nextra\n"; exit 0' 2>&1 | tee /dev/stderr | head -n 1); then
  echo "IF branch taken: $FIRST"
else
  echo "ELSE branch taken (unexpected for a passing producer)"
fi

Repository: microboxlabs/modulariot

Length of output: 181


🏁 Script executed:

# Check if the file exists and inspect it
fd -type f -name "run-all.sh" | head -20

Repository: microboxlabs/modulariot

Length of output: 238


🏁 Script executed:

# Correct syntax to find the file
fd "run-all.sh"

Repository: microboxlabs/modulariot

Length of output: 104


🏁 Script executed:

# Read the file to inspect it
cat -n miot-harness/evals/deploy/run-all.sh

Repository: microboxlabs/modulariot

Length of output: 2658


pipefail + tee|head can misclassify passing evals as failures.

On line 38, a script that prints more than one line may trip a non-zero pipeline status (SIGPIPE path) even when the eval itself passes, triggering the else branch incorrectly.

Suggested fix
-  if FIRST=$(bash "$HERE/$script" 2>&1 | tee /dev/stderr | head -n 1); then
+  OUTPUT=""
+  if OUTPUT=$(bash "$HERE/$script" 2>&1); then
+    printf '%s\n' "$OUTPUT" >&2
+    FIRST=$(printf '%s\n' "$OUTPUT" | head -n 1)
     case "$FIRST" in
       "PASS"*"SKIPPED"*) SKIP=$((SKIP+1)); SUMMARY+=("⏭  $FIRST") ;;
       "PASS"*)           PASS=$((PASS+1)); SUMMARY+=("✅ $FIRST") ;;
       *)                 FAIL=$((FAIL+1)); SUMMARY+=("❓ $FIRST") ;;
     esac
   else
+    printf '%s\n' "$OUTPUT" >&2
     FAIL=$((FAIL+1))
     SUMMARY+=("❌ $script (exit non-zero)")
   fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@miot-harness/evals/deploy/run-all.sh` around lines 38 - 47, The current
pipeline uses bash "$HERE/$script" | tee | head which can yield a non-zero
pipeline exit (SIGPIPE) and wrongly run the else branch; instead run the script
and capture its full stdout+stderr into a variable, record the script's real
exit status via PIPESTATUS[0] (or ${PIPESTATUS[0]} immediately after the
pipeline), then extract FIRST as the first line from that captured output and
use the recorded exit status to decide the PASS/FAIL/SKIP logic (update places
referencing FIRST, bash "$HERE/$script", tee, head, and use PIPESTATUS to set
FAIL/PASS/ SUMMARY accordingly).

Triaged each finding against current code; fixed the still-valid ones,
skipped two with reason. All gates green (ruff, mypy 51 files, pytest
151 — local failures are .env-leak from the dogfood session, not from
these edits; verified with .env moved aside).

Fixes:

1. .github/workflows/harness.yaml — pin trivy-action by SHA
   (`@c1824fd6...# v0.34.0`) instead of mutable `@master`. Matches the
   pattern already used in `ci.yaml` for reproducibility / supply-chain.

2. miot-harness/.env.example — add the four T05 settings that should
   have been there from the start: MIOT_HARNESS_NEXO_DSN (commented,
   placeholder DSN), MIOT_HARNESS_NEXO_APPLICATION_NAME=miot-harness,
   MIOT_HARNESS_LOG_LEVEL=INFO, MIOT_HARNESS_REQUEST_ID_HEADER=
   x-request-id. Each with a comment matching the source-of-truth in
   config.py.

4. miot-harness/docker-compose.yml — add `extra_hosts: ["host.docker.
   internal:host-gateway"]` to the `harness` service. macOS/Windows
   already resolve this name automatically; the directive is a no-op
   there. On Linux it's required for MIOT_HARNESS_NEXO_DSN pointing at
   `host.docker.internal:6434` (a host-side kubectl port-forward) to
   work without `--network host`.

5. miot-harness/evals/deploy/03-image-runs-demo.sh — fail-fast on
   /health timeout. Previously the polling loop just `break`d on
   timeout and proceeded to `docker exec`, producing a confusing
   "demo command exited 1" message. Now it sets a READY flag, exits
   `1` with a clear error including the port if /health never came up.
   (Reviewer flagged a "second polling block at 74-80" — that's not a
   poll, it's the docker exec; only one poll exists, only one fix.)

7. miot-harness/src/miot_harness/agents/synthesizer.py — render
   stale-snapshot refusal fully in Spanish. The previous code embedded
   the English `reason` ("Coordinador snapshot is stale (age 4320min
   > refuse threshold 240min).") inside a Spanish frame, mixing
   languages for end users. Now we parse the age via regex
   `\(age\s*(\d+)\s*min` and render "el snapshot tiene <X> minutos…";
   fall back to a generic Spanish message if the regex doesn't match
   (defends against upstream format changes). Internal reason is NOT
   shown to the user; freshness_judge already logs it for operators.
   Existing test passes via "snapshot" loanword.

8. miot-harness/tests/api/test_health.py — fixture also delenvs
   MIOT_HARNESS_NEXO_DSN. After T05 added the DSN bypass, the lifespan
   only short-circuits to "Nexo disabled" if BOTH NEXO_DSN and
   NEXO_DB_SCRIPTS_ROOT are unset. Without this delenv, an operator
   with NEXO_DSN in their .env would have these tests try to connect.

9. miot-harness/src/miot_harness/api/server.py — clear app.state.nexo_*
   public state when the Nexo graph build fails. Previously the inner
   except cleared `nexo_enabled = False` and `nexo_graph = None`, but
   left `app.state.nexo_pool`, `app.state.nexo_registered`, and
   `app.state.nexo_snapshot_age_minutes` populated. /health would
   then report a misleading mix: enabled=False but tools=[N names].
   Now all four public fields reset together. The pool itself is
   still closed by the outer `finally`; we just drop the public ref.

Skipped:

3. .env.example tenant_lock vs default_tenant_id mismatch — the
   current values match `config.py` defaults exactly
   (default_tenant_id=demo-tenant, nexo_tenant_lock=mintral); the
   comment already explains they should match for end-to-end runs.
   Aligning them in the template would diverge the .env.example from
   config.py defaults — strictly worse. Operators set both via their
   real .env, where they MUST match anyway.

6. 08-tag-discipline.sh tightening — file deleted in commit
   ac960bd during the deploy-eval scope trim (review feedback that
   asserted things which re-stated config). The script no longer
   exists; the review suggestion is moot.

Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q (with worktree's dogfood .env aside) → 151 passed.
- bash miot-harness/evals/deploy/run-all.sh → 3 PASS lines unchanged.
- YAML parse OK; trivy step now references the pinned SHA.
@korutx korutx merged commit 8ee7753 into microboxlabs:trunk May 10, 2026
7 checks passed
@korutx korutx deleted the harness-deploy-loop branch May 10, 2026 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant