harness: deploy stack — Dockerfile, CI workflow, deploy evals (T10b verify)#447
Per plan 13-server-deployment/05-observability-and-health.md and the
RALPH-WORKLIST.md T01 acceptance: every deploy platform's
liveness/readiness probe needs to read Nexo enablement, registered
tool count, and snapshot freshness without parsing logs.
What changed:
- NexoBootResult: new field `snapshot_age_minutes: float | None` so
/health can surface the freshness gate's view without re-querying.
- boot.py: track `age_minutes` across the freshness probe so both the
refuse-stale failure path and the success path return it. Other
failure paths leave it as None (no probe ran).
- api/server.py:
- lifespan now initializes `app.state.nexo_snapshot_age_minutes = None`
alongside the existing nexo_enabled/pool/registered defaults.
- sets it from `result.snapshot_age_minutes` after `load_nexo_tools`
on every Nexo-enabled path.
- GET /health now returns
{status, env, nexo: {enabled, tools, snapshot_age_minutes}}
— handler reads live `app.state` so post-startup state mutation
is observable (e.g. by a future watchdog).
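A minimal sketch of the payload shape described above, assuming the handler simply reads attributes off `app.state` at request time. `build_health_payload` is a hypothetical helper name, and `SimpleNamespace` stands in for FastAPI's `app.state`; the attribute names follow the ones mentioned in this commit message.

```python
from types import SimpleNamespace
from typing import Any


def build_health_payload(state: Any, env: str) -> dict[str, Any]:
    """Build the /health body from live state (hypothetical helper)."""
    return {
        "status": "ok",
        "env": env,
        "nexo": {
            # Read on every call, so later mutations are visible.
            "enabled": bool(getattr(state, "nexo_enabled", False)),
            "tools": list(getattr(state, "nexo_registered", [])),
            "snapshot_age_minutes": getattr(
                state, "nexo_snapshot_age_minutes", None
            ),
        },
    }


# Stand-in for app.state right after a Nexo-disabled startup.
state = SimpleNamespace(
    nexo_enabled=False,
    nexo_registered=[],
    nexo_snapshot_age_minutes=None,
)
disabled = build_health_payload(state, env="local")

# Simulate a post-startup mutation (e.g. by a future watchdog).
state.nexo_enabled = True
state.nexo_registered = ["example_tool"]  # placeholder tool name
state.nexo_snapshot_age_minutes = 12.5
enabled = build_health_payload(state, env="local")
```

Because nothing is cached at startup, the second call reflects the mutated state, which is the property the second test case checks.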
Tests:
- tests/api/test_health.py (new): two cases.
- default (NEXO_DB_SCRIPTS_ROOT unset → Nexo disabled): asserts
enabled=False, tools=[], snapshot_age_minutes=None.
- simulated enabled (post-startup app.state mutation): asserts the
new shape reflects live state, not a snapshot.
Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 141 passed (was 139), 1 skipped.
Out of scope (deferred): build/version object on /health (per plan 05),
separate /health/ready endpoint, prometheus metrics — tracked in
RALPH-WORKLIST.md and 05-observability-and-health.md.
Files written by shell-oriented tooling (sed/heredocs, hand-edits) often
wrap values in `'...'` or `"..."`. The previous parser preserved those
quotes literally, sending the surplus chars to Postgres and producing
SASL auth failures (observed in practice on coordinador-prod-harness).
What changed:
- credentials.py: added _strip_matching_quotes() and applied it after
whitespace stripping in _parse_env_file. A single matching pair of
ASCII quotes around the value is unwrapped; anything else (no quotes,
unbalanced quotes, mismatched quotes, single char) is preserved
verbatim so we never silently mangle malformed input.
- credentials.py: docstring updated. The file no longer claims "no
quoting"; it documents the new behavior with the failure mode that
motivated the change.
Tests (tests/integrations/nexo/test_credentials.py, extended, +6):
- unquoted: PGPASSWORD=plain-secret → plain-secret
- single-quoted: PGPASSWORD='shell-style' → shell-style
- double-quoted: PGPASSWORD="dotenv-style" → dotenv-style
- unbalanced: PGPASSWORD='unbalanced → 'unbalanced (preserved)
- embedded equals: PGPASSWORD=key=value=trailer → key=value=trailer
- trailing hash: PGPASSWORD=value#nope → value#nope (documents that the
parser does NOT strip dotenv-style trailing comments; comments must
live on their own line)
Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 147 passed (was 141), 1 skipped.
Out of scope: nexo_dsn settings escape hatch (T05) and trailing-comment
support (deliberately not added; would change the contract beyond what
T02 specifies).
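The unwrapping rule can be sketched roughly as follows. This is an illustration of the behavior described in this commit, not the code in credentials.py: the function names are stand-ins for `_strip_matching_quotes` and `_parse_env_file`, and the line parser handles a single `KEY=value` line only.

```python
from __future__ import annotations


def strip_matching_quotes(value: str) -> str:
    """Unwrap one matching pair of ASCII quotes; otherwise return as-is."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    # No quotes, unbalanced, mismatched, or a single char: keep verbatim.
    return value


def parse_env_line(line: str) -> tuple[str, str] | None:
    """Parse one KEY=value line; skip blanks and whole-line comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    # Split on the first '=' only, so values may contain '=' themselves.
    key, _, raw = stripped.partition("=")
    return key.strip(), strip_matching_quotes(raw.strip())
```

Note that `value#nope` passes through untouched: trailing comments are part of the value in this sketch, matching the documented contract.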
…ess failures
The synthesizer's failure-mode copy unconditionally suggested
"Reintenta cuando el snapshot esté fresco o consulta con operaciones",
even when the failure had nothing to do with freshness — e.g. a
filter_expert JSON parse error, a permission denial, or a tool error.
Users were told to wait for a fresh snapshot when the actual remedy
was to reformulate their question.
What changed:
- agents/synthesizer.py: _render_failure() now routes by reason prefix.
- Reason starting with "Coordinador snapshot is stale" → keep the
original retry-when-fresh copy (the user's right move IS to wait).
- Anything else → neutral planning copy: "No pude planificar la
consulta; reformúlala con más detalle o pide ayuda al equipo."
- Internal pipeline detail (e.g. "filter_expert returned malformed
step") is intentionally hidden from the user — it leaks pipeline
structure and provides no actionable signal.
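The prefix routing above might look like the sketch below. It is a simplification under stated assumptions: the real `_render_failure()` may wrap these strings in additional copy, and the constant and function names here are hypothetical. The Spanish strings are the user-facing copy quoted in this commit.

```python
SNAPSHOT_STALE_PREFIX = "Coordinador snapshot is stale"

# "Retry when the snapshot is fresh or check with operations"
RETRY_WHEN_FRESH = (
    "Reintenta cuando el snapshot esté fresco o consulta con operaciones"
)
# "I could not plan the query; rephrase it with more detail or ask the team"
NEUTRAL_PLANNING = (
    "No pude planificar la consulta; reformúlala con más detalle "
    "o pide ayuda al equipo."
)


def render_failure(reason: str) -> str:
    """Pick user-facing copy from the failure reason's prefix."""
    if reason.startswith(SNAPSHOT_STALE_PREFIX):
        # Waiting is the right move only for freshness failures.
        return RETRY_WHEN_FRESH
    # The raw reason is deliberately not echoed back to the user.
    return NEUTRAL_PLANNING
```

Routing on a prefix keeps the synthesizer decoupled from the full reason text, at the cost of depending on the exact wording of the freshness message.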
Tests:
- tests/test_synthesizer.py: existing freshness-stale test continues
to assert the original retry copy is rendered.
- tests/test_synthesizer.py (new):
test_planning_failure_does_not_leak_snapshot_retry_advice sets
failure="filter_expert returned malformed step" and asserts:
* "snapshot" / "fresco" absent from answer (negative)
* "filter_expert" absent from answer (no leak of internal state)
* "planificar" + "reformúla*" present (positive)
Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 148 passed (was 147), 1 skipped.
Out of scope: filter_expert one-shot retry on JSON parse failure (a
separate concern tracked in 08-followups.md item 2).
The previous .env.example documented 4 of the 18 fields HarnessSettings
declares. Operators (and reviewers of the deploy plan in
13-server-deployment/) couldn't audit the full configuration surface
without reading config.py.
What changed:
- miot-harness/.env.example: rewritten to match config.py 1:1 across all
sections: provider keys, harness identity, Nexo integration (all 8
fields), multi-agent model assignment (all 6 fields).
- Each setting carries a short comment explaining purpose, range, and
any operational gotcha (e.g. freshness thresholds, schema validation,
tenant_lock vs default_tenant_id matching).
- Per project memory `feedback_no_sensitive_defaults`: no real URLs, no
real secrets, all values are placeholders or HarnessSettings defaults.
Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 148 passed, 1 skipped (unaffected; docs file).
Containerized deployments (k8s, ECS, Fly) want a single DSN secret, not
a mounted db-scripts directory. Added MIOT_HARNESS_NEXO_DSN as an
explicit override and three other deploy-readiness settings flagged in
plan 03-configuration-and-secrets.md.
What changed:
- config.py: four new HarnessSettings fields.
- nexo_dsn (str | None, default None): direct DSN. When set, bypasses
`db_scripts_root + alias` file lookup.
- nexo_application_name (str, default "miot-harness"): surfaces in
pg_stat_activity.application_name. Setting added; wiring through
server_settings deferred (PgBouncer transaction-pooling rejects
non-tracked startup params; needs verification with prod
track_extra_parameters config first).
- log_level (Literal[...], default INFO): standard log level switch.
- request_id_header (str, default "x-request-id"): for tracing
propagation.
- pool.py: create_nexo_pool() now accepts EITHER
- `creds: NexoCredentials` (existing positional, file-based path), or
- `dsn: str` (new keyword, container-native path).
When both provided, `dsn` wins (industry convention: explicit env
beats config file). Raises ValueError if neither is supplied. The
PgBouncer "no server_settings" guard is preserved for both paths.
- api/server.py: lifespan precedence rule.
- Early-return now requires BOTH nexo_dsn AND nexo_db_scripts_root to
be unset before disabling Nexo (was: just db_scripts_root).
- When nexo_dsn is set, skip load_nexo_credentials and pass the DSN
directly to create_nexo_pool. Logs which path was taken.
- Asserts in the file-path branch help mypy narrow Path | None → Path.
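The precedence rule (explicit DSN beats file-based credentials, and at least one must be present) can be illustrated without a database driver. Everything below is a sketch: `Creds` is a hypothetical stand-in for `NexoCredentials`, its fields are assumptions, and `resolve_dsn` only mirrors the source-selection step that `create_nexo_pool()` is described as performing.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Creds:
    """Hypothetical stand-in for NexoCredentials (fields are assumed)."""

    host: str
    port: int
    user: str
    password: str
    database: str

    def to_dsn(self) -> str:
        # Percent-encode user and password so '@' or ':' cannot break the URL.
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"


def resolve_dsn(creds: Creds | None = None, *, dsn: str | None = None) -> str:
    """Return the DSN to connect with; an explicit `dsn` wins over `creds`."""
    if dsn is not None:
        return dsn
    if creds is not None:
        return creds.to_dsn()
    raise ValueError("either creds or dsn must be supplied")
```

Keeping `dsn` keyword-only means existing positional `creds` call sites keep working unchanged.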
Tests (tests/integrations/nexo/test_pool.py: +3 cases):
- test_create_nexo_pool_with_raw_dsn: DSN-only path uses the literal
string and still skips server_settings.
- test_dsn_kwarg_overrides_creds_when_both_passed: documents the
precedence rule (DSN > creds) so reviewers see the contract.
- test_create_nexo_pool_requires_creds_or_dsn: at least one source
must be supplied.
Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 151 passed (was 148), 1 skipped.
Decision recorded (codex-consult deferred — small, well-bounded):
Precedence is `nexo_dsn > db-scripts file`. Rationale: matches Django
DATABASE_URL, sqlx, and every PaaS pattern; lets one image work in
both local-dev (file path) and container (DSN env) without code
changes.
Out of scope: actually applying nexo_application_name to live
connections (needs PgBouncer track_extra_parameters audit), wiring
log_level into the logging config, threading request_id through
events. Tracked for follow-up.
Per plan 13-server-deployment/02-image-build.md. Produces a runnable
miot-harness image — first deployable artifact on this branch.
Structure:
- Stage 1 `builder`: python:3.12-slim + uv (pinned via OCI image
copy from ghcr.io/astral-sh/uv:0.5). Two-layer dep install:
`uv sync --frozen --no-dev --no-install-project` for deps (cached
on pyproject.toml + uv.lock), then COPY src and
`uv sync --frozen --no-dev --no-editable` for the project.
- Stage 2 `runtime`: python:3.12-slim, COPY only /app/.venv from
builder. No build tools in the final image.
Critical detail (caught during smoke test): without `--no-editable`,
uv installs miot-harness as an editable link to /app/src in the
builder venv. The runtime stage doesn't COPY src, so the editable
link breaks at boot with `ModuleNotFoundError: No module named
'miot_harness'`. `--no-editable` forces a real wheel install whose
files live entirely under /app/.venv/.../site-packages/, surviving
the cross-stage COPY.
Image properties:
- Non-root by default: USER 65534:65534 (numeric `nobody`); ready for
K8s `runAsNonRoot` policies without further config.
- Workspace dir pre-created with the right ownership at
/app/.miot-workspace so the container can run with a read-only root
filesystem in production (mount this dir as emptyDir/PVC).
- ARG HARNESS_VERSION baked into org.opencontainers.image.version
label per plan 06-deploy-pipeline.md.
- Default entrypoint: uvicorn miot_harness.api.server:create_app
--factory --host 0.0.0.0 --port 8000.
Acceptance verification:
- `docker build -t miot-harness:test --build-arg HARNESS_VERSION=0.0.0-dev .`
→ succeeds.
- `docker run -d -p 18080:8000 miot-harness:test` → boots cleanly,
uvicorn logs "Application startup complete".
- `curl http://localhost:18080/health` → 200 with the T01 shape:
{"status":"ok","env":"local","nexo":{"enabled":false,"tools":[],
"snapshot_age_minutes":null}}.
Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed
Notes for follow-up:
- Image size: 422 MB. Plan 02 target was ≤250 MB. Most of the bulk
is the LangChain/LangGraph/DeepAgents/FastAPI dep tree and Anthropic/
OpenAI client SDKs. Reduce by trimming optional deps or moving to
python:3.12-alpine (needs native-wheel verification). Out of scope
for T06.
- No .dockerignore yet — T07's job. The Dockerfile only COPYs
pyproject.toml/uv.lock/README.md/src so context bloat doesn't
reach the image, but build context transfer is slower than it
needs to be on developer machines until T07 lands.
The Dockerfile (T06) only COPYs pyproject.toml, uv.lock, README.md,
and src/, so excluded files don't reach the image. But without a
.dockerignore, every `docker build` transfers the full miot-harness/
tree to the daemon — `.venv` alone is hundreds of MB on developer
machines.
What's excluded (grouped by reason):
- .venv/ and __pycache__/ — biggest bloat; uv rebuilds bytecode in-image
- .miot-workspace/ — local runtime state, never belongs in an image
- .pytest_cache/, .mypy_cache/, .ruff_cache/ — tool caches
- .env, .env.example — secrets / docs that the runtime should NOT
consume from the image (settings come from the platform)
- tests/, evals/, docs/, examples/ — dev-only, not runtime
- .git*, .DS_Store, editor configs, dist/build dirs — noise
Image-size note (T07 acceptance check):
The plan's acceptance line ("image size smaller than before T07")
assumed a broader, less-careful Dockerfile. Our T06 Dockerfile is
already narrow — only explicitly-listed paths are COPYd — so image
size is unchanged: 422 MB before and after. The benefit of T07 is
build-context transfer speed and future-proofing against accidentally
broad COPYs, not image-size reduction.
Acceptance verification:
- `docker build -t miot-harness:test --build-arg HARNESS_VERSION=0.0.0-dev .`
→ still succeeds (cached: 2.5s).
- Image size: 422MB → 422MB (same, by design — see note above).
- Image still boots and `/health` still responds 200 (verified in
T06; no Dockerfile change in this commit).
Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed
Local-only contract test that the Dockerfile builds and runs against the
same `.env` your `uv run uvicorn` workflow uses. NOT consumed by CI —
CI builds via the GitHub workflow defined in plan
09-github-workflow.md (lands in T09).
What it does:
- `docker compose up` builds the local Dockerfile, mounts `.env`,
exposes 8000, mounts `miot-harness-workspace` named volume at
/app/.miot-workspace so workspace_dir survives container restarts
without leaking host paths into the image.
- `restart: unless-stopped` so the harness comes back after host
reboots (developer ergonomic).
- Built-in healthcheck (Python urllib hitting /health) marks the
container as `healthy` once the FastAPI lifespan completes — useful
for `docker compose --wait` and CI smoke tests.
- `HARNESS_VERSION=0.0.0-dev` build arg makes locally-built images
visually distinct from CI tag-derived semver builds.
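A urllib-based health probe of the kind the compose healthcheck uses could look like this. It is a sketch rather than the command in docker-compose.yml: the function name, the timeout default, and the decision to require `"status": "ok"` in the body are all assumptions.

```python
import json
import urllib.error
import urllib.request


def probe_health(url: str, timeout: float = 2.0) -> bool:
    """Return True when `url` answers 200 with a JSON body whose status is "ok"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

As a container healthcheck this would typically be wrapped so the process exits 0 on True and 1 on False. Using only the standard library avoids needing curl in the slim runtime image.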
`tunnel` profile (optional):
- `docker compose --profile tunnel up` adds a placeholder
`pgbouncer-tunnel` container that runs `alpine:3.20` + `sleep
infinity`. On startup it prints the canonical kubectl port-forward
command rather than running it (kubeconfig only exists on the host).
- Kept as a real service so the harness Pod-with-sidecar topology is
mirrored locally — useful for verifying network behavior without a
real kubeconfig.
Acceptance verification:
- `docker compose up` → harness Up (healthy) within ~6s.
- `curl http://localhost:8000/health` → 200 with T01 shape
({"status":"ok","env":"local","nexo":{"enabled":false,"tools":[],
"snapshot_age_minutes":null}}).
- `docker compose --profile tunnel up` → placeholder logs the kubectl
command verbatim and stays running.
- `docker compose down -v` → clean teardown including named volume.
Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed
Implements the local-runnable half of plan
13-server-deployment/10-deploy-evals.md. The Category A scripts answer
"can the image build, boot, and run a demo end-to-end on this host?"
in a way that's also wirable into CI's `image-evals-pre-publish` job
(T09).
Files:
- evals/deploy/01-image-builds.sh — `docker build` + compressed-size
check (≤ 250 MB; current image is 80 MB compressed, comfortably
inside the budget). On failure, removes the partial image so the
next run starts clean.
- evals/deploy/02-image-boots.sh — `docker run -d` then poll
`curl /health` up to 15s. Validates the deploy-readable payload
shape (status, env, nexo.{enabled, tools, snapshot_age_minutes}).
Race-condition guard: short-circuits if the container exits before
/health responds, so a hard crash fails fast instead of grinding
through the timeout.
- evals/deploy/03-image-runs-demo.sh — `miot-harness demo "..."`
inside the running container, bounded by a wall-clock timeout.
Skips gracefully unless HARNESS_EVAL_DEMO=1 or a model API key is
in env (the script consumes API credit; we don't run it on PRs by
default).
- evals/deploy/run-all.sh — orchestrator. Runs Category A in order;
stub for `--with-distribution` (Category C lands in T10a). Prints
a structured summary (pass/fail/skip) and returns non-zero if any
script failed.
- evals/deploy/README.md — quickstart, env knobs, cleanup contract,
and a contract for adding new scripts.
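The poll-with-race-guard idea in 02-image-boots.sh can be expressed compactly. The real script is bash; this Python sketch only illustrates the control flow, and every name in it is hypothetical. The probe and liveness checks are injected as callables so the loop can be exercised without Docker.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    container_running: Callable[[], bool],
    timeout_s: float = 15.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until healthy, the container exits, or the deadline passes."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe():
            return "healthy"
        if not container_running():
            # Fail fast: a crashed container will never become healthy.
            return "exited"
        sleep(interval_s)
    # Falling out of the loop is an explicit failure, not a silent pass.
    return "timeout"
```

Returning a distinct `"timeout"` value makes the caller handle the case where readiness never succeeds instead of falling through.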
Output convention: every script's first stdout line is exactly
`PASS <id> — <one-line>`,
`PASS <id> — SKIPPED (<reason>)`, or
`FAIL <id> — <reason>`
so CI logs are easy to grep and the orchestrator can categorize
without rerunning.
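To show how an orchestrator might categorize on that first line alone, here is a small parser sketch. It assumes the three line shapes listed above are exhaustive and that ids contain no spaces; `classify` and `Result` are hypothetical names, and run-all.sh itself is a bash script that may do this differently.

```python
import re
from typing import NamedTuple

DASH = "\u2014"  # the dash separator used in the result lines
RESULT_RE = re.compile(rf"^(PASS|FAIL) (\S+) {DASH} (.+)$")


class Result(NamedTuple):
    script_id: str
    outcome: str  # "pass", "skip", or "fail"
    detail: str


def classify(first_line: str) -> Result:
    """Map a script's first stdout line to pass/skip/fail."""
    match = RESULT_RE.match(first_line.strip())
    if match is None:
        raise ValueError(f"unrecognized result line: {first_line!r}")
    verdict, script_id, detail = match.groups()
    if verdict == "FAIL":
        return Result(script_id, "fail", detail)
    if detail.startswith("SKIPPED"):
        # A skip is reported as PASS so it never fails the suite.
        return Result(script_id, "skip", detail)
    return Result(script_id, "pass", detail)
```

Treating an unrecognized line as an error rather than a pass keeps a crashing script from being counted as green.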
Cleanup contract:
- Each script's `trap … EXIT` removes its own container.
- 01 removes its image only on failure (so 02 and 03 can use it on
success).
- run-all.sh removes the image at end-of-suite — a clean orchestrator
run leaves zero residue.
- Single-script runs leave the image around for iteration; the next
01 invocation just overwrites the tag.
Acceptance verification:
- bash miot-harness/evals/deploy/run-all.sh → 3 PASS lines
(01 PASS, 02 PASS, 03 PASS-SKIPPED), exit 0.
- Re-running immediately produces identical output, confirming
self-cleanup.
Caught during implementation:
- First version of 01 removed the image on every EXIT, including
success. 02 then found nothing to boot. Fix: trap inspects $? and
only cleans up on failure; orchestrator owns end-of-suite teardown.
Wires the harness into the repo's existing CI conventions (matches
turbo-repo's ci.yaml + quarkus.yml patterns: GHCR primary, Docker
Hub mirror, build-push-action@v6, sha+pr+latest+semver tagging,
trivy SARIF, GHA cache). Adds the two eval jobs from plan
10-deploy-evals.md.
Job graph:
lint-and-test              uv sync, ruff, mypy, pytest
    ↓
image-evals-pre-publish    Category A: 01+02 (build+boot)
    ↓                      via run-all.sh; gates the push
publish-image              buildx → GHCR + Docker Hub
    ↓                      with provenance: true, sbom: true
distribution-evals         Category C: 05/06/07/08
    ↓                      (scripts land in T10a)
security-scan, summary     trivy SARIF; always-runs summary
Trigger surface (matches existing repo conventions):
- push to trunk/main on miot-harness/** or this workflow file
- push to v* tags
- pull_request to trunk/main on the same paths
- workflow_dispatch
Tagging via metadata-action@v5:
- pr-<n> for PR runs
- latest on default-branch pushes
- {{version}}, {{major}}.{{minor}} for v* tags
- sha-<short> on every build (digest-pinning fallback)
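The tagging rules above reduce to a small decision table. The sketch below is not what metadata-action does internally; it only derives the expected tag set for a run, under the assumptions that version tags are plain `vMAJOR.MINOR.PATCH` and that the default branch name is passed in. All names are hypothetical.

```python
from __future__ import annotations


def expected_tags(
    event: str,
    ref: str,
    short_sha: str,
    pr_number: int | None = None,
    default_branch: str = "trunk",
) -> list[str]:
    """Derive the tags a run should produce from its event and git ref."""
    tags = [f"sha-{short_sha}"]  # present on every build
    if event == "pull_request":
        if pr_number is None:
            raise ValueError("pull_request runs need a PR number")
        tags.append(f"pr-{pr_number}")
    elif ref.startswith("refs/tags/v"):
        version = ref.removeprefix("refs/tags/v")
        major, minor, *_ = version.split(".")
        tags += [version, f"{major}.{minor}"]
    elif ref == f"refs/heads/{default_branch}":
        tags.append("latest")
    return tags
```

A table like this is also what a tag-discipline check would compare the actually pushed tags against.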
Docker Hub mirror is conditional (`!= pull_request`), matching the
turbo-repo pattern. SonarCloud not wired — Python isn't registered in
SonarCloud yet (deferred per plan).
Image attestations: provenance + SBOM produced in-band by
build-push-action@v6 via GitHub OIDC. Replaces the cosign step
mentioned in some early plan drafts (cosign would be duplication).
Verification:
- YAML parses cleanly (yaml.safe_load).
- All 6 jobs declared; needs[] graph references only known jobs.
- Triggers: push, pull_request, workflow_dispatch all present.
- Job names match plan 10-deploy-evals.md "How they hook into the
workflow" diagram exactly.
Known status until T10a + T10b land:
- distribution-evals references scripts 05/06/07/08 that don't exist
yet. That job will fail in CI on this branch until T10a adds them.
T10b is the explicit verification step — pushes a feature branch
and confirms the full chain runs green.
Out of scope (follow-up):
- Optimizing image-evals-pre-publish to share buildx cache with
publish-image (currently two builds happen on cold runs). Defer
until first run shows real cost.
- linux/arm64 — only Quarkus does multi-arch; harness target is amd64.
Lint/types/tests unaffected (no Python changes):
- ruff: All checks passed
- mypy: 51 files clean
- pytest: 151 passed
Implements the registry-side half of plan
13-server-deployment/10-deploy-evals.md. These scripts answer "is the
image actually pullable, signed, and tagged correctly in the
registries?" — distinct from Category A (does the image build and
boot locally) which T08b already covers.
Files:
- evals/deploy/04-workflow-shape.sh — optional Category B helper.
Asserts a given GHA run for harness.yaml ran the expected six jobs
('Lint & Test' → 'Image Evals (pre-publish)' → 'Build & Publish
Image' → 'Distribution Evals' → 'Security Scan' → 'Build Summary').
Catches workflow-drift over time.
- evals/deploy/05-pulls-from-ghcr.sh — anonymous `docker pull` from
GHCR by tag or digest. Verifies the image actually landed at the
registry (build-push-action exiting 0 means the push call returned,
not necessarily that the manifest is reachable).
- evals/deploy/06-pulls-from-dockerhub.sh — same against the Docker
Hub mirror. Catches silent push failures (rotated DOCKERHUB_TOKEN,
PR-skip guard misfiring). CI workflow only runs it when the event
is not a PR.
- evals/deploy/07-attestations-present.sh — `gh attestation verify`
against the digest, then parses --format=json output to assert
BOTH a SLSA provenance predicate AND an SBOM predicate (SPDX or
CycloneDX) are present. The negative-control proof: removing
provenance:true / sbom:true from build-push-action must make this
script FAIL — verified manually in T10b.
- evals/deploy/08-tag-discipline.sh — pulls the run's event +
branch via gh, derives the expected tag pattern, then greps the
run log for actual harness image references. Asserts:
- PR runs: pr-<n>, sha-<short> on GHCR; nothing on Docker Hub
- trunk: latest, sha-<short> on both
- v* tags: full+major.minor+sha-<short> on both
- evals/deploy/B-checklist.md — review-style runbook for the
workflow-shape claims that are too brittle to automate against
the YAML itself (path triggers, secrets surface, summary
rendering, etc.).
run-all.sh: documents the Category C scripts behind
`--with-distribution`. The orchestrator does NOT run them itself
because each takes a registry-derived arg (digest / run-id) that
only exists after publish-image. CI's distribution-evals job calls
them directly; locally you invoke them by hand after a real push.
Robustness notes:
- 07 uses --format=json + grep predicateType to be liberal across
gh CLI versions (the verify flag surface has been moving). Skill
hint asked for codex consult here; the implementation falls back
to JSON parsing if --predicate-type isn't accepted.
- 08 derives expected pattern from gh's `event` + `headBranch`
fields, then greps the run log for image refs. Best-effort: the
metadata-action doesn't expose pushed tags as a structured output,
so log-grep is the most stable surface across action versions.
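The "be liberal about JSON shape" approach described for 07 can be sketched as a recursive search for `predicateType` keys. The script itself uses grep; this Python version is illustrative, makes no assumption about where the key sits in the `gh attestation verify --format=json` output, and matches predicate URIs by substring. The substrings used are assumptions about the usual SLSA, SPDX, and CycloneDX predicate type URIs.

```python
import json
from typing import Any, Iterator


def iter_predicate_types(node: Any) -> Iterator[str]:
    """Yield every string value stored under a "predicateType" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "predicateType" and isinstance(value, str):
                yield value
            else:
                yield from iter_predicate_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_predicate_types(item)


def has_provenance_and_sbom(verify_json: str) -> bool:
    """True only when BOTH a provenance and an SBOM predicate are present."""
    types = set(iter_predicate_types(json.loads(verify_json)))
    has_provenance = any("slsa.dev/provenance" in t for t in types)
    has_sbom = any("spdx.dev" in t or "cyclonedx.org" in t for t in types)
    return has_provenance and has_sbom
```

Requiring both predicates is what makes the negative control meaningful: dropping either `provenance: true` or `sbom: true` should flip the result to False.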
Verification (locally):
- All 5 new scripts pass `bash -n` (syntax check).
- Each emits `FAIL <ID> — usage: ...` when called without args.
- run-all.sh (without --with-distribution) still produces 3 PASS
lines from Category A, unchanged.
Real-publish acceptance is verified in T10b — push the feature
branch, watch the CI run, expect distribution-evals job green.
Walkthrough
This PR introduces a complete Docker-based CI/CD infrastructure.
Changes
- Docker Containerization & Deployment Pipeline
- Backend Nexo Integration and Health Endpoint
Estimated code review effort: 4 (Complex), ~75 minutes
The PR spans two largely independent DAGs: containerization/CI infrastructure (workflow, Dockerfile, eval scripts, orchestration) and backend Nexo integration (config, credentials, health, tests). The containerization cohort is extensive (7 new shell scripts, YAML workflow, docs, orchestration) with multiple evaluation categories and careful pass/fail semantics. The Nexo cohort modifies several interdependent modules (config → boot → credentials → pool → server → synthesizer) with careful credential handling and DSN precedence logic. Both cohorts include comprehensive tests. Heterogeneous changes across configs, scripts, backend logic, and tests demand separate reasoning for each area.
CI run on PR microboxlabs#447 (cross-fork odtorres → microboxlabs) failed
at `Build & Publish Image` with:
denied: installation not allowed to Create organization package
A cross-fork PR's GITHUB_TOKEN cannot create a NEW org package on the
first push. Same-repo PRs and trunk pushes have full perms and work.
Fix:
- `publish-image`: `push:` is now conditional on the PR being either
same-repo OR not-a-PR. Cross-fork PRs build (so the lint + image-evals
gates still test the diff) but skip the registry publish.
- `distribution-evals`: gated by the same condition. No image to verify
on cross-fork PRs → job skips cleanly instead of failing on an empty
digest.
- `summary`: renders 'skipped (fork PR)' for distribution-evals when the
cross-fork condition trips, mirroring the existing 'skipped (PR)'
rendering for security-scan.
Trade-off: cross-fork PRs no longer prove the publish path end-to-end.
The trunk-push event after merge runs the full pipeline, so verification
just shifts to a different temporal boundary. T10b's acceptance is
updated accordingly: the PR build proves lint + image-evals; trunk-push
will prove publish + distribution.
YAML re-validated (parses, all 6 jobs, conditions present on
publish-image and distribution-evals).
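The gating condition amounts to a two-input predicate. In the workflow it is a GitHub Actions expression; the Python sketch below only restates the logic, with hypothetical parameter names standing in for the event name and the head and base repository full names.

```python
from __future__ import annotations


def should_publish(event_name: str, head_repo: str | None, base_repo: str) -> bool:
    """Publish unless this is a pull request coming from a fork."""
    if event_name != "pull_request":
        # Pushes, tags, and manual dispatches always publish.
        return True
    # Same-repo PRs have a token that can write packages; fork PRs do not.
    return head_repo == base_repo
```

Using one predicate for both `publish-image` and `distribution-evals` keeps the two jobs from disagreeing about whether an image exists to verify.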
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
miot-harness/src/miot_harness/api/server.py (1)
65-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Keep fallback-to-disabled state internally consistent.
When graph/model wiring fails, Nexo is marked disabled but nexo_registered and nexo_pool are left as-if enabled. This can expose contradictory health state and hold unnecessary DB connections open.
🔧 Suggested fix
     except Exception as exc:  # noqa: BLE001
         logger.critical(
             "Nexo: failed to build chat models / graph (%s); "
             "falling back to Nexo disabled",
             exc,
         )
         app.state.nexo_enabled = False
+        app.state.nexo_registered = []
+        app.state.nexo_pool = None
         harness.nexo_graph = None
+        if pool is not None:
+            await pool.close()
+            pool = None
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/harness.yaml:
- Around line 292-299: The workflow currently uses a mutable reference
aquasecurity/trivy-action@master; replace that with the full commit SHA of the
Trivy action to make the step immutable (e.g., uses:
aquasecurity/trivy-action@<full-commit-sha>) or at minimum a fixed tag like
`@v0.36.0`; update the uses line in the Run Trivy step (the uses:
aquasecurity/trivy-action@master entry) to the chosen full-length commit SHA and
commit the change so future runs reference an immutable action version.
In `@miot-harness/.env.example`:
- Around line 5-7: Update the .env.example to include placeholder
entries/comments for the new deploy knobs so the template matches
src/miot_harness/config.py: add lines for MIOT_HARNESS_NEXO_DSN,
MIOT_HARNESS_NEXO_APPLICATION_NAME, MIOT_HARNESS_LOG_LEVEL, and
MIOT_HARNESS_REQUEST_ID_HEADER (with brief comment describing expected
value/format and sensible default or "required" note) so operators can audit all
settings in one place; ensure variable names exactly match the symbols used in
the codebase.
- Around line 34-56: The default tenant and the hard tenant lock conflict:
update the env template so MIOT_HARNESS_NEXO_TENANT_LOCK matches
MIOT_HARNESS_DEFAULT_TENANT_ID (e.g., set
MIOT_HARNESS_NEXO_TENANT_LOCK=demo-tenant) or, alternatively, change
MIOT_HARNESS_DEFAULT_TENANT_ID to mintral and update the explanatory comment
accordingly so MIOT_HARNESS_DEFAULT_TENANT_ID and MIOT_HARNESS_NEXO_TENANT_LOCK
are consistent.
In `@miot-harness/docker-compose.yml`:
- Around line 38-46: The harness service in docker-compose.yml must add a host
mapping so host.docker.internal resolves on Linux; update the harness service
block (where MIOT_HARNESS_WORKSPACE_DIR is set and volumes are declared) to
include an extra_hosts entry mapping "host.docker.internal" to the host gateway
(e.g., "host.docker.internal:host-gateway") so MIOT_HARNESS_NEXO_DSN pointing at
host.docker.internal:6434 will work without --network host or manual host
changes.
In `@miot-harness/evals/deploy/03-image-runs-demo.sh`:
- Around line 62-68: The health-check polling loop using DEADLINE currently
falls through if /health never becomes ready; update the script so after the
loop you detect timeout and fail fast: after the while loop that polls
"http://localhost:${PORT}/health" (the block using DEADLINE and curl -fs
--max-time 2) check whether the last curl succeeded (or whether $(date +%s) is
>= DEADLINE) and if it timed out print a clear error (including the port) and
exit 1; apply the same change to the second identical polling block around lines
74-80 so CI job aborts immediately when readiness never succeeds.
In `@miot-harness/evals/deploy/08-tag-discipline.sh`:
- Around line 45-55: In the `push)` case, when BRANCH matches v*, the
EXPECTED_KEYS array is too lax (only "sha-"), so release-tag regressions
can slip through. Update the v* branch handler to require the
full-version and major.minor release tags in addition to the sha tag by
adding the corresponding expected key patterns to EXPECTED_KEYS (refer
to the `push)` case, the BRANCH variable check, and the EXPECTED_KEYS
symbol), and keep EXPECT_DOCKERHUB="yes".
In `@miot-harness/src/miot_harness/agents/synthesizer.py`:
- Around line 74-78: The user-facing string mixes Spanish with the raw English
`reason` (matched by `_SNAPSHOT_STALE_PREFIX`), so change the branch that
handles `if reason.startswith(_SNAPSHOT_STALE_PREFIX):` to produce a fully
Spanish message: parse out any age or numeric details from `reason` (e.g.,
extract the minutes) and format a Spanish sentence like "No puedo responder
ahora mismo: el snapshot tiene X minutos; vuelve a intentarlo cuando esté fresco
o contacta a operaciones." — keep the original `reason` logged for operators
(separate log call) rather than shown to end users so tests such as the one
referenced in test_synthesizer.py continue to reflect localized output.
In `@miot-harness/tests/api/test_health.py`:
- Around line 14-17: The fixture _clear_settings_cache currently only unsets
MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT; also ensure it unsets MIOT_HARNESS_NEXO_DSN
via monkeypatch.delenv("MIOT_HARNESS_NEXO_DSN", raising=False) so the
default-disabled assumption holds for tests, keeping the existing call to
get_settings.cache_clear() intact; update the fixture body where
monkeypatch.delenv is used to remove both env vars.
---
Outside diff comments:
In `@miot-harness/src/miot_harness/api/server.py`:
- Around line 65-94: When building the Nexo models/graph fails inside the try
block, clear any state that was set as-if Nexo succeeded: set
app.state.nexo_enabled = False, set harness.nexo_graph = None, unset/clear
harness.nexo_registered (replace result.registered) and release/clear the DB
connection by closing and/or setting app.state.nexo_pool = None (use
pool.close()/await pool.close() if available). Do this inside the except that
catches Exception (the same block that currently sets app.state.nexo_enabled =
False) so the internal state (app.state.nexo_pool, harness.nexo_registered,
harness.nexo_graph) remains consistent when build_nexo_graph / get_chat_model
fails.
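The consistency requirement in this comment can be sketched in Python as a single reset helper called from the failure path. This is a minimal illustration, not the harness's code: `reset_nexo_state`, `try_enable_nexo`, and the `build_graph` callable are hypothetical names, `SimpleNamespace` objects stand in for `app.state` and `harness`, and the empty list used for `nexo_registered` is an assumed "disabled" value. The sketch only drops the pool reference; whether the pool should also be closed at this point depends on how lifespan cleanup is arranged.

```python
from types import SimpleNamespace


def reset_nexo_state(app_state, harness) -> None:
    """Put every Nexo-related field back to its 'disabled' value in one place."""
    app_state.nexo_enabled = False
    app_state.nexo_pool = None       # drops the reference only; closing is assumed to happen elsewhere
    harness.nexo_graph = None
    harness.nexo_registered = []     # assumed 'disabled' value


def try_enable_nexo(app_state, harness, build_graph) -> None:
    """Hypothetical wrapper: build the graph, or fall back to a fully reset state."""
    try:
        harness.nexo_graph = build_graph()
        app_state.nexo_enabled = True
    except Exception:
        # Reset all fields together so observers never see a half-enabled mix.
        reset_nexo_state(app_state, harness)


if __name__ == "__main__":
    state = SimpleNamespace(nexo_enabled=True, nexo_pool=object())
    harness = SimpleNamespace(nexo_graph=None, nexo_registered=["tool_a"])
    try_enable_nexo(state, harness, lambda: 1 / 0)  # simulated build failure
    print(state.nexo_enabled, state.nexo_pool, harness.nexo_registered)
```

Keeping the reset in one function means a new state field only has to be added in one place when the failure path changes.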
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: df852f38-466c-4f52-87ad-d4aa04bbaaec
📒 Files selected for processing (27)
- .github/workflows/harness.yaml
- miot-harness/.dockerignore
- miot-harness/.env.example
- miot-harness/Dockerfile
- miot-harness/docker-compose.yml
- miot-harness/evals/deploy/01-image-builds.sh
- miot-harness/evals/deploy/02-image-boots.sh
- miot-harness/evals/deploy/03-image-runs-demo.sh
- miot-harness/evals/deploy/04-workflow-shape.sh
- miot-harness/evals/deploy/05-pulls-from-ghcr.sh
- miot-harness/evals/deploy/06-pulls-from-dockerhub.sh
- miot-harness/evals/deploy/07-attestations-present.sh
- miot-harness/evals/deploy/08-tag-discipline.sh
- miot-harness/evals/deploy/B-checklist.md
- miot-harness/evals/deploy/README.md
- miot-harness/evals/deploy/run-all.sh
- miot-harness/src/miot_harness/agents/synthesizer.py
- miot-harness/src/miot_harness/api/server.py
- miot-harness/src/miot_harness/config.py
- miot-harness/src/miot_harness/integrations/nexo/boot.py
- miot-harness/src/miot_harness/integrations/nexo/credentials.py
- miot-harness/src/miot_harness/integrations/nexo/pool.py
- miot-harness/tests/api/__init__.py
- miot-harness/tests/api/test_health.py
- miot-harness/tests/integrations/nexo/test_credentials.py
- miot-harness/tests/integrations/nexo/test_pool.py
- miot-harness/tests/test_synthesizer.py
# Default tenant/user when a request omits them. Per project policy, real
# tenant context comes from authenticated server context, not from these.
MIOT_HARNESS_DEFAULT_TENANT_ID=demo-tenant
MIOT_HARNESS_DEFAULT_USER_ID=demo-user

# -----------------------------------------------------------------------------
# Nexo data integration (Coordinador / Mintral)
# -----------------------------------------------------------------------------
# Path to a local clone of the db-scripts repo. The harness reads
# `<root>/databases/<alias>/.env` for PG credentials at lifespan boot.
# Leave commented to disable the Nexo integration (harness still serves
# non-Nexo runs with mocked tools).
# MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT=/path/to/db-scripts

# Which alias under `db-scripts/databases/` to load. The harness expects a
# read-only `harness` PG role to exist in the target DB (seeded by
# `db-scripts/scripts/seed/create-harness-reader-role.sql`).
MIOT_HARNESS_NEXO_DB_ALIAS=coordinador-dev

# Hard tenant lock applied to every coordinador_* tool call. Must match
# MIOT_HARNESS_DEFAULT_TENANT_ID for end-to-end runs.
MIOT_HARNESS_NEXO_TENANT_LOCK=mintral
Default tenant values conflict with the stated lock requirement.
The template says the Nexo tenant lock must match the default tenant for end-to-end runs, but the defaults differ. This is easy to trip over during first-time setup.
🔧 Suggested fix
-MIOT_HARNESS_DEFAULT_TENANT_ID=demo-tenant
+MIOT_HARNESS_DEFAULT_TENANT_ID=mintral
Or set MIOT_HARNESS_NEXO_TENANT_LOCK=demo-tenant to keep the current default tenant.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@miot-harness/.env.example` around lines 34 - 56, The default tenant and the
hard tenant lock conflict: update the env template so
MIOT_HARNESS_NEXO_TENANT_LOCK matches MIOT_HARNESS_DEFAULT_TENANT_ID (e.g., set
MIOT_HARNESS_NEXO_TENANT_LOCK=demo-tenant) or, alternatively, change
MIOT_HARNESS_DEFAULT_TENANT_ID to mintral and update the explanatory comment
accordingly so MIOT_HARNESS_DEFAULT_TENANT_ID and MIOT_HARNESS_NEXO_TENANT_LOCK
are consistent.
push)
  if [[ "$BRANCH" == "trunk" || "$BRANCH" == "main" ]]; then
    EXPECTED_KEYS=("latest" "sha-")
    EXPECT_DOCKERHUB="yes"
  elif [[ "$BRANCH" == v* ]]; then
    EXPECTED_KEYS=("sha-") # plus full + major.minor — best-effort
    EXPECT_DOCKERHUB="yes"
  else
    EXPECTED_KEYS=("sha-")
    EXPECT_DOCKERHUB="no"
  fi
Release-tag runs can pass without any release tags.
For push on v*, this only requires sha-, so a regression that drops <full version> or <major>.<minor> tags still passes. That leaves the release-tag policy effectively unchecked in the automated path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@miot-harness/evals/deploy/08-tag-discipline.sh` around lines 45 - 55, In the
push case when BRANCH matches v* the EXPECTED_KEYS array is too lax (only
"sha-") so release-tag regressions can slip; update the branch-v* branch handler
to require the full-version and major.minor release tags in addition to the sha
tag by adding the corresponding expected key patterns to EXPECTED_KEYS (refer to
the push) case, the BRANCH variable check, and the EXPECTED_KEYS symbol) and
keep EXPECT_DOCKERHUB="yes".
f920ea1 to
76d18d4
Per post-mortem on the deploy-eval scope: about half the scripts were redundant with what CI naturally enforces, or asserted things that re-stated config. Trimming before merge so the suite reflects genuine load-bearing checks instead of paranoia.
Removed:
- 04-workflow-shape.sh: asserts the GHA workflow's job-set shape via `gh run view --json jobs`. The PR check UI already surfaces missing/failed jobs; a self-asserting script is paranoid and brittle (broke whenever job names were renamed).
- 08-tag-discipline.sh: asserts the tag pattern published by a run via run-log grep. The metadata-action config IS the tag spec; a script re-asserting it duplicates YAML in another language. Brittle (run-log substring parsing) and rarely catches anything.
Updated:
- run-all.sh: dropped the `--with-distribution` orchestration verbiage; orchestrator runs Category A only (01/02/03), Category C scripts (05/06/07) are invoked by CI's distribution-evals job with real digest args.
- harness.yaml: distribution-evals job dropped the `Tag discipline` step; now runs `Pull from GHCR` → `Verify attestations` → `Pull from Docker Hub` (non-PR only). RUN_ID env var no longer needed.
- README.md: unified table across Categories A and C; added a "what these evals do NOT cover" section explaining the dropped checks.
- B-checklist.md: replaced "verified by 08-tag-discipline" with a direct spot-check acknowledging metadata-action IS the spec. Added attestation negative-control reminder.
What stays (each earns its cost):
- 01: build + compressed-size budget.
- 02: container actually works at runtime (load-bearing).
- 03: optional, gated on API key; proves wires connect.
- 05/06: catches "push exited 0 but manifest didn't land" + Docker Hub PR-skip-guard regressions.
- 07: most valuable; negative-control for the supply-chain story.
Net: 9 → 7 scripts, ~150 lines removed.
Verification:
- bash miot-harness/evals/deploy/run-all.sh → 3 PASS lines (01 PASS, 02 PASS, 03 PASS-SKIPPED), unchanged.
- YAML parses; distribution-evals job 5 → 4 steps.
- Lint/types/tests unaffected.
76d18d4 to
ac960bd
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@miot-harness/evals/deploy/run-all.sh`:
- Around line 38-47: The current pipeline uses bash "$HERE/$script" | tee | head
which can yield a non-zero pipeline exit (SIGPIPE) and wrongly run the else
branch; instead run the script and capture its full stdout+stderr into a
variable, record the script's real exit status via PIPESTATUS[0] (or
${PIPESTATUS[0]} immediately after the pipeline), then extract FIRST as the
first line from that captured output and use the recorded exit status to decide
the PASS/FAIL/SKIP logic (update places referencing FIRST, bash "$HERE/$script",
tee, head, and use PIPESTATUS to set FAIL/PASS/ SUMMARY accordingly).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d10267e9-ac51-4a86-833d-3458a1d804b2
📒 Files selected for processing (4)
- .github/workflows/harness.yaml
- miot-harness/evals/deploy/B-checklist.md
- miot-harness/evals/deploy/README.md
- miot-harness/evals/deploy/run-all.sh
✅ Files skipped from review due to trivial changes (1)
- miot-harness/evals/deploy/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/harness.yaml
if FIRST=$(bash "$HERE/$script" 2>&1 | tee /dev/stderr | head -n 1); then
  case "$FIRST" in
    "PASS"*"SKIPPED"*) SKIP=$((SKIP+1)); SUMMARY+=("⏭ $FIRST") ;;
    "PASS"*) PASS=$((PASS+1)); SUMMARY+=("✅ $FIRST") ;;
    *) FAIL=$((FAIL+1)); SUMMARY+=("❓ $FIRST") ;;
  esac
else
  FAIL=$((FAIL+1))
  SUMMARY+=("❌ $script (exit non-zero)")
fi
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
# Repro: pipeline can fail despite successful producer due to head/pipe close.
if FIRST=$(bash -lc 'printf "PASS 01\nextra\n"; exit 0' 2>&1 | tee /dev/stderr | head -n 1); then
echo "IF branch taken: $FIRST"
else
echo "ELSE branch taken (unexpected for a passing producer)"
fi
Repository: microboxlabs/modulariot
Length of output: 181
🏁 Script executed:
# Check if the file exists and inspect it
fd -type f -name "run-all.sh" | head -20
Repository: microboxlabs/modulariot
Length of output: 238
🏁 Script executed:
# Correct syntax to find the file
fd "run-all.sh"
Repository: microboxlabs/modulariot
Length of output: 104
🏁 Script executed:
# Read the file to inspect it
cat -n miot-harness/evals/deploy/run-all.sh
Repository: microboxlabs/modulariot
Length of output: 2658
pipefail + tee|head can misclassify passing evals as failures.
On line 38, a script that prints more than one line may trip a non-zero pipeline status (SIGPIPE path) even when the eval itself passes, triggering the else branch incorrectly.
Suggested fix
- if FIRST=$(bash "$HERE/$script" 2>&1 | tee /dev/stderr | head -n 1); then
+ OUTPUT=""
+ if OUTPUT=$(bash "$HERE/$script" 2>&1); then
+ printf '%s\n' "$OUTPUT" >&2
+ FIRST=$(printf '%s\n' "$OUTPUT" | head -n 1)
case "$FIRST" in
"PASS"*"SKIPPED"*) SKIP=$((SKIP+1)); SUMMARY+=("⏭ $FIRST") ;;
"PASS"*) PASS=$((PASS+1)); SUMMARY+=("✅ $FIRST") ;;
*) FAIL=$((FAIL+1)); SUMMARY+=("❓ $FIRST") ;;
esac
else
+ printf '%s\n' "$OUTPUT" >&2
FAIL=$((FAIL+1))
SUMMARY+=("❌ $script (exit non-zero)")
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@miot-harness/evals/deploy/run-all.sh` around lines 38 - 47, The current
pipeline uses bash "$HERE/$script" | tee | head which can yield a non-zero
pipeline exit (SIGPIPE) and wrongly run the else branch; instead run the script
and capture its full stdout+stderr into a variable, record the script's real
exit status via PIPESTATUS[0] (or ${PIPESTATUS[0]} immediately after the
pipeline), then extract FIRST as the first line from that captured output and
use the recorded exit status to decide the PASS/FAIL/SKIP logic (update places
referencing FIRST, bash "$HERE/$script", tee, head, and use PIPESTATUS to set
FAIL/PASS/ SUMMARY accordingly).
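The capture-then-classify approach described in this comment can also be sketched in Python, where there is no pipeline whose exit status could be disturbed by an early-closing reader. This is only an illustration of the idea, not a proposed replacement for `run-all.sh`: `run_eval` is a hypothetical helper, and the PASS/SKIPPED matching mirrors the `case` patterns quoted above.

```python
import subprocess
import sys


def run_eval(cmd: list[str]) -> tuple[str, str]:
    """Run one eval, echo its full output, and classify it.

    The child's real exit code decides failure; the first output line
    only distinguishes PASS from PASS ... SKIPPED.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
        text=True,
    )
    sys.stderr.write(proc.stdout)  # keep the full log visible, like tee /dev/stderr
    lines = proc.stdout.splitlines()
    first = lines[0] if lines else ""
    if proc.returncode != 0:
        return "FAIL", first
    if first.startswith("PASS") and "SKIPPED" in first:
        return "SKIP", first
    if first.startswith("PASS"):
        return "PASS", first
    return "FAIL", first  # exit 0 but unexpected first line


if __name__ == "__main__":
    # Stand-in for an eval script: prints more than one line and exits 0.
    print(run_eval([sys.executable, "-c", "print('PASS 01'); print('extra')"]))
```

Because the output is fully captured before the first line is extracted, a multi-line script cannot be misclassified by a closed pipe.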
Triaged each finding against current code; fixed the still-valid ones,
skipped two with reason. All gates green (ruff, mypy 51 files, pytest
151 — local failures are .env-leak from the dogfood session, not from
these edits; verified with .env moved aside).
Fixes:
1. .github/workflows/harness.yaml — pin trivy-action by SHA
(`@c1824fd6...# v0.34.0`) instead of mutable `@master`. Matches the
pattern already used in `ci.yaml` for reproducibility / supply-chain.
2. miot-harness/.env.example — add the four T05 settings that should
have been there from the start: MIOT_HARNESS_NEXO_DSN (commented,
placeholder DSN), MIOT_HARNESS_NEXO_APPLICATION_NAME=miot-harness,
MIOT_HARNESS_LOG_LEVEL=INFO, MIOT_HARNESS_REQUEST_ID_HEADER=
x-request-id. Each with a comment matching the source-of-truth in
config.py.
4. miot-harness/docker-compose.yml — add `extra_hosts: ["host.docker.
internal:host-gateway"]` to the `harness` service. macOS/Windows
already resolve this name automatically; the directive is a no-op
there. On Linux it's required for MIOT_HARNESS_NEXO_DSN pointing at
`host.docker.internal:6434` (a host-side kubectl port-forward) to
work without `--network host`.
5. miot-harness/evals/deploy/03-image-runs-demo.sh — fail-fast on
/health timeout. Previously the polling loop just `break`d on
timeout and proceeded to `docker exec`, producing a confusing
"demo command exited 1" message. Now it sets a READY flag, exits
`1` with a clear error including the port if /health never came up.
(Reviewer flagged a "second polling block at 74-80" — that's not a
poll, it's the docker exec; only one poll exists, only one fix.)
7. miot-harness/src/miot_harness/agents/synthesizer.py — render
stale-snapshot refusal fully in Spanish. The previous code embedded
the English `reason` ("Coordinador snapshot is stale (age 4320min
> refuse threshold 240min).") inside a Spanish frame, mixing
languages for end users. Now we parse the age via regex
`\(age\s*(\d+)\s*min` and render "el snapshot tiene <X> minutos…";
fall back to a generic Spanish message if the regex doesn't match
(defends against upstream format changes). Internal reason is NOT
shown to the user; freshness_judge already logs it for operators.
Existing test passes via "snapshot" loanword.
8. miot-harness/tests/api/test_health.py — fixture also delenvs
MIOT_HARNESS_NEXO_DSN. After T05 added the DSN bypass, the lifespan
only short-circuits to "Nexo disabled" if BOTH NEXO_DSN and
NEXO_DB_SCRIPTS_ROOT are unset. Without this delenv, an operator
with NEXO_DSN in their .env would have these tests try to connect.
9. miot-harness/src/miot_harness/api/server.py — clear app.state.nexo_*
public state when the Nexo graph build fails. Previously the inner
except cleared `nexo_enabled = False` and `nexo_graph = None`, but
left `app.state.nexo_pool`, `app.state.nexo_registered`, and
`app.state.nexo_snapshot_age_minutes` populated. /health would
then report a misleading mix: enabled=False but tools=[N names].
Now all four public fields reset together. The pool itself is
still closed by the outer `finally`; we just drop the public ref.
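As an illustration of fix 7 above, the parse-and-render step might look like the following sketch. The regex is the one quoted in that item; the function name `render_stale_refusal`, the constant `_AGE_RE`, and the exact wording of the fallback sentence are assumptions made for this example, not the code in `synthesizer.py`.

```python
import re

# Regex quoted in fix 7; the constant name is hypothetical.
_AGE_RE = re.compile(r"\(age\s*(\d+)\s*min")


def render_stale_refusal(reason: str) -> str:
    """Return a Spanish-only refusal; the English reason is never echoed."""
    match = _AGE_RE.search(reason)
    if match is None:
        # Generic fallback (wording assumed) in case the upstream format changes.
        return (
            "No puedo responder ahora mismo: el snapshot no está fresco; "
            "vuelve a intentarlo más tarde o contacta a operaciones."
        )
    minutes = int(match.group(1))
    return (
        f"No puedo responder ahora mismo: el snapshot tiene {minutes} minutos; "
        "vuelve a intentarlo cuando esté fresco o contacta a operaciones."
    )


if __name__ == "__main__":
    reason = "Coordinador snapshot is stale (age 4320min > refuse threshold 240min)."
    print(render_stale_refusal(reason))
```

Both branches keep the word "snapshot", which is consistent with the note that the existing test passes via the loanword.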
Skipped:
3. .env.example tenant_lock vs default_tenant_id mismatch — the
current values match `config.py` defaults exactly
(default_tenant_id=demo-tenant, nexo_tenant_lock=mintral); the
comment already explains they should match for end-to-end runs.
Aligning them in the template would diverge the .env.example from
config.py defaults — strictly worse. Operators set both via their
real .env, where they MUST match anyway.
6. 08-tag-discipline.sh tightening: file deleted in commit
ac960bd during the deploy-eval scope trim, which removed scripts
that merely re-stated config. The script no longer exists, so the
review suggestion is moot.
Verification:
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q (with worktree's dogfood .env aside) → 151 passed.
- bash miot-harness/evals/deploy/run-all.sh → 3 PASS lines unchanged.
- YAML parse OK; trivy step now references the pinned SHA.
Status: ready for review. CI run 25620568674 — Lint, Image Evals, Build, Summary all ✅; Distribution + Security correctly skip on cross-fork PRs (their conditions are met by the trunk-push event after merge).
Summary
Containerizes the harness and wires it into CI per plan `13-server-deployment/` (10 numbered docs + a Ralph-driven 14-task worklist; ~13 commits on this branch). After this PR lands, `miot-harness:<digest>` is published to GHCR (and Docker Hub for non-PR builds) on every change to `miot-harness/`, with SLSA provenance + SBOM attestations, ready for whatever cluster the platform team picks.
What's in the diff
Phase A: in-scope harness fixes (T01–T05)
- harness(T01): `/health` now reports `{status, env, nexo: {enabled, tools, snapshot_age_minutes}}`. The boot path stashes `snapshot_age_minutes` on `app.state` so any deploy platform's readiness probe can see the freshness gate's verdict without log scraping.
- harness(T02): `.env` parser unwraps a single matching pair of `'…'` or `"…"` around values. Caught a real prod SASL auth failure earlier in development; six new cases in `test_credentials.py`.
- harness(T03): synthesizer's failure copy no longer says "reintenta cuando el snapshot esté fresco" for non-freshness failures (filter_expert parse errors, tool errors, etc.). Routes by reason prefix; new test asserts the negative.
- harness(T04): `.env.example` now mirrors `HarnessSettings` 1:1 across all 18 fields with comments explaining purpose, ranges, and operational gotchas.
- harness(T05): added 4 deploy-readiness settings: `nexo_dsn` (containerized override that bypasses `db_scripts_root` file lookup), `nexo_application_name`, `log_level`, `request_id_header`. `create_nexo_pool()` now accepts EITHER `creds` OR `dsn` (DSN wins per industry convention).
Phase B: container packaging (T06–T08, T08b)
- harness(T06): multi-stage uv `Dockerfile` (`python:3.12-slim` base). Builder installs deps + project (`--no-editable` is critical, see commit message); runtime copies only `/app/.venv`. Runs as numeric UID 65534 for k8s policies. Image is 80 MB compressed (well under plan's 250 MB budget).
- harness(T07): `.dockerignore` excludes `.venv`, caches, secrets, dev-only dirs.
- harness(T08): local-only `docker-compose.yml` with built-in healthcheck; `--profile tunnel` adds a placeholder pgbouncer-tunnel sidecar that documents the kubectl port-forward command.
- harness(T08b): local image evals: `evals/deploy/01–03` plus a `run-all.sh` orchestrator. Output convention: `PASS|FAIL <id>` first stdout line for grep-friendly CI logs.
Phase C: CI (T09, T10a, T10b)
- harness(T09): `.github/workflows/harness.yaml`. Six jobs: `lint-and-test` → `image-evals-pre-publish` → `publish-image` → `distribution-evals` + `security-scan` → `summary`. Matches existing repo conventions (`build-push-action@v6` with `provenance: true, sbom: true`, `metadata-action@v5` tagging, trivy SARIF, GHA cache).
- harness(T10a): distribution evals: `evals/deploy/04–08` (workflow-shape, GHCR pull, Docker Hub pull, attestations present, tag discipline) plus a `B-checklist.md` for review-style claims.
- harness(T10b fix-up): make `publish-image` and `distribution-evals` fork-PR safe via `push:` and `if:` conditionals on `(non-PR OR same-repo PR)`. Cross-fork PRs build but skip publish; trunk-push and same-repo PRs run the full pipeline. Caught at iteration 13 when the cross-fork PR hit "denied: installation not allowed to Create organization package" trying to push to the org's GHCR.
Test plan: VERIFIED ON THIS PR'S CI
- Lint & Test: ruff + mypy + pytest all green. Local: 151 passed, 1 skipped.
- Image Evals (pre-publish): `run-all.sh` prints 3 PASS lines (01 builds, 02 boots, 03 SKIPPED no-API-key).
- Build & Publish Image: builds amd64 cleanly. Push correctly skipped on cross-fork PR; will push to GHCR on trunk merge.
- Build Summary: rendered cleanly with `skipped (fork PR)` annotations for distribution + scan.
Out of scope (deferred, tracked in plan `08-followups.md`)
- `image-evals-pre-publish` and `publish-image` (currently two builds happen on cold runs).
- `linux/arm64` images (only Quarkus does multi-arch in this repo).
- `miot` CLI as a thin client (separate plan `14-miot-cli`, not yet authored).
Closes / relates
Depends on PR #445 (harness scaffold, merged) and PR #446 (lint cleanup, merged). No tracking issue; happy to open one and add `Closes #N` if a reviewer wants one.
Summary by CodeRabbit
Release Notes
New Features
- `/health` endpoint with diagnostic information on database integration status and snapshot freshness.
Bug Fixes
Chores