harness(nexo): Coordinador data access — Phases A–F (closes #444) by korutx · Pull Request #445 · microboxlabs/modulariot

korutx · 2026-05-09T21:06:35Z

Closes #444.

Summary

First real data integration for miot-harness — the Coordinador nexo schema, via direct asyncpg through the
db-scripts/databases/coordinador-prod-harness/ tunnel and a
dedicated read-only harness PG role.

Tracking issue #444 has full scope, acceptance criteria, and
follow-ups. Diff is the v3 implementation arc captured per phase
in conventional commits.

Phases shipped

A — db-scripts seed for read-only harness role (validated by check-harness-role.sql).
B1–B7 — credentials loader, asyncpg pool, pg_proc introspection, denylist, primer, boot loader (ACL + freshness + register), FastAPI lifespan wiring.
C1–C9 — rule-based supervisor + 7 LangGraph specialist agents (filter_expert, data_fetcher, freshness_judge, domain_analyst, synthesizer, critic, summarizer); StateGraph wiring.
D1–D3 — NEXO_QUERY routing branch, supervisor integration, payload truncation utility.
E2 — real-pg integration test for introspection (pytest-postgresql).
F1–F3 — eval scaffolding: golden-trace runner, LLM judge prompt, run-from-script.
Hotfix — PgBouncer transaction-pooling compatibility (drop server_settings startup params; enforce read-only via BEGIN READ ONLY instead).

Test plan

Reviewer pulls branch, populates .env with valid Anthropic + Nexo creds, brings up tunnel.
cd miot-harness && uv sync && uv run pytest -q — all green.
uv run uvicorn miot_harness.api.server:create_app --factory --reload boots with Nexo: enabled — N tools registered.
POST /runs with {"message":"resumen del centro de control hoy", "tenant_id":"mintral", "user_id":"harness", "thread_id":"smoke"} returns a nexo_plan artifact and a Spanish synthesized answer.
boot.py ACL check returns fn_refresh_direct_leak=0.
Freshness gate: temporarily set MIOT_HARNESS_NEXO_FRESHNESS_REFUSE_MINUTES=0 → boot disables Nexo with critical log; restore default.

Notes for reviewers

No prod deploy artifacts in this PR (Dockerfile, GHA workflow). Those land in a follow-up PR per plan 13-server-deployment, which depends on this branch being on trunk.
.env and .env.example divergence is tracked in the deploy plan's follow-ups.
The freshness gate currently catches a real production issue — the upstream fn_refresh_* cron has been silent for ~2.6 days as of 2026-05-09. Worth flagging to the Coordinador team independently of this PR.

Summary by CodeRabbit

Release Notes

New Features
- Added AI Harness workspace—a Python-based backend for orchestrating LLM agents with typed tools, permission checks, and conversational state management
- Expanded monorepo to four workspaces; support for Nexo database integration enabling data-driven agent decisions
- Storytelling module for generating delivery compliance narratives and dashboard widget recommendations
- Local development via uv with demo execution and test suite
Documentation
- Updated README to describe expanded workspace architecture
- Added AI Harness setup and architecture guides

Switch the harness from venv + pip to uv: move dev deps to PEP 735 [dependency-groups], commit uv.lock for reproducible installs, and update both READMEs to use `uv sync` / `uv run`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds langchain-anthropic, langchain-openai, asyncpg to runtime deps; adds pytest-postgresql to dev deps. Required for plan 12 (multi-model chat factory + asyncpg pool against Coordinador prod). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extends HarnessSettings with the MIOT_HARNESS_NEXO_* surface from plan 12: db alias, tenant lock, freshness thresholds, turn cap, critic flag, per-agent model assignment, and unprefixed provider keys (ANTHROPIC_API_KEY, OPENAI_API_KEY) via AliasChoices. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds plan.created, agent.turn, tool.failed, freshness.warning, answer.completed to HarnessEventType. Adds seq:int=0 to HarnessEvent (per review item N14) so producers can order emitted events without breaking existing call sites that don't pass seq yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Defines the LangGraph state contract for the Nexo conversational graph. NexoPlan caps steps at 4 (review item N13). NexoState evidence list uses Annotated[list[NexoEvidence], operator.add] reducer (review item C3) so LangGraph appends instead of replacing. TypedDict marked total=False so partial node returns are valid. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

get_chat_model(name) dispatches on model name prefix: - claude-* → langchain_anthropic.ChatAnthropic - gpt-*/o1-*/o3-* → langchain_openai.ChatOpenAI - else ValueError Reads provider keys from HarnessSettings (which already accepts the plain ANTHROPIC_API_KEY / OPENAI_API_KEY via AliasChoices). Raises clearly when the required key is missing rather than constructing a model that explodes on first call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reads db-scripts/databases/<alias>/.env (flat KEY=VALUE), validates PGHOST/PGDATABASE/PGUSER/PGPASSWORD presence, and exposes a URL- encoded asyncpg DSN. Comments and blanks are tolerated; missing port defaults to 5432. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

create_nexo_pool(creds) builds an asyncpg.Pool with read-only server_settings (default_transaction_read_only=on, statement_timeout 30s, idle_in_transaction_session_timeout 5s). Applied at the connection layer so we get safety even when the role itself can't be ALTERed (CosmoDB-managed harness role). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@meta

parse_pg_description handles three formats: @meta YAML-ish blocks, L1/L2/L3/VT prefix, plain text. is_denied implements the multi-layer denylist (name pattern fn_refresh_*, explicit fn_dx_orchestrator{,_auto}, @meta side_effects != none). introspect_nexo_functions runs the pg_proc query against an asyncpg conn/pool, applies the denylist, parses arg signatures and TABLE/json/scalar return types. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@meta

build_nexo_tool(descriptor, pool, tenant_lock) produces a fully-typed HarnessTool per FunctionDescriptor: dynamic input/output Pydantic models, tenant-locked check_permission, and a call that opens a pooled connection, sets search_path, runs the function, extracts refreshed_at_*, applies S9 truncation (5-row cap on table outputs) and emits tool.completed with structured metadata (source, layer, domain, refreshed_at, truncated). Description rendering preserves the routing hints the Filter Expert will need: @meta blocks render Domain/Returns; L1/L2/L3/VT prefixes render the layer tag; plain text falls through. Every tool gets the SHARED_FILTER_PRIMER appended so the LLM sees a stable contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Static module-level COORDINADOR_PRIMER covering the topics enumerated in doc 10 lines 165-185: service/proc_inst/POD/ETA semantics, the shared 18-param filter contract, fecha_tipo enum, eta_clasificacion buckets, es_critico flag, freshness rule (15 min cite), single-tenant Mintral lock. Kept as a string constant (not a function) so Anthropic prompt caching can mark it cache_control: ephemeral once per process and re-use the same bytes across runs (review item S8). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ster) load_nexo_tools(registry, settings, pool) drives the four-step boot: 1. ACL check — embedded fn_refresh_direct_leak query (must be 0). 2. Connectivity + freshness probe on fn_dx_centro_control. 3. pg_proc introspection (denylist applied inside). 4. Build coordinador_* HarnessTool per descriptor; register. On any failure (no pool, ACL leak, stale snapshot, query exception, tool build error) → log critical, return NexoBootResult(enabled=False) without raising. The harness keeps booting; only Nexo path degrades (review item C4 — uniform fail-disabled posture). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

create_app() now installs an async lifespan that, when MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT is set, loads credentials, builds the asyncpg pool, and runs load_nexo_tools(). Result is exposed on app.state (nexo_enabled, nexo_pool, nexo_registered). All boot errors (missing env, tunnel down, ACL leak, stale snapshot, query exceptions) degrade to "Nexo disabled" without breaking startup. Pool is awaited-closed on shutdown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…schema guard) Phase B post-review fixes per the code-reviewer subagent: - Critical: collapse double tool.completed event. Previously both HarnessTool.invoke and the Nexo tool's call() emitted tool.completed, with the second event stripping all metadata. Now invoke lifts a fixed metadata set (source/refreshed_at/layer/domain/truncated) out of the typed output_model into a single event; tool_factory.call no longer emits its own. Generic tools that don't expose those fields just get the bare event. - Important: wrap "SET LOCAL search_path + query" in a read-only transaction (asyncpg auto-commits each statement otherwise, making SET LOCAL a no-op). Fixed in both tool_factory.call and the centro_control freshness probe in boot. - Important: validate the operator-supplied search_path env var against [a-z_][a-z0-9_]* before interpolating into SQL. Refuse boot otherwise. Belt-and-suspenders next to the existing fully- qualified table references. Tests updated: tool_factory test now exercises tool.invoke (the public contract) and verifies a single tool.completed with full metadata. Added truncation regression test (12 rows → 5 + total + truncated). Added schema-name guard test in boot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

next_agent(state, settings) is a deterministic state-machine router: nodes set state['next_action'] to hint where to go next; supervisor applies invariants on top (turn cap → force synth; failure → synth; answer → END; >10 messages → summarizer). Keeps v1 cost predictable without an LLM in the supervisor seat. nexo_supervisor_mode='llm' is left as a future swap-in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

filter_expert_node(state, registry, model) emits the next NexoStep each turn: builds a coordinador_*-only tool catalog, prompts the model to pick one tool with rationale and args, parses the JSON response, validates the tool exists, and appends the step to the running plan. Failure modes (malformed JSON, hallucinated tool name) are caught locally and surfaced via state["failure"] so the supervisor can route to the synthesizer for a graceful refusal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

data_fetcher_node reads the next pending NexoStep, invokes via the ToolRegistry (so the existing tool.started/completed/failed events flow through), wraps the typed output as NexoEvidence, advances pending_step_index, and signals the supervisor to route to freshness_judge. Permission denials and tool exceptions surface as state['failure'] + a tool.failed event without raising. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

freshness_judge_node classifies the latest evidence by age: fresh / warn / refuse based on nexo_freshness_warn_minutes and _refuse_minutes. Emits freshness.warning in warn/refuse zones. Refuse → state['failure'] + route to synthesizer for graceful refusal. Refactor: data_fetcher now sets NexoEvidence.is_stale at construction time using settings.nexo_freshness_warn_minutes. The judge classifies and emits but does NOT touch evidence — NexoState's evidence is Annotated[..., operator.add], so any "evidence" key in a node's returned delta would APPEND, doubling the list. Caught while writing the warn-zone test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

domain_analyst_node reads collected evidence + COORDINADOR_PRIMER and returns a JSON verdict (ready / need_more). It does NOT write the user-facing answer — that's the synthesizer. This keeps cost down (one short LLM call per turn) and keeps supervisor routing deterministic. Empty evidence short-circuits to need_more_tools without an LLM call. Malformed model output defaults to ready_to_synthesize so the synthesizer can render with what's available rather than looping. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

synthesizer_node produces the final user-facing answer in three modes: 1. Failure mode (no LLM): render a graceful refusal locally when state['failure'] is set (stale snapshot, ACL refusal, tool error). 2. Tenant refusal (no LLM): canonical "Coordinador is Mintral-only" line when ctx.tenant_id != 'mintral' AND no evidence collected (defense-in-depth — tenant_gate_node should have caught earlier). 3. Normal: primer + evidence → model prose, citing refreshed_at and flagging is_stale. Always emits answer.completed with the final text length. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

critic_node returns {} (passthrough) when settings.nexo_critic_enabled is False. When enabled, asks the model to verify the answer cites refreshed_at, flags stale data, and refuses non-Mintral. Concerns surface into state['critic_concerns'] in v1 — looping the synth is a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

summarizer_node compresses state['messages'] to [summary, last_user] when the transcript exceeds 10 messages. Threshold matches the supervisor's auto-trigger. Defensive in v1 (the staged Option C graph rarely accumulates chat turns); pays off for multi-turn follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…raph build_nexo_graph(registry, settings, models) compiles the full conversational graph: tenant_gate → filter_expert → data_fetcher → freshness_judge → domain_analyst → (loop or) synthesizer → critic → END. Routing comes from the rule-based supervisor (next_agent); synthesizer always flows through critic so the seat stays wired even when nexo_critic_enabled=False. End-to-end test runs three scenarios with FakeListChatModel: - Mintral happy path (filter_expert → fetch → fresh → analyst ready → synth) returns the analyst's prose with the planted refreshed_at evidence. - Non-Mintral tenant short-circuits at tenant_gate without invoking any LLM (verified by passing empty FakeListChatModels). - 7-day-stale refreshed_at routes through freshness_judge → state['failure'] → synthesizer's local failure branch (no LLM). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase C post-review fixes per the code-reviewer subagent: - Critical: filter_expert now ALWAYS clears state['next_action'] before returning. Previously, when the analyst signaled need_more_tools, the supervisor routed to filter_expert; filter_ expert appended a step but didn't clear next_action; supervisor saw need_more_tools again and re-entered the same node — a tight loop only stopped by the turn cap. Regression test added. - Critical: catch ValidationError on NexoPlan(steps=...) when the N13 max_length=4 cap is hit. Previously this propagated and crashed the graph. Now it fails soft → state['failure'] + next_action='ready_to_synthesize' so the synthesizer renders with current evidence. - Critical: strip ```json``` markdown fences before json.loads in both filter_expert and domain_analyst. Real Claude commonly wraps structured output in fences; without this, both nodes default to failure / ready respectively, breaking real-LLM use. - Important: filter_expert rejects tools that don't start with 'coordinador_'. Without this, the model could pick a non-Nexo tool that happens to be in the registry (delivery_metrics, workflow_events) and pass validation. - Important: synthesizer reads tenant_lock from settings instead of hardcoding 'mintral' so changing nexo_tenant_lock keeps the refusal text consistent. - Important: NexoState now declares `_events` as a reducer-typed channel (Annotated[..., operator.add]). Nodes that produced events via in-place state mutation now return them in the delta. nexo_graph wraps nodes to merge buffer events into the delta. Phase D wiring can drain a typed list instead of relying on LangGraph's superstep merge of in-place mutations. - Important: filter_expert emits plan.created on first step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds NEXO_QUERY to HarnessRoute. Match list (review item S7): literal tokens ('coordinador', 'mintral', 'centro de control', 'cola crítica' / 'cola critica', 'dimensionamiento', 'torre de control', 'auditoría pod' / 'auditoria pod', 'fn_dx') plus the \bETA\b regex (uppercase, word-bounded — so 'etapa' / 'meta' do not trigger). Resolution rule when both Nexo and storytelling keywords appear: Nexo wins. Misrouting a Nexo question into storytelling is more expensive than the reverse. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

HarnessSupervisor now accepts an optional nexo_graph dependency. On route == NEXO_QUERY: - graph absent (Nexo disabled / boot failed) → graceful refusal answer + answer.completed event, status=completed. - graph present → ainvoke initial NexoState (user_message, ctx, evidence=[], turn_count=0); drain final_state['_events'] into record.events in order; record.answer ← final_state['answer']; persist NexoPlan into record.artifacts (review item N12). Top-level exception handler around dispatch sets status=failed + emits run.failed without propagating. Storytelling path unchanged. HarnessRunRecord gains an optional `answer: str | None` so the API response carries the synth output without forcing an artifact wrap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

truncate_for_trace(payload) caps tool output for event/trace display: - lists → first 5 rows + total_count - dicts > 20 keys → keep refreshed_at_* + known KPI scalars (n_eta_riesgo, es_critico, eta_clasificacion, fecha_tipo, etc.) + fill the budget; rest go into truncated_keys. - nested list values inside dicts also truncated. Full output stays in the run record; only the trace summary is capped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@meta

Spins up an ephemeral Postgres via pytest-postgresql and seeds a nexo-shaped fixture covering all four denylist outcomes: - allowed: clean L1 + clean @meta side_effects:none - denied by name pattern: fn_refresh_* - denied by explicit name: fn_dx_orchestrator{,_auto} - denied by @meta side_effects: writes_to_* Runs introspect_nexo_functions against asyncpg connected to the ephemeral DB; asserts surviving names + parsed layer/meta hints. Auto-skips when no postgres server binary is reachable (Homebrew's libpq-only install ships pg_ctl/initdb but no postgres server, so shutil.which alone would lie). Uses pg_config --bindir as fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… script) F1 — miot_harness/evals/run_golden.py: Loads evals/golden/nexo/examples.yaml, validates schema (each entry has all required fields, no duplicate ids, adversarials must have expected_refusal=true). Three modes: static — schema-only, no graph runs. fake — FakeListChatModel scripted to pick expected_tools[0]; stub registry returns canned refreshed_at + KPIs; scores tool_selection / filter_sanity / freshness_ citation / refusal / no_hallucination / latency_ms. real — placeholder for live Anthropic + Nexo DB; not yet implemented (Phase G hooks). Writes JSON to evals/results/<short-sha>.json. F2 — evals/judge_prompt.md: Tier-3 advisory LLM-as-judge prompt scoring three axes 1–5 (operational_helpfulness, domain_fidelity, conciseness) with rubric and strict JSON output format. F3 — pyproject [project.scripts] miot-harness-evals = entry point. Static + fake mode both run cleanly out of the box; static validates 25 entries, fake runs all 25 through the graph in ~5s on the dev box. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After load_nexo_tools enables Nexo, the lifespan now builds chat models per-agent (filter_expert/analyst/synthesizer/critic/summarizer) via get_chat_model() reading settings.nexo_*_model, and constructs the LangGraph via build_nexo_graph(). The compiled graph is assigned to harness.nexo_graph so HarnessSupervisor.run dispatches NEXO_QUERY through it. If chat-model construction fails (e.g., missing ANTHROPIC_API_KEY), log critical and fall back to Nexo disabled — startup still succeeds. Lifespan test now sets ANTHROPIC_API_KEY to verify the graph wiring path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Coordinador prod fronts Citus with prod-mintral-pgbouncer in transaction-pooling mode. PgBouncer's `track_extra_parameters` allowlist does NOT include `default_transaction_read_only`, `statement_timeout`, or `idle_in_transaction_session_timeout`, so passing them via asyncpg `server_settings={...}` fails the StartupMessage with: unsupported startup parameter: default_transaction_read_only This caused `Nexo: lifespan boot failed` against the live tunnel even though every test passed (the unit tests mocked asyncpg). Fix: - Drop `server_settings` from create_nexo_pool. NEXO_SERVER_SETTINGS becomes {} (kept exported so any future PgBouncer-allowlisted parameters can be added in one place). - Wrap the boot ACL check and the pg_proc introspection step in `conn.transaction(readonly=True)` so read-only enforcement is BEGIN READ ONLY at each transaction — supported by PgBouncer + Postgres natively. The centro_control probe and tool_factory.call already do this. - Tests updated: assert empty constant, assert server_settings is NOT forwarded to asyncpg.create_pool (regression guard). Verified: uvicorn + tunnel boot now reaches the next step instead of dying on StartupMessage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-09T21:06:51Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 07192482-45e6-4772-96cd-d269eca98740

📥 Commits

Reviewing files that changed from the base of the PR and between 6aff153 and 742e069.

⛔ Files ignored due to path filters (1)

miot-harness/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (86)

README.md
miot-harness/.env.example
miot-harness/.gitignore
miot-harness/README.md
miot-harness/docs/architecture.md
miot-harness/evals/judge_prompt.md
miot-harness/evals/results/6a0d36fb.json
miot-harness/examples/delivery_compliance_request.json
miot-harness/pyproject.toml
miot-harness/src/miot_harness/__init__.py
miot-harness/src/miot_harness/agents/__init__.py
miot-harness/src/miot_harness/agents/chat_models.py
miot-harness/src/miot_harness/agents/critic.py
miot-harness/src/miot_harness/agents/data_fetcher.py
miot-harness/src/miot_harness/agents/deepagents_adapter.py
miot-harness/src/miot_harness/agents/domain_analyst.py
miot-harness/src/miot_harness/agents/filter_expert.py
miot-harness/src/miot_harness/agents/freshness_judge.py
miot-harness/src/miot_harness/agents/summarizer.py
miot-harness/src/miot_harness/agents/supervisor.py
miot-harness/src/miot_harness/agents/synthesizer.py
miot-harness/src/miot_harness/api/__init__.py
miot-harness/src/miot_harness/api/server.py
miot-harness/src/miot_harness/cli.py
miot-harness/src/miot_harness/config.py
miot-harness/src/miot_harness/evals/__init__.py
miot-harness/src/miot_harness/evals/run_golden.py
miot-harness/src/miot_harness/integrations/__init__.py
miot-harness/src/miot_harness/integrations/nexo/__init__.py
miot-harness/src/miot_harness/integrations/nexo/boot.py
miot-harness/src/miot_harness/integrations/nexo/credentials.py
miot-harness/src/miot_harness/integrations/nexo/introspect.py
miot-harness/src/miot_harness/integrations/nexo/pool.py
miot-harness/src/miot_harness/integrations/nexo/primer.py
miot-harness/src/miot_harness/integrations/nexo/tool_factory.py
miot-harness/src/miot_harness/runtime/context.py
miot-harness/src/miot_harness/runtime/events.py
miot-harness/src/miot_harness/runtime/factory.py
miot-harness/src/miot_harness/runtime/nexo_graph.py
miot-harness/src/miot_harness/runtime/permissions.py
miot-harness/src/miot_harness/runtime/plan.py
miot-harness/src/miot_harness/runtime/router.py
miot-harness/src/miot_harness/runtime/run_store.py
miot-harness/src/miot_harness/runtime/supervisor.py
miot-harness/src/miot_harness/runtime/tool.py
miot-harness/src/miot_harness/skills/delivery_compliance_story.md
miot-harness/src/miot_harness/skills/manifests/delivery_compliance.yaml
miot-harness/src/miot_harness/storytelling/contracts.py
miot-harness/src/miot_harness/storytelling/module.py
miot-harness/src/miot_harness/tools/dashboard.py
miot-harness/src/miot_harness/tools/delivery_metrics.py
miot-harness/src/miot_harness/tools/registry.py
miot-harness/src/miot_harness/tools/storytelling.py
miot-harness/src/miot_harness/tools/workflow_events.py
miot-harness/src/miot_harness/utils/__init__.py
miot-harness/src/miot_harness/utils/truncation.py
miot-harness/src/miot_harness/workspace/.gitkeep
miot-harness/src/miot_harness/workspace/README.md
miot-harness/tests/integrations/__init__.py
miot-harness/tests/integrations/nexo/__init__.py
miot-harness/tests/integrations/nexo/test_boot.py
miot-harness/tests/integrations/nexo/test_credentials.py
miot-harness/tests/integrations/nexo/test_introspect.py
miot-harness/tests/integrations/nexo/test_introspect_pg.py
miot-harness/tests/integrations/nexo/test_pool.py
miot-harness/tests/integrations/nexo/test_primer.py
miot-harness/tests/integrations/nexo/test_tool_factory.py
miot-harness/tests/test_chat_models.py
miot-harness/tests/test_config.py
miot-harness/tests/test_critic.py
miot-harness/tests/test_data_fetcher.py
miot-harness/tests/test_domain_analyst.py
miot-harness/tests/test_events.py
miot-harness/tests/test_filter_expert.py
miot-harness/tests/test_freshness_judge.py
miot-harness/tests/test_nexo_graph.py
miot-harness/tests/test_nexo_supervisor.py
miot-harness/tests/test_plan.py
miot-harness/tests/test_router.py
miot-harness/tests/test_server_lifespan.py
miot-harness/tests/test_storytelling_module.py
miot-harness/tests/test_summarizer.py
miot-harness/tests/test_supervisor_nexo_branch.py
miot-harness/tests/test_synthesizer.py
miot-harness/tests/test_tool_registry.py
miot-harness/tests/test_truncation.py

📝 Walkthrough

Walkthrough

Adds miot-harness workspace with schemas, settings, runtime, tools/storytelling, agents and LangGraph flow, Nexo asyncpg integration, API/CLI, evals, docs/examples, and extensive tests.

Changes

MIOT Harness and Coordinador (Nexo) integration

Layer / File(s)	Summary
Schemas and Data Contracts `src/miot_harness/runtime/{context,events,permissions,plan}.py`, `src/miot_harness/storytelling/contracts.py`	Defines context, events, permissions, plan/evidence/state, and storytelling models.
Runtime Core `src/miot_harness/runtime/{router,run_store,factory,supervisor,tool}.py`	Routing, JSON-backed run store, harness builder, supervisor, tool lifecycle.
Tools and Storytelling `src/miot_harness/tools/`, `src/miot_harness/skills/`, `src/miot_harness/storytelling/module.py`	Default tools, delivery compliance skill/manifest, and story creation workflow.
Agents and Supervisor `src/miot_harness/agents/*`	Filter expert, data fetcher, freshness judge, domain analyst, synthesizer, critic, summarizer, chat model factory, rule-based supervisor.
Nexo Integration `src/miot_harness/integrations/nexo/*`	Boot gating, credentials, pool, schema introspection, primer, and dynamic tool factory.
Graph Wiring `src/miot_harness/runtime/nexo_graph.py`	StateGraph with tenant gate, supervisor routing, event buffering, synthesizer→critic→END.
API / CLI `src/miot_harness/api/server.py`, `src/miot_harness/cli.py`	FastAPI app with lifespan and run endpoints; CLI demo command.
Evals `miot-harness/evals/*`	Judge prompt and golden runner (fake mode) plus sample results JSON.
Docs / Examples / Packaging `README.md`, `miot-harness/README.md`, `.../docs/architecture.md`, `.../examples/`, `pyproject.toml`, `.env.example`, `.gitignore`, `workspace/`	Documentation, example request, packaging and workspace metadata.
Tests `miot-harness/tests/*/`	Unit/integration/E2E tests for settings, agents, Nexo integration, graph, API lifespan, tools, storytelling, truncation, registry, and routing.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client (rgba(66, 133, 244, 0.5))
  participant API as FastAPI (rgba(52, 168, 83, 0.5))
  participant Harness as Harness Supervisor (rgba(251, 188, 5, 0.5))
  participant Graph as LangGraph Agents (rgba(244, 180, 0, 0.5))
  participant Nexo as Postgres (rgba(234, 67, 53, 0.5))
  participant LLM as Chat Model (rgba(171, 71, 188, 0.5))
  Client->>API: POST /runs (UserRequest)
  API->>Harness: run()
  Harness->>Graph: ainvoke(initial_state)
  Graph->>Nexo: SELECT nexo.fn_dx_* (READ ONLY)
  Nexo-->>Graph: rows + refreshed_at
  Graph->>LLM: analyze/synthesize with evidence
  LLM-->>Graph: verdict/answer
  Graph-->>Harness: state + events
  Harness-->>API: HarnessRunRecord
  API-->>Client: 200 OK + record

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

A rabbit taps the schema drum,
Tools awake, the agents hum.
Nexo’s rows, in read-only light,
Freshness judged before the write.
Stories bloom, events align—
Harness hops: “Ship time! Fine.” 🐇✨

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

korutx and others added 30 commits May 4, 2026 21:29

feat: scaffold MIOT harness backend

83c823e

Merge branch 'trunk' into codex/miot-harness-scaffold

5759333

odtorres and others added 3 commits May 8, 2026 21:57

korutx merged commit 787375d into microboxlabs:trunk May 9, 2026
1 check was pending

korutx deleted the codex/miot-harness-scaffold branch May 9, 2026 21:11

This was referenced May 9, 2026

harness(cleanup): unblock T00 pre-flight (format + py.typed + strict mypy + tests) #446

Merged

harness: deploy stack — Dockerfile, CI workflow, deploy evals (T10b verify) #447

Merged

coderabbitai Bot mentioned this pull request May 12, 2026

harness(phase-13): per-agent telemetry + agentic search foundation #462

Merged

8 tasks

odtorres mentioned this pull request May 15, 2026

feat(harness): per-tenant billing surface for AI cost attribution #482

Closed

This was referenced May 18, 2026

Feat/harness sse streaming #496

Merged

harness(nexo): convert DB config to a standard Postgres DSN #511

Merged

Feat/harness sse rich events #516

Merged

coderabbitai Bot mentioned this pull request May 28, 2026

feat(harness): close Phase 4 backend gaps (approvals, cancel, identity) #545

Merged

4 tasks

This was referenced Jun 6, 2026

refactor(harness)!: datasource-agnostic core — Nexo becomes the first DataSourceProvider #604

Merged

Based/harness improvement and fixes #641

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

harness(nexo): Coordinador data access — Phases A–F (closes #444)#445

harness(nexo): Coordinador data access — Phases A–F (closes #444)#445
korutx merged 33 commits into
microboxlabs:trunkfrom
odtorres:codex/miot-harness-scaffold

korutx commented May 9, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 9, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

korutx commented May 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Phases shipped

Test plan

Notes for reviewers

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

korutx commented May 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 9, 2026 •

edited

Loading