Skip to content

harness(nexo): Coordinador data access — Phases A–F (closes #444)#445

Merged
korutx merged 33 commits into
microboxlabs:trunkfrom
odtorres:codex/miot-harness-scaffold
May 9, 2026
Merged

harness(nexo): Coordinador data access — Phases A–F (closes #444)#445
korutx merged 33 commits into
microboxlabs:trunkfrom
odtorres:codex/miot-harness-scaffold

Conversation

@korutx

@korutx korutx commented May 9, 2026

Copy link
Copy Markdown
Contributor

Closes #444.

Summary

First real data integration for miot-harness — the Coordinador nexo schema, via direct asyncpg through the
db-scripts/databases/coordinador-prod-harness/ tunnel and a
dedicated read-only harness PG role.

Tracking issue #444 has full scope, acceptance criteria, and
follow-ups. Diff is the v3 implementation arc captured per phase
in conventional commits.

Phases shipped

  • Adb-scripts seed for read-only harness role (validated by check-harness-role.sql).
  • B1–B7 — credentials loader, asyncpg pool, pg_proc introspection, denylist, primer, boot loader (ACL + freshness + register), FastAPI lifespan wiring.
  • C1–C9 — rule-based supervisor + 7 LangGraph specialist agents (filter_expert, data_fetcher, freshness_judge, domain_analyst, synthesizer, critic, summarizer); StateGraph wiring.
  • D1–D3NEXO_QUERY routing branch, supervisor integration, payload truncation utility.
  • E2 — real-pg integration test for introspection (pytest-postgresql).
  • F1–F3 — eval scaffolding: golden-trace runner, LLM judge prompt, run-from-script.
  • Hotfix — PgBouncer transaction-pooling compatibility (drop server_settings startup params; enforce read-only via BEGIN READ ONLY instead).

Test plan

  • Reviewer pulls branch, populates .env with valid Anthropic + Nexo creds, brings up tunnel.
  • cd miot-harness && uv sync && uv run pytest -q — all green.
  • uv run uvicorn miot_harness.api.server:create_app --factory --reload boots with Nexo: enabled — N tools registered.
  • POST /runs with {"message":"resumen del centro de control hoy", "tenant_id":"mintral", "user_id":"harness", "thread_id":"smoke"} returns a nexo_plan artifact and a Spanish synthesized answer.
  • boot.py ACL check returns fn_refresh_direct_leak=0.
  • Freshness gate: temporarily set MIOT_HARNESS_NEXO_FRESHNESS_REFUSE_MINUTES=0 → boot disables Nexo with critical log; restore default.

Notes for reviewers

  • No prod deploy artifacts in this PR (Dockerfile, GHA workflow). Those land in a follow-up PR per plan 13-server-deployment, which depends on this branch being on trunk.
  • .env and .env.example divergence is tracked in the deploy plan's follow-ups.
  • The freshness gate currently catches a real production issue — the upstream fn_refresh_* cron has been silent for ~2.6 days as of 2026-05-09. Worth flagging to the Coordinador team independently of this PR.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added AI Harness workspace—a Python-based backend for orchestrating LLM agents with typed tools, permission checks, and conversational state management
    • Expanded monorepo to four workspaces; support for Nexo database integration enabling data-driven agent decisions
    • Storytelling module for generating delivery compliance narratives and dashboard widget recommendations
    • Local development via uv with demo execution and test suite
  • Documentation

    • Updated README to describe expanded workspace architecture
    • Added AI Harness setup and architecture guides

korutx and others added 30 commits May 4, 2026 21:29
Switch the harness from venv + pip to uv: move dev deps to PEP 735
[dependency-groups], commit uv.lock for reproducible installs, and
update both READMEs to use `uv sync` / `uv run`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds langchain-anthropic, langchain-openai, asyncpg to runtime deps;
adds pytest-postgresql to dev deps. Required for plan 12 (multi-model
chat factory + asyncpg pool against Coordinador prod).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends HarnessSettings with the MIOT_HARNESS_NEXO_* surface from
plan 12: db alias, tenant lock, freshness thresholds, turn cap,
critic flag, per-agent model assignment, and unprefixed provider
keys (ANTHROPIC_API_KEY, OPENAI_API_KEY) via AliasChoices.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds plan.created, agent.turn, tool.failed, freshness.warning,
answer.completed to HarnessEventType. Adds seq:int=0 to HarnessEvent
(per review item N14) so producers can order emitted events without
breaking existing call sites that don't pass seq yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defines the LangGraph state contract for the Nexo conversational
graph. NexoPlan caps steps at 4 (review item N13). NexoState evidence
list uses Annotated[list[NexoEvidence], operator.add] reducer
(review item C3) so LangGraph appends instead of replacing.
TypedDict marked total=False so partial node returns are valid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
get_chat_model(name) dispatches on model name prefix:
  - claude-* → langchain_anthropic.ChatAnthropic
  - gpt-*/o1-*/o3-* → langchain_openai.ChatOpenAI
  - else ValueError
Reads provider keys from HarnessSettings (which already accepts the
plain ANTHROPIC_API_KEY / OPENAI_API_KEY via AliasChoices). Raises
clearly when the required key is missing rather than constructing a
model that explodes on first call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reads db-scripts/databases/<alias>/.env (flat KEY=VALUE), validates
PGHOST/PGDATABASE/PGUSER/PGPASSWORD presence, and exposes a URL-
encoded asyncpg DSN. Comments and blanks are tolerated; missing port
defaults to 5432.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
create_nexo_pool(creds) builds an asyncpg.Pool with read-only
server_settings (default_transaction_read_only=on, statement_timeout
30s, idle_in_transaction_session_timeout 5s). Applied at the
connection layer so we get safety even when the role itself can't be
ALTERed (CosmoDB-managed harness role).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parse_pg_description handles three formats: @meta YAML-ish blocks,
L1/L2/L3/VT prefix, plain text. is_denied implements the multi-layer
denylist (name pattern fn_refresh_*, explicit fn_dx_orchestrator{,_auto},
@meta side_effects != none). introspect_nexo_functions runs the
pg_proc query against an asyncpg conn/pool, applies the denylist,
parses arg signatures and TABLE/json/scalar return types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_nexo_tool(descriptor, pool, tenant_lock) produces a fully-typed
HarnessTool per FunctionDescriptor: dynamic input/output Pydantic
models, tenant-locked check_permission, and a call that opens a
pooled connection, sets search_path, runs the function, extracts
refreshed_at_*, applies S9 truncation (5-row cap on table outputs)
and emits tool.completed with structured metadata (source, layer,
domain, refreshed_at, truncated).

Description rendering preserves the routing hints the Filter Expert
will need: @meta blocks render Domain/Returns; L1/L2/L3/VT prefixes
render the layer tag; plain text falls through. Every tool gets the
SHARED_FILTER_PRIMER appended so the LLM sees a stable contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Static module-level COORDINADOR_PRIMER covering the topics enumerated
in doc 10 lines 165-185: service/proc_inst/POD/ETA semantics, the
shared 18-param filter contract, fecha_tipo enum, eta_clasificacion
buckets, es_critico flag, freshness rule (15 min cite), single-tenant
Mintral lock.

Kept as a string constant (not a function) so Anthropic prompt caching
can mark it cache_control: ephemeral once per process and re-use the
same bytes across runs (review item S8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ster)

load_nexo_tools(registry, settings, pool) drives the four-step boot:
  1. ACL check — embedded fn_refresh_direct_leak query (must be 0).
  2. Connectivity + freshness probe on fn_dx_centro_control.
  3. pg_proc introspection (denylist applied inside).
  4. Build coordinador_* HarnessTool per descriptor; register.

On any failure (no pool, ACL leak, stale snapshot, query exception,
tool build error) → log critical, return NexoBootResult(enabled=False)
without raising. The harness keeps booting; only Nexo path degrades
(review item C4 — uniform fail-disabled posture).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
create_app() now installs an async lifespan that, when
MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT is set, loads credentials, builds
the asyncpg pool, and runs load_nexo_tools(). Result is exposed on
app.state (nexo_enabled, nexo_pool, nexo_registered). All boot errors
(missing env, tunnel down, ACL leak, stale snapshot, query
exceptions) degrade to "Nexo disabled" without breaking startup. Pool
is awaited-closed on shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…schema guard)

Phase B post-review fixes per the code-reviewer subagent:

- Critical: collapse double tool.completed event. Previously both
  HarnessTool.invoke and the Nexo tool's call() emitted tool.completed,
  with the second event stripping all metadata. Now invoke lifts a
  fixed metadata set (source/refreshed_at/layer/domain/truncated)
  out of the typed output_model into a single event; tool_factory.call
  no longer emits its own. Generic tools that don't expose those
  fields just get the bare event.

- Important: wrap "SET LOCAL search_path + query" in a read-only
  transaction (asyncpg auto-commits each statement otherwise, making
  SET LOCAL a no-op). Fixed in both tool_factory.call and the
  centro_control freshness probe in boot.

- Important: validate the operator-supplied search_path env var
  against [a-z_][a-z0-9_]* before interpolating into SQL. Refuse
  boot otherwise. Belt-and-suspenders next to the existing fully-
  qualified table references.

Tests updated: tool_factory test now exercises tool.invoke (the
public contract) and verifies a single tool.completed with full
metadata. Added truncation regression test (12 rows → 5 + total +
truncated). Added schema-name guard test in boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
next_agent(state, settings) is a deterministic state-machine router:
nodes set state['next_action'] to hint where to go next; supervisor
applies invariants on top (turn cap → force synth; failure → synth;
answer → END; >10 messages → summarizer). Keeps v1 cost predictable
without an LLM in the supervisor seat. nexo_supervisor_mode='llm' is
left as a future swap-in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
filter_expert_node(state, registry, model) emits the next NexoStep
each turn: builds a coordinador_*-only tool catalog, prompts the model
to pick one tool with rationale and args, parses the JSON response,
validates the tool exists, and appends the step to the running plan.

Failure modes (malformed JSON, hallucinated tool name) are caught
locally and surfaced via state["failure"] so the supervisor can route
to the synthesizer for a graceful refusal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
data_fetcher_node reads the next pending NexoStep, invokes via the
ToolRegistry (so the existing tool.started/completed/failed events
flow through), wraps the typed output as NexoEvidence, advances
pending_step_index, and signals the supervisor to route to
freshness_judge. Permission denials and tool exceptions surface as
state['failure'] + a tool.failed event without raising.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
freshness_judge_node classifies the latest evidence by age:
  fresh / warn / refuse based on nexo_freshness_warn_minutes and
  _refuse_minutes. Emits freshness.warning in warn/refuse zones.
  Refuse → state['failure'] + route to synthesizer for graceful refusal.

Refactor: data_fetcher now sets NexoEvidence.is_stale at construction
time using settings.nexo_freshness_warn_minutes. The judge classifies
and emits but does NOT touch evidence — NexoState's evidence is
Annotated[..., operator.add], so any "evidence" key in a node's
returned delta would APPEND, doubling the list. Caught while writing
the warn-zone test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
domain_analyst_node reads collected evidence + COORDINADOR_PRIMER and
returns a JSON verdict (ready / need_more). It does NOT write the
user-facing answer — that's the synthesizer. This keeps cost down
(one short LLM call per turn) and keeps supervisor routing
deterministic.

Empty evidence short-circuits to need_more_tools without an LLM
call. Malformed model output defaults to ready_to_synthesize so the
synthesizer can render with what's available rather than looping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
synthesizer_node produces the final user-facing answer in three modes:
  1. Failure mode (no LLM): render a graceful refusal locally when
     state['failure'] is set (stale snapshot, ACL refusal, tool error).
  2. Tenant refusal (no LLM): canonical "Coordinador is Mintral-only"
     line when ctx.tenant_id != 'mintral' AND no evidence collected
     (defense-in-depth — tenant_gate_node should have caught earlier).
  3. Normal: primer + evidence → model prose, citing refreshed_at and
     flagging is_stale.

Always emits answer.completed with the final text length.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
critic_node returns {} (passthrough) when settings.nexo_critic_enabled
is False. When enabled, asks the model to verify the answer cites
refreshed_at, flags stale data, and refuses non-Mintral. Concerns
surface into state['critic_concerns'] in v1 — looping the synth is
a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
summarizer_node compresses state['messages'] to [summary, last_user]
when the transcript exceeds 10 messages. Threshold matches the
supervisor's auto-trigger. Defensive in v1 (the staged Option C
graph rarely accumulates chat turns); pays off for multi-turn
follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…raph

build_nexo_graph(registry, settings, models) compiles the full
conversational graph: tenant_gate → filter_expert → data_fetcher →
freshness_judge → domain_analyst → (loop or) synthesizer → critic
→ END. Routing comes from the rule-based supervisor (next_agent);
synthesizer always flows through critic so the seat stays wired
even when nexo_critic_enabled=False.

End-to-end test runs three scenarios with FakeListChatModel:
  - Mintral happy path (filter_expert → fetch → fresh → analyst
    ready → synth) returns the analyst's prose with the planted
    refreshed_at evidence.
  - Non-Mintral tenant short-circuits at tenant_gate without
    invoking any LLM (verified by passing empty FakeListChatModels).
  - 7-day-stale refreshed_at routes through freshness_judge →
    state['failure'] → synthesizer's local failure branch (no LLM).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase C post-review fixes per the code-reviewer subagent:

- Critical: filter_expert now ALWAYS clears state['next_action']
  before returning. Previously, when the analyst signaled
  need_more_tools, the supervisor routed to filter_expert; filter_
  expert appended a step but didn't clear next_action; supervisor
  saw need_more_tools again and re-entered the same node — a tight
  loop only stopped by the turn cap. Regression test added.

- Critical: catch ValidationError on NexoPlan(steps=...) when the
  N13 max_length=4 cap is hit. Previously this propagated and
  crashed the graph. Now it fails soft → state['failure'] +
  next_action='ready_to_synthesize' so the synthesizer renders
  with current evidence.

- Critical: strip ```json``` markdown fences before json.loads in
  both filter_expert and domain_analyst. Real Claude commonly
  wraps structured output in fences; without this, both nodes
  default to failure / ready respectively, breaking real-LLM use.

- Important: filter_expert rejects tools that don't start with
  'coordinador_'. Without this, the model could pick a non-Nexo
  tool that happens to be in the registry (delivery_metrics,
  workflow_events) and pass validation.

- Important: synthesizer reads tenant_lock from settings instead
  of hardcoding 'mintral' so changing nexo_tenant_lock keeps the
  refusal text consistent.

- Important: NexoState now declares `_events` as a reducer-typed
  channel (Annotated[..., operator.add]). Nodes that produced
  events via in-place state mutation now return them in the
  delta. nexo_graph wraps nodes to merge buffer events into the
  delta. Phase D wiring can drain a typed list instead of relying
  on LangGraph's superstep merge of in-place mutations.

- Important: filter_expert emits plan.created on first step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds NEXO_QUERY to HarnessRoute. Match list (review item S7):
literal tokens ('coordinador', 'mintral', 'centro de control',
'cola crítica' / 'cola critica', 'dimensionamiento', 'torre de
control', 'auditoría pod' / 'auditoria pod', 'fn_dx') plus the
\bETA\b regex (uppercase, word-bounded — so 'etapa' / 'meta' do
not trigger).

Resolution rule when both Nexo and storytelling keywords appear:
Nexo wins. Misrouting a Nexo question into storytelling is more
expensive than the reverse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HarnessSupervisor now accepts an optional nexo_graph dependency.
On route == NEXO_QUERY:
  - graph absent (Nexo disabled / boot failed) → graceful refusal
    answer + answer.completed event, status=completed.
  - graph present → ainvoke initial NexoState (user_message, ctx,
    evidence=[], turn_count=0); drain final_state['_events'] into
    record.events in order; record.answer ← final_state['answer'];
    persist NexoPlan into record.artifacts (review item N12).

Top-level exception handler around dispatch sets status=failed +
emits run.failed without propagating. Storytelling path unchanged.

HarnessRunRecord gains an optional `answer: str | None` so the API
response carries the synth output without forcing an artifact wrap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
truncate_for_trace(payload) caps tool output for event/trace display:
  - lists → first 5 rows + total_count
  - dicts > 20 keys → keep refreshed_at_* + known KPI scalars
    (n_eta_riesgo, es_critico, eta_clasificacion, fecha_tipo, etc.)
    + fill the budget; rest go into truncated_keys.
  - nested list values inside dicts also truncated.

Full output stays in the run record; only the trace summary is
capped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spins up an ephemeral Postgres via pytest-postgresql and seeds a
nexo-shaped fixture covering all four denylist outcomes:
  - allowed: clean L1 + clean @meta side_effects:none
  - denied by name pattern: fn_refresh_*
  - denied by explicit name: fn_dx_orchestrator{,_auto}
  - denied by @meta side_effects: writes_to_*

Runs introspect_nexo_functions against asyncpg connected to the
ephemeral DB; asserts surviving names + parsed layer/meta hints.

Auto-skips when no postgres server binary is reachable (Homebrew's
libpq-only install ships pg_ctl/initdb but no postgres server, so
shutil.which alone would lie). Uses pg_config --bindir as fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
odtorres and others added 3 commits May 8, 2026 21:57
… script)

F1 — miot_harness/evals/run_golden.py:
  Loads evals/golden/nexo/examples.yaml, validates schema (each
  entry has all required fields, no duplicate ids, adversarials
  must have expected_refusal=true).
  Three modes:
    static — schema-only, no graph runs.
    fake   — FakeListChatModel scripted to pick expected_tools[0];
             stub registry returns canned refreshed_at + KPIs;
             scores tool_selection / filter_sanity / freshness_
             citation / refusal / no_hallucination / latency_ms.
    real   — placeholder for live Anthropic + Nexo DB; not yet
             implemented (Phase G hooks).
  Writes JSON to evals/results/<short-sha>.json.

F2 — evals/judge_prompt.md:
  Tier-3 advisory LLM-as-judge prompt scoring three axes 1–5
  (operational_helpfulness, domain_fidelity, conciseness) with
  rubric and strict JSON output format.

F3 — pyproject [project.scripts] miot-harness-evals = entry point.
  Static + fake mode both run cleanly out of the box; static
  validates 25 entries, fake runs all 25 through the graph in ~5s
  on the dev box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After load_nexo_tools enables Nexo, the lifespan now builds chat
models per-agent (filter_expert/analyst/synthesizer/critic/summarizer)
via get_chat_model() reading settings.nexo_*_model, and constructs
the LangGraph via build_nexo_graph(). The compiled graph is assigned
to harness.nexo_graph so HarnessSupervisor.run dispatches NEXO_QUERY
through it.

If chat-model construction fails (e.g., missing ANTHROPIC_API_KEY),
log critical and fall back to Nexo disabled — startup still
succeeds. Lifespan test now sets ANTHROPIC_API_KEY to verify the
graph wiring path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coordinador prod fronts Citus with prod-mintral-pgbouncer in
transaction-pooling mode. PgBouncer's `track_extra_parameters`
allowlist does NOT include `default_transaction_read_only`,
`statement_timeout`, or `idle_in_transaction_session_timeout`,
so passing them via asyncpg `server_settings={...}` fails the
StartupMessage with:
  unsupported startup parameter: default_transaction_read_only

This caused `Nexo: lifespan boot failed` against the live tunnel
even though every test passed (the unit tests mocked asyncpg).

Fix:
  - Drop `server_settings` from create_nexo_pool. NEXO_SERVER_SETTINGS
    becomes {} (kept exported so any future PgBouncer-allowlisted
    parameters can be added in one place).
  - Wrap the boot ACL check and the pg_proc introspection step in
    `conn.transaction(readonly=True)` so read-only enforcement is
    BEGIN READ ONLY at each transaction — supported by PgBouncer +
    Postgres natively. The centro_control probe and tool_factory.call
    already do this.
  - Tests updated: assert empty constant, assert server_settings is
    NOT forwarded to asyncpg.create_pool (regression guard).

Verified: uvicorn + tunnel boot now reaches the next step instead
of dying on StartupMessage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 9, 2026

Copy link
Copy Markdown

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 07192482-45e6-4772-96cd-d269eca98740

📥 Commits

Reviewing files that changed from the base of the PR and between 6aff153 and 742e069.

⛔ Files ignored due to path filters (1)
  • miot-harness/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (86)
  • README.md
  • miot-harness/.env.example
  • miot-harness/.gitignore
  • miot-harness/README.md
  • miot-harness/docs/architecture.md
  • miot-harness/evals/judge_prompt.md
  • miot-harness/evals/results/6a0d36fb.json
  • miot-harness/examples/delivery_compliance_request.json
  • miot-harness/pyproject.toml
  • miot-harness/src/miot_harness/__init__.py
  • miot-harness/src/miot_harness/agents/__init__.py
  • miot-harness/src/miot_harness/agents/chat_models.py
  • miot-harness/src/miot_harness/agents/critic.py
  • miot-harness/src/miot_harness/agents/data_fetcher.py
  • miot-harness/src/miot_harness/agents/deepagents_adapter.py
  • miot-harness/src/miot_harness/agents/domain_analyst.py
  • miot-harness/src/miot_harness/agents/filter_expert.py
  • miot-harness/src/miot_harness/agents/freshness_judge.py
  • miot-harness/src/miot_harness/agents/summarizer.py
  • miot-harness/src/miot_harness/agents/supervisor.py
  • miot-harness/src/miot_harness/agents/synthesizer.py
  • miot-harness/src/miot_harness/api/__init__.py
  • miot-harness/src/miot_harness/api/server.py
  • miot-harness/src/miot_harness/cli.py
  • miot-harness/src/miot_harness/config.py
  • miot-harness/src/miot_harness/evals/__init__.py
  • miot-harness/src/miot_harness/evals/run_golden.py
  • miot-harness/src/miot_harness/integrations/__init__.py
  • miot-harness/src/miot_harness/integrations/nexo/__init__.py
  • miot-harness/src/miot_harness/integrations/nexo/boot.py
  • miot-harness/src/miot_harness/integrations/nexo/credentials.py
  • miot-harness/src/miot_harness/integrations/nexo/introspect.py
  • miot-harness/src/miot_harness/integrations/nexo/pool.py
  • miot-harness/src/miot_harness/integrations/nexo/primer.py
  • miot-harness/src/miot_harness/integrations/nexo/tool_factory.py
  • miot-harness/src/miot_harness/runtime/context.py
  • miot-harness/src/miot_harness/runtime/events.py
  • miot-harness/src/miot_harness/runtime/factory.py
  • miot-harness/src/miot_harness/runtime/nexo_graph.py
  • miot-harness/src/miot_harness/runtime/permissions.py
  • miot-harness/src/miot_harness/runtime/plan.py
  • miot-harness/src/miot_harness/runtime/router.py
  • miot-harness/src/miot_harness/runtime/run_store.py
  • miot-harness/src/miot_harness/runtime/supervisor.py
  • miot-harness/src/miot_harness/runtime/tool.py
  • miot-harness/src/miot_harness/skills/delivery_compliance_story.md
  • miot-harness/src/miot_harness/skills/manifests/delivery_compliance.yaml
  • miot-harness/src/miot_harness/storytelling/contracts.py
  • miot-harness/src/miot_harness/storytelling/module.py
  • miot-harness/src/miot_harness/tools/dashboard.py
  • miot-harness/src/miot_harness/tools/delivery_metrics.py
  • miot-harness/src/miot_harness/tools/registry.py
  • miot-harness/src/miot_harness/tools/storytelling.py
  • miot-harness/src/miot_harness/tools/workflow_events.py
  • miot-harness/src/miot_harness/utils/__init__.py
  • miot-harness/src/miot_harness/utils/truncation.py
  • miot-harness/src/miot_harness/workspace/.gitkeep
  • miot-harness/src/miot_harness/workspace/README.md
  • miot-harness/tests/integrations/__init__.py
  • miot-harness/tests/integrations/nexo/__init__.py
  • miot-harness/tests/integrations/nexo/test_boot.py
  • miot-harness/tests/integrations/nexo/test_credentials.py
  • miot-harness/tests/integrations/nexo/test_introspect.py
  • miot-harness/tests/integrations/nexo/test_introspect_pg.py
  • miot-harness/tests/integrations/nexo/test_pool.py
  • miot-harness/tests/integrations/nexo/test_primer.py
  • miot-harness/tests/integrations/nexo/test_tool_factory.py
  • miot-harness/tests/test_chat_models.py
  • miot-harness/tests/test_config.py
  • miot-harness/tests/test_critic.py
  • miot-harness/tests/test_data_fetcher.py
  • miot-harness/tests/test_domain_analyst.py
  • miot-harness/tests/test_events.py
  • miot-harness/tests/test_filter_expert.py
  • miot-harness/tests/test_freshness_judge.py
  • miot-harness/tests/test_nexo_graph.py
  • miot-harness/tests/test_nexo_supervisor.py
  • miot-harness/tests/test_plan.py
  • miot-harness/tests/test_router.py
  • miot-harness/tests/test_server_lifespan.py
  • miot-harness/tests/test_storytelling_module.py
  • miot-harness/tests/test_summarizer.py
  • miot-harness/tests/test_supervisor_nexo_branch.py
  • miot-harness/tests/test_synthesizer.py
  • miot-harness/tests/test_tool_registry.py
  • miot-harness/tests/test_truncation.py

📝 Walkthrough

Walkthrough

Adds miot-harness workspace with schemas, settings, runtime, tools/storytelling, agents and LangGraph flow, Nexo asyncpg integration, API/CLI, evals, docs/examples, and extensive tests.

Changes

MIOT Harness and Coordinador (Nexo) integration

Layer / File(s) Summary
Schemas and Data Contracts
src/miot_harness/runtime/{context,events,permissions,plan}.py, src/miot_harness/storytelling/contracts.py
Defines context, events, permissions, plan/evidence/state, and storytelling models.
Runtime Core
src/miot_harness/runtime/{router,run_store,factory,supervisor,tool}.py
Routing, JSON-backed run store, harness builder, supervisor, tool lifecycle.
Tools and Storytelling
src/miot_harness/tools/*, src/miot_harness/skills/*, src/miot_harness/storytelling/module.py
Default tools, delivery compliance skill/manifest, and story creation workflow.
Agents and Supervisor
src/miot_harness/agents/*
Filter expert, data fetcher, freshness judge, domain analyst, synthesizer, critic, summarizer, chat model factory, rule-based supervisor.
Nexo Integration
src/miot_harness/integrations/nexo/*
Boot gating, credentials, pool, schema introspection, primer, and dynamic tool factory.
Graph Wiring
src/miot_harness/runtime/nexo_graph.py
StateGraph with tenant gate, supervisor routing, event buffering, synthesizer→critic→END.
API / CLI
src/miot_harness/api/server.py, src/miot_harness/cli.py
FastAPI app with lifespan and run endpoints; CLI demo command.
Evals
miot-harness/evals/*
Judge prompt and golden runner (fake mode) plus sample results JSON.
Docs / Examples / Packaging
README.md, miot-harness/README.md, .../docs/architecture.md, .../examples/*, pyproject.toml, .env.example, .gitignore, workspace/*
Documentation, example request, packaging and workspace metadata.
Tests
miot-harness/tests/**/*
Unit/integration/E2E tests for settings, agents, Nexo integration, graph, API lifespan, tools, storytelling, truncation, registry, and routing.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client (rgba(66, 133, 244, 0.5))
  participant API as FastAPI (rgba(52, 168, 83, 0.5))
  participant Harness as Harness Supervisor (rgba(251, 188, 5, 0.5))
  participant Graph as LangGraph Agents (rgba(244, 180, 0, 0.5))
  participant Nexo as Postgres (rgba(234, 67, 53, 0.5))
  participant LLM as Chat Model (rgba(171, 71, 188, 0.5))
  Client->>API: POST /runs (UserRequest)
  API->>Harness: run()
  Harness->>Graph: ainvoke(initial_state)
  Graph->>Nexo: SELECT nexo.fn_dx_* (READ ONLY)
  Nexo-->>Graph: rows + refreshed_at
  Graph->>LLM: analyze/synthesize with evidence
  LLM-->>Graph: verdict/answer
  Graph-->>Harness: state + events
  Harness-->>API: HarnessRunRecord
  API-->>Client: 200 OK + record
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

A rabbit taps the schema drum,
Tools awake, the agents hum.
Nexo’s rows, in read-only light,
Freshness judged before the write.
Stories bloom, events align—
Harness hops: “Ship time! Fine.” 🐇✨

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(harness): data access strategy — Coordinador (Nexo) integration

2 participants