harness(nexo): Coordinador data access — Phases A–F (closes #444)#445
Merged
Conversation
Switch the harness from venv + pip to uv: move dev deps to PEP 735 [dependency-groups], commit uv.lock for reproducible installs, and update both READMEs to use `uv sync` / `uv run`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds langchain-anthropic, langchain-openai, asyncpg to runtime deps; adds pytest-postgresql to dev deps. Required for plan 12 (multi-model chat factory + asyncpg pool against Coordinador prod). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends HarnessSettings with the MIOT_HARNESS_NEXO_* surface from plan 12: db alias, tenant lock, freshness thresholds, turn cap, critic flag, per-agent model assignment, and unprefixed provider keys (ANTHROPIC_API_KEY, OPENAI_API_KEY) via AliasChoices. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds plan.created, agent.turn, tool.failed, freshness.warning, answer.completed to HarnessEventType. Adds seq:int=0 to HarnessEvent (per review item N14) so producers can order emitted events without breaking existing call sites that don't pass seq yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defines the LangGraph state contract for the Nexo conversational graph. NexoPlan caps steps at 4 (review item N13). NexoState evidence list uses Annotated[list[NexoEvidence], operator.add] reducer (review item C3) so LangGraph appends instead of replacing. TypedDict marked total=False so partial node returns are valid. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
get_chat_model(name) dispatches on model name prefix: - claude-* → langchain_anthropic.ChatAnthropic - gpt-*/o1-*/o3-* → langchain_openai.ChatOpenAI - else ValueError Reads provider keys from HarnessSettings (which already accepts the plain ANTHROPIC_API_KEY / OPENAI_API_KEY via AliasChoices). Raises clearly when the required key is missing rather than constructing a model that explodes on first call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reads db-scripts/databases/<alias>/.env (flat KEY=VALUE), validates PGHOST/PGDATABASE/PGUSER/PGPASSWORD presence, and exposes a URL- encoded asyncpg DSN. Comments and blanks are tolerated; missing port defaults to 5432. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
create_nexo_pool(creds) builds an asyncpg.Pool with read-only server_settings (default_transaction_read_only=on, statement_timeout 30s, idle_in_transaction_session_timeout 5s). Applied at the connection layer so we get safety even when the role itself can't be ALTERed (CosmoDB-managed harness role). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parse_pg_description handles three formats: @meta YAML-ish blocks, L1/L2/L3/VT prefix, plain text. is_denied implements the multi-layer denylist (name pattern fn_refresh_*, explicit fn_dx_orchestrator{,_auto}, @meta side_effects != none). introspect_nexo_functions runs the pg_proc query against an asyncpg conn/pool, applies the denylist, parses arg signatures and TABLE/json/scalar return types. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_nexo_tool(descriptor, pool, tenant_lock) produces a fully-typed HarnessTool per FunctionDescriptor: dynamic input/output Pydantic models, tenant-locked check_permission, and a call that opens a pooled connection, sets search_path, runs the function, extracts refreshed_at_*, applies S9 truncation (5-row cap on table outputs) and emits tool.completed with structured metadata (source, layer, domain, refreshed_at, truncated). Description rendering preserves the routing hints the Filter Expert will need: @meta blocks render Domain/Returns; L1/L2/L3/VT prefixes render the layer tag; plain text falls through. Every tool gets the SHARED_FILTER_PRIMER appended so the LLM sees a stable contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Static module-level COORDINADOR_PRIMER covering the topics enumerated in doc 10 lines 165-185: service/proc_inst/POD/ETA semantics, the shared 18-param filter contract, fecha_tipo enum, eta_clasificacion buckets, es_critico flag, freshness rule (15 min cite), single-tenant Mintral lock. Kept as a string constant (not a function) so Anthropic prompt caching can mark it cache_control: ephemeral once per process and re-use the same bytes across runs (review item S8). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ster) load_nexo_tools(registry, settings, pool) drives the four-step boot: 1. ACL check — embedded fn_refresh_direct_leak query (must be 0). 2. Connectivity + freshness probe on fn_dx_centro_control. 3. pg_proc introspection (denylist applied inside). 4. Build coordinador_* HarnessTool per descriptor; register. On any failure (no pool, ACL leak, stale snapshot, query exception, tool build error) → log critical, return NexoBootResult(enabled=False) without raising. The harness keeps booting; only Nexo path degrades (review item C4 — uniform fail-disabled posture). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
create_app() now installs an async lifespan that, when MIOT_HARNESS_NEXO_DB_SCRIPTS_ROOT is set, loads credentials, builds the asyncpg pool, and runs load_nexo_tools(). Result is exposed on app.state (nexo_enabled, nexo_pool, nexo_registered). All boot errors (missing env, tunnel down, ACL leak, stale snapshot, query exceptions) degrade to "Nexo disabled" without breaking startup. Pool is awaited-closed on shutdown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…schema guard) Phase B post-review fixes per the code-reviewer subagent: - Critical: collapse double tool.completed event. Previously both HarnessTool.invoke and the Nexo tool's call() emitted tool.completed, with the second event stripping all metadata. Now invoke lifts a fixed metadata set (source/refreshed_at/layer/domain/truncated) out of the typed output_model into a single event; tool_factory.call no longer emits its own. Generic tools that don't expose those fields just get the bare event. - Important: wrap "SET LOCAL search_path + query" in a read-only transaction (asyncpg auto-commits each statement otherwise, making SET LOCAL a no-op). Fixed in both tool_factory.call and the centro_control freshness probe in boot. - Important: validate the operator-supplied search_path env var against [a-z_][a-z0-9_]* before interpolating into SQL. Refuse boot otherwise. Belt-and-suspenders next to the existing fully- qualified table references. Tests updated: tool_factory test now exercises tool.invoke (the public contract) and verifies a single tool.completed with full metadata. Added truncation regression test (12 rows → 5 + total + truncated). Added schema-name guard test in boot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
next_agent(state, settings) is a deterministic state-machine router: nodes set state['next_action'] to hint where to go next; supervisor applies invariants on top (turn cap → force synth; failure → synth; answer → END; >10 messages → summarizer). Keeps v1 cost predictable without an LLM in the supervisor seat. nexo_supervisor_mode='llm' is left as a future swap-in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
filter_expert_node(state, registry, model) emits the next NexoStep each turn: builds a coordinador_*-only tool catalog, prompts the model to pick one tool with rationale and args, parses the JSON response, validates the tool exists, and appends the step to the running plan. Failure modes (malformed JSON, hallucinated tool name) are caught locally and surfaced via state["failure"] so the supervisor can route to the synthesizer for a graceful refusal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
data_fetcher_node reads the next pending NexoStep, invokes via the ToolRegistry (so the existing tool.started/completed/failed events flow through), wraps the typed output as NexoEvidence, advances pending_step_index, and signals the supervisor to route to freshness_judge. Permission denials and tool exceptions surface as state['failure'] + a tool.failed event without raising. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
freshness_judge_node classifies the latest evidence by age: fresh / warn / refuse based on nexo_freshness_warn_minutes and _refuse_minutes. Emits freshness.warning in warn/refuse zones. Refuse → state['failure'] + route to synthesizer for graceful refusal. Refactor: data_fetcher now sets NexoEvidence.is_stale at construction time using settings.nexo_freshness_warn_minutes. The judge classifies and emits but does NOT touch evidence — NexoState's evidence is Annotated[..., operator.add], so any "evidence" key in a node's returned delta would APPEND, doubling the list. Caught while writing the warn-zone test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
domain_analyst_node reads collected evidence + COORDINADOR_PRIMER and returns a JSON verdict (ready / need_more). It does NOT write the user-facing answer — that's the synthesizer. This keeps cost down (one short LLM call per turn) and keeps supervisor routing deterministic. Empty evidence short-circuits to need_more_tools without an LLM call. Malformed model output defaults to ready_to_synthesize so the synthesizer can render with what's available rather than looping. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
synthesizer_node produces the final user-facing answer in three modes:
1. Failure mode (no LLM): render a graceful refusal locally when
state['failure'] is set (stale snapshot, ACL refusal, tool error).
2. Tenant refusal (no LLM): canonical "Coordinador is Mintral-only"
line when ctx.tenant_id != 'mintral' AND no evidence collected
(defense-in-depth — tenant_gate_node should have caught earlier).
3. Normal: primer + evidence → model prose, citing refreshed_at and
flagging is_stale.
Always emits answer.completed with the final text length.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
critic_node returns {} (passthrough) when settings.nexo_critic_enabled
is False. When enabled, asks the model to verify the answer cites
refreshed_at, flags stale data, and refuses non-Mintral. Concerns
surface into state['critic_concerns'] in v1 — looping the synth is
a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
summarizer_node compresses state['messages'] to [summary, last_user] when the transcript exceeds 10 messages. Threshold matches the supervisor's auto-trigger. Defensive in v1 (the staged Option C graph rarely accumulates chat turns); pays off for multi-turn follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…raph
build_nexo_graph(registry, settings, models) compiles the full
conversational graph: tenant_gate → filter_expert → data_fetcher →
freshness_judge → domain_analyst → (loop or) synthesizer → critic
→ END. Routing comes from the rule-based supervisor (next_agent);
synthesizer always flows through critic so the seat stays wired
even when nexo_critic_enabled=False.
End-to-end test runs three scenarios with FakeListChatModel:
- Mintral happy path (filter_expert → fetch → fresh → analyst
ready → synth) returns the analyst's prose with the planted
refreshed_at evidence.
- Non-Mintral tenant short-circuits at tenant_gate without
invoking any LLM (verified by passing empty FakeListChatModels).
- 7-day-stale refreshed_at routes through freshness_judge →
state['failure'] → synthesizer's local failure branch (no LLM).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase C post-review fixes per the code-reviewer subagent: - Critical: filter_expert now ALWAYS clears state['next_action'] before returning. Previously, when the analyst signaled need_more_tools, the supervisor routed to filter_expert; filter_ expert appended a step but didn't clear next_action; supervisor saw need_more_tools again and re-entered the same node — a tight loop only stopped by the turn cap. Regression test added. - Critical: catch ValidationError on NexoPlan(steps=...) when the N13 max_length=4 cap is hit. Previously this propagated and crashed the graph. Now it fails soft → state['failure'] + next_action='ready_to_synthesize' so the synthesizer renders with current evidence. - Critical: strip ```json``` markdown fences before json.loads in both filter_expert and domain_analyst. Real Claude commonly wraps structured output in fences; without this, both nodes default to failure / ready respectively, breaking real-LLM use. - Important: filter_expert rejects tools that don't start with 'coordinador_'. Without this, the model could pick a non-Nexo tool that happens to be in the registry (delivery_metrics, workflow_events) and pass validation. - Important: synthesizer reads tenant_lock from settings instead of hardcoding 'mintral' so changing nexo_tenant_lock keeps the refusal text consistent. - Important: NexoState now declares `_events` as a reducer-typed channel (Annotated[..., operator.add]). Nodes that produced events via in-place state mutation now return them in the delta. nexo_graph wraps nodes to merge buffer events into the delta. Phase D wiring can drain a typed list instead of relying on LangGraph's superstep merge of in-place mutations. - Important: filter_expert emits plan.created on first step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds NEXO_QUERY to HarnessRoute. Match list (review item S7):
literal tokens ('coordinador', 'mintral', 'centro de control',
'cola crítica' / 'cola critica', 'dimensionamiento', 'torre de
control', 'auditoría pod' / 'auditoria pod', 'fn_dx') plus the
\bETA\b regex (uppercase, word-bounded — so 'etapa' / 'meta' do
not trigger).
Resolution rule when both Nexo and storytelling keywords appear:
Nexo wins. Misrouting a Nexo question into storytelling is more
expensive than the reverse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HarnessSupervisor now accepts an optional nexo_graph dependency.
On route == NEXO_QUERY:
- graph absent (Nexo disabled / boot failed) → graceful refusal
answer + answer.completed event, status=completed.
- graph present → ainvoke initial NexoState (user_message, ctx,
evidence=[], turn_count=0); drain final_state['_events'] into
record.events in order; record.answer ← final_state['answer'];
persist NexoPlan into record.artifacts (review item N12).
Top-level exception handler around dispatch sets status=failed +
emits run.failed without propagating. Storytelling path unchanged.
HarnessRunRecord gains an optional `answer: str | None` so the API
response carries the synth output without forcing an artifact wrap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
truncate_for_trace(payload) caps tool output for event/trace display:
- lists → first 5 rows + total_count
- dicts > 20 keys → keep refreshed_at_* + known KPI scalars
(n_eta_riesgo, es_critico, eta_clasificacion, fecha_tipo, etc.)
+ fill the budget; rest go into truncated_keys.
- nested list values inside dicts also truncated.
Full output stays in the run record; only the trace summary is
capped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spins up an ephemeral Postgres via pytest-postgresql and seeds a nexo-shaped fixture covering all four denylist outcomes: - allowed: clean L1 + clean @meta side_effects:none - denied by name pattern: fn_refresh_* - denied by explicit name: fn_dx_orchestrator{,_auto} - denied by @meta side_effects: writes_to_* Runs introspect_nexo_functions against asyncpg connected to the ephemeral DB; asserts surviving names + parsed layer/meta hints. Auto-skips when no postgres server binary is reachable (Homebrew's libpq-only install ships pg_ctl/initdb but no postgres server, so shutil.which alone would lie). Uses pg_config --bindir as fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… script)
F1 — miot_harness/evals/run_golden.py:
Loads evals/golden/nexo/examples.yaml, validates schema (each
entry has all required fields, no duplicate ids, adversarials
must have expected_refusal=true).
Three modes:
static — schema-only, no graph runs.
fake — FakeListChatModel scripted to pick expected_tools[0];
stub registry returns canned refreshed_at + KPIs;
scores tool_selection / filter_sanity / freshness_
citation / refusal / no_hallucination / latency_ms.
real — placeholder for live Anthropic + Nexo DB; not yet
implemented (Phase G hooks).
Writes JSON to evals/results/<short-sha>.json.
F2 — evals/judge_prompt.md:
Tier-3 advisory LLM-as-judge prompt scoring three axes 1–5
(operational_helpfulness, domain_fidelity, conciseness) with
rubric and strict JSON output format.
F3 — pyproject [project.scripts] miot-harness-evals = entry point.
Static + fake mode both run cleanly out of the box; static
validates 25 entries, fake runs all 25 through the graph in ~5s
on the dev box.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After load_nexo_tools enables Nexo, the lifespan now builds chat models per-agent (filter_expert/analyst/synthesizer/critic/summarizer) via get_chat_model() reading settings.nexo_*_model, and constructs the LangGraph via build_nexo_graph(). The compiled graph is assigned to harness.nexo_graph so HarnessSupervisor.run dispatches NEXO_QUERY through it. If chat-model construction fails (e.g., missing ANTHROPIC_API_KEY), log critical and fall back to Nexo disabled — startup still succeeds. Lifespan test now sets ANTHROPIC_API_KEY to verify the graph wiring path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coordinador prod fronts Citus with prod-mintral-pgbouncer in
transaction-pooling mode. PgBouncer's `track_extra_parameters`
allowlist does NOT include `default_transaction_read_only`,
`statement_timeout`, or `idle_in_transaction_session_timeout`,
so passing them via asyncpg `server_settings={...}` fails the
StartupMessage with:
unsupported startup parameter: default_transaction_read_only
This caused `Nexo: lifespan boot failed` against the live tunnel
even though every test passed (the unit tests mocked asyncpg).
Fix:
- Drop `server_settings` from create_nexo_pool. NEXO_SERVER_SETTINGS
becomes {} (kept exported so any future PgBouncer-allowlisted
parameters can be added in one place).
- Wrap the boot ACL check and the pg_proc introspection step in
`conn.transaction(readonly=True)` so read-only enforcement is
BEGIN READ ONLY at each transaction — supported by PgBouncer +
Postgres natively. The centro_control probe and tool_factory.call
already do this.
- Tests updated: assert empty constant, assert server_settings is
NOT forwarded to asyncpg.create_pool (regression guard).
Verified: uvicorn + tunnel boot now reaches the next step instead
of dying on StartupMessage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (86)
📝 WalkthroughWalkthroughAdds miot-harness workspace with schemas, settings, runtime, tools/storytelling, agents and LangGraph flow, Nexo asyncpg integration, API/CLI, evals, docs/examples, and extensive tests. ChangesMIOT Harness and Coordinador (Nexo) integration
Sequence Diagram(s)sequenceDiagram
participant Client as Client (rgba(66, 133, 244, 0.5))
participant API as FastAPI (rgba(52, 168, 83, 0.5))
participant Harness as Harness Supervisor (rgba(251, 188, 5, 0.5))
participant Graph as LangGraph Agents (rgba(244, 180, 0, 0.5))
participant Nexo as Postgres (rgba(234, 67, 53, 0.5))
participant LLM as Chat Model (rgba(171, 71, 188, 0.5))
Client->>API: POST /runs (UserRequest)
API->>Harness: run()
Harness->>Graph: ainvoke(initial_state)
Graph->>Nexo: SELECT nexo.fn_dx_* (READ ONLY)
Nexo-->>Graph: rows + refreshed_at
Graph->>LLM: analyze/synthesize with evidence
LLM-->>Graph: verdict/answer
Graph-->>Harness: state + events
Harness-->>API: HarnessRunRecord
API-->>Client: 200 OK + record
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
This was referenced May 9, 2026
8 tasks
This was referenced May 18, 2026
4 tasks
This was referenced Jun 6, 2026
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #444.
Summary
First real data integration for
miot-harness— the Coordinadornexoschema, via directasyncpgthrough thedb-scripts/databases/coordinador-prod-harness/tunnel and adedicated read-only
harnessPG role.Tracking issue #444 has full scope, acceptance criteria, and
follow-ups. Diff is the v3 implementation arc captured per phase
in conventional commits.
Phases shipped
db-scriptsseed for read-onlyharnessrole (validated bycheck-harness-role.sql).pg_procintrospection, denylist, primer, boot loader (ACL + freshness + register), FastAPI lifespan wiring.filter_expert,data_fetcher,freshness_judge,domain_analyst,synthesizer,critic,summarizer);StateGraphwiring.NEXO_QUERYrouting branch, supervisor integration, payload truncation utility.pytest-postgresql).server_settingsstartup params; enforce read-only viaBEGIN READ ONLYinstead).Test plan
.envwith valid Anthropic + Nexo creds, brings up tunnel.cd miot-harness && uv sync && uv run pytest -q— all green.uv run uvicorn miot_harness.api.server:create_app --factory --reloadboots withNexo: enabled — N tools registered.POST /runswith{"message":"resumen del centro de control hoy", "tenant_id":"mintral", "user_id":"harness", "thread_id":"smoke"}returns anexo_planartifact and a Spanish synthesized answer.boot.pyACL check returnsfn_refresh_direct_leak=0.MIOT_HARNESS_NEXO_FRESHNESS_REFUSE_MINUTES=0→ boot disables Nexo with critical log; restore default.Notes for reviewers
13-server-deployment, which depends on this branch being on trunk..envand.env.exampledivergence is tracked in the deploy plan's follow-ups.fn_refresh_*cron has been silent for ~2.6 days as of 2026-05-09. Worth flagging to the Coordinador team independently of this PR.Summary by CodeRabbit
Release Notes
New Features
uvwith demo execution and test suiteDocumentation