harness(cleanup): unblock T00 pre-flight (format + py.typed + strict mypy + tests) by korutx · Pull Request #446 · microboxlabs/modulariot

korutx · 2026-05-09T22:49:41Z

Why

Pre-existing failures on freshly-merged trunk (#445) would block any future
pre-flight gate that runs ruff / mypy / pytest -q as hard gates — for
example the upcoming RALPH-LOOP T00 step before the deploy work in plan
`13-server-deployment`.

Failing checks observed on trunk before this PR:

`ruff check src tests` → 52 errors (12 auto-fixable)
`uv run mypy` → fails at the package marker check (`Package 'miot_harness' cannot be type checked due to missing py.typed`)
`uv run pytest -q` → 1 failed (`test_get_chat_model_returns_openai_for_gpt` — `get_settings` was `@lru_cache`'d, so monkeypatch'd env vars after cache-warming had no effect)
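
For reviewers unfamiliar with the failure mode in the last item, here is a minimal sketch of how a cached settings loader ignores later environment changes. The `Settings` class, its single field, and the `OPENAI_API_KEY` variable name are assumptions for illustration, not the harness's actual config code; only the `@lru_cache` on `get_settings` comes from this PR.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the harness settings object."""

    openai_api_key: Optional[str]


@lru_cache
def get_settings() -> Settings:
    # Reads the environment once; later calls return the cached instance.
    return Settings(openai_api_key=os.environ.get("OPENAI_API_KEY"))


# Start from a known state: no key in the environment, empty cache.
os.environ.pop("OPENAI_API_KEY", None)
get_settings.cache_clear()
before = get_settings()  # warms the cache while the key is unset

# Roughly what monkeypatch.setenv does inside a test.
os.environ["OPENAI_API_KEY"] = "sk-test"
stale = get_settings()  # still the cached, key-less instance

# Clearing the cache forces the next call to re-read the environment.
get_settings.cache_clear()
fresh = get_settings()
```

In this sketch `stale` is the same object as `before`, so the patched variable is invisible until `cache_clear()` runs.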

What

Single-commit cleanup. No behavior changes. Scoped to `miot-harness/`.

`ruff format` across the package (39 files, pure whitespace).
Lint residue — manually reflowed 8 long lines, tightened 4 broad `pytest.raises(Exception)` to specific types (`RuntimeError`, `pydantic.ValidationError`), `# noqa: E402` on 2 intentionally-late conditional imports.
`py.typed` marker added so mypy actually inspects the package. This surfaced 32 latent strict-mode errors, all fixed in the same commit:
- `pyproject.toml`: `ignore_missing_imports` for `asyncpg` + `yaml`.
- `chat_models.py`: wrap api keys in `pydantic.SecretStr` for `ChatAnthropic` / `ChatOpenAI`.
- `api/server.py`: type the lifespan factory + inner async lifespan.
- `evals/run_golden.py`: annotate the `_check` / `_call` closures.
- `integrations/nexo/tool_factory.py`: widen `common` to `dict[str, Any]`, cast `create_model()` returns to `type[BaseModel]`, parameterize `HarnessTool[BaseModel, BaseModel]`, drop one stale `# type: ignore`.
- `runtime/nexo_graph.py`: every node closure now `(NexoState) -> dict[str, Any]`, `cast` to `dict` at the seam to downstream node functions; route map coerced to `dict[Hashable, str]`.
Test fixture for `get_settings.lru_cache` in `test_chat_models.py` (autouse `cache_clear` before/after each test). Removed redundant manual `cache_clear()` calls. Tightened the missing-key test to `raises(RuntimeError)`.

Verification

All on this branch:

`uv run ruff check src tests` → All checks passed.
`uv run mypy` → Success: no issues found in 51 source files.
`uv run pytest -q` → 139 passed, 1 skipped.

Out of scope

No Dockerfile, no CI workflow YAML, no deploy work — those land in follow-ups per `13-server-deployment/`.
No new dependencies. `uv.lock` is unchanged.
No public API changes.

Reviewer notes

Diff is large (49 files) but ~80% is pure `ruff format` whitespace. The semantic substance is in 15 files:

`pyproject.toml`, `src/miot_harness/py.typed`, `agents/chat_models.py`, `api/server.py`, `evals/run_golden.py`, `integrations/nexo/tool_factory.py`, `runtime/nexo_graph.py`
`tests/test_chat_models.py`, `tests/test_config.py`, `tests/test_events.py`, `tests/integrations/nexo/test_tool_factory.py`, `tests/integrations/nexo/test_introspect_pg.py`, `tests/test_filter_expert.py`
plus minor reflows in `agents/freshness_judge.py` and `integrations/nexo/boot.py`

Suggested review order: read the commit message (it's structured by the four categories above), then look at each semantic file. Skip the format-only ones unless the formatter output looks suspicious.

Summary by CodeRabbit

Release Notes

New Features
- Added CLI entrypoint with demo subcommand for testing.
- Added widget_drafts and approval_proposals fields to story artifacts.
- Added get_delivery_compliance_metrics and get_workflow_bottlenecks tools.
- Added JsonRunStore.load() method to retrieve run records.
- Added PermissionResult.deny() class method for permission handling.
Bug Fixes
- Improved API key handling with secure string wrapping.
Improvements
- Enhanced type annotations throughout the codebase.

…mypy + test fixes)

Pre-existing failures on freshly-merged trunk would block any future pre-flight gate (e.g. RALPH-LOOP T00) that runs ruff/mypy/pytest as hard gates. This PR makes all three green, scoped to miot-harness/ only, no behavior changes.

What changed (categorized):

1. ruff format applied across the package
- 39 files reformatted; pure whitespace/wrapping, no semantic change.

2. ruff lint: manual fixes for residue after format
- 8 long lines reflowed (E501) in chat_models.py, freshness_judge.py, api/server.py, evals/run_golden.py, integrations/nexo/boot.py, tests/test_filter_expert.py.
- 4 over-broad pytest.raises(Exception) tightened to specific types (RuntimeError, ValidationError) in test_chat_models.py, test_config.py, test_events.py, test_tool_factory.py.
- 2 conditional imports in test_introspect_pg.py marked # noqa: E402 (intentional: they sit after pytest.skip(allow_module_level=True)).

3. mypy strict mode: was silently bailing at the package marker check
- Added src/miot_harness/py.typed so mypy actually inspects the package. This surfaced 32 latent strict-mode errors which are all fixed in this commit.
- pyproject.toml: ignore_missing_imports for asyncpg + yaml (third-party, no stubs).
- chat_models.py: wrap api_key strings in pydantic.SecretStr to satisfy ChatAnthropic/ChatOpenAI signatures.
- api/server.py: typed _make_lifespan and the inner async lifespan.
- evals/run_golden.py: annotated _check/_call closures with the HarnessTool callback signatures.
- integrations/nexo/tool_factory.py: widened common dict to dict[str, Any] (was over-narrowed by literal-inference); cast create_model() return to type[BaseModel]; parameterised HarnessTool[BaseModel, BaseModel]; removed an unused # type: ignore.
- runtime/nexo_graph.py: typed every node closure as (NexoState) -> dict[str, Any], casting to dict at the seam to downstream node functions; typed route(); coerced _ROUTE_MAP into dict[Hashable, str] for StateGraph.add_conditional_edges.

4. test fixture: settings.lru_cache vs monkeypatch
- test_chat_models.py: get_settings() is @lru_cache, so monkeypatch'd env vars after the cache is warm have no effect. Added an autouse fixture that clears the cache before/after each test in this file. Removed redundant manual cache_clear() calls. Tightened the missing-ANTHROPIC_API_KEY test from raises(Exception) to raises(RuntimeError).

Verification (all on this branch):
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 139 passed, 1 skipped.

Out of scope: Dockerfile, CI workflow, deploy plan; those land in follow-up PRs per .cursor/plans/ai-first/13-server-deployment/.

coderabbitai · 2026-05-09T22:49:56Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fa2eb50-8d67-450a-9428-e18bae0db204

📥 Commits

Reviewing files that changed from the base of the PR and between 787375d and a8b62f1.

📒 Files selected for processing (48)

miot-harness/pyproject.toml
miot-harness/src/miot_harness/__init__.py
miot-harness/src/miot_harness/agents/__init__.py
miot-harness/src/miot_harness/agents/chat_models.py
miot-harness/src/miot_harness/agents/critic.py
miot-harness/src/miot_harness/agents/deepagents_adapter.py
miot-harness/src/miot_harness/agents/domain_analyst.py
miot-harness/src/miot_harness/agents/filter_expert.py
miot-harness/src/miot_harness/agents/freshness_judge.py
miot-harness/src/miot_harness/agents/summarizer.py
miot-harness/src/miot_harness/agents/synthesizer.py
miot-harness/src/miot_harness/api/__init__.py
miot-harness/src/miot_harness/api/server.py
miot-harness/src/miot_harness/cli.py
miot-harness/src/miot_harness/evals/run_golden.py
miot-harness/src/miot_harness/integrations/nexo/boot.py
miot-harness/src/miot_harness/integrations/nexo/introspect.py
miot-harness/src/miot_harness/integrations/nexo/tool_factory.py
miot-harness/src/miot_harness/py.typed
miot-harness/src/miot_harness/runtime/context.py
miot-harness/src/miot_harness/runtime/nexo_graph.py
miot-harness/src/miot_harness/runtime/permissions.py
miot-harness/src/miot_harness/runtime/run_store.py
miot-harness/src/miot_harness/storytelling/contracts.py
miot-harness/src/miot_harness/storytelling/module.py
miot-harness/src/miot_harness/tools/dashboard.py
miot-harness/src/miot_harness/utils/truncation.py
miot-harness/tests/integrations/nexo/test_boot.py
miot-harness/tests/integrations/nexo/test_introspect.py
miot-harness/tests/integrations/nexo/test_introspect_pg.py
miot-harness/tests/integrations/nexo/test_pool.py
miot-harness/tests/integrations/nexo/test_primer.py
miot-harness/tests/integrations/nexo/test_tool_factory.py
miot-harness/tests/test_chat_models.py
miot-harness/tests/test_config.py
miot-harness/tests/test_critic.py
miot-harness/tests/test_data_fetcher.py
miot-harness/tests/test_domain_analyst.py
miot-harness/tests/test_events.py
miot-harness/tests/test_filter_expert.py
miot-harness/tests/test_freshness_judge.py
miot-harness/tests/test_nexo_graph.py
miot-harness/tests/test_nexo_supervisor.py
miot-harness/tests/test_plan.py
miot-harness/tests/test_router.py
miot-harness/tests/test_server_lifespan.py
miot-harness/tests/test_supervisor_nexo_branch.py
miot-harness/tests/test_tool_registry.py

💤 Files with no reviewable changes (11)

miot-harness/src/miot_harness/__init__.py
miot-harness/src/miot_harness/agents/__init__.py
miot-harness/src/miot_harness/api/__init__.py
miot-harness/src/miot_harness/runtime/run_store.py
miot-harness/src/miot_harness/storytelling/module.py
miot-harness/tests/test_server_lifespan.py
miot-harness/tests/test_tool_registry.py
miot-harness/src/miot_harness/runtime/permissions.py
miot-harness/src/miot_harness/runtime/context.py
miot-harness/src/miot_harness/cli.py
miot-harness/src/miot_harness/storytelling/contracts.py

📝 Walkthrough

This pull request applies comprehensive refactoring and enhancements across the miot-harness codebase, including type-annotation improvements for runtime safety, API-key security wrapping with SecretStr, new methods on core classes (PermissionResult.deny(), JsonRunStore.load()), a new CLI demo entrypoint, expanded storytelling data structures, and widespread formatting/readability improvements across agent nodes and tests.

Changes

Type Safety, Runtime Enhancements, and Code Refactoring

Layer / File(s)	Summary
Type-Checking Configuration `pyproject.toml`	Adds `tool.mypy.overrides` to suppress missing-import warnings for `asyncpg` and `yaml` modules.
Type Annotations & Runtime Typing `src/miot_harness/api/server.py`, `src/miot_harness/integrations/nexo/tool_factory.py`, `src/miot_harness/runtime/nexo_graph.py`	Strengthens type annotations: `_make_lifespan` now declares explicit return type `Callable[[FastAPI], AbstractAsyncContextManager[None]]`; `build_nexo_tool` return type tightened to `HarnessTool[BaseModel, BaseModel]`; `build_nexo_graph` explicitly returns `Any`; node wrappers in nexo_graph use typed signatures with `cast()` at call sites.
Security Enhancement `src/miot_harness/agents/chat_models.py`	Wraps Anthropic and OpenAI API keys with `pydantic.SecretStr` when constructing chat model clients; error messages reformatted to single-line strings.
Runtime API Additions `src/miot_harness/runtime/permissions.py`, `src/miot_harness/runtime/run_store.py`	`PermissionResult.deny(reason: str)` class method creates deny decisions; `JsonRunStore.load(run_id: str)` loads and validates run records from disk.
Data Model Expansion `src/miot_harness/storytelling/contracts.py`	`StoryArtifact` adds two new optional list fields: `widget_drafts` and `approval_proposals`.
CLI Entrypoint `src/miot_harness/cli.py`	New `main()` CLI with `demo` subcommand that builds a harness, executes it with a `UserRequest`, and prints the result as JSON; includes `if __name__ == "__main__"` guard.
Agent Node Refactoring `src/miot_harness/agents/{chat_models,critic,deepagents_adapter,domain_analyst,filter_expert,freshness_judge,summarizer,synthesizer}.py`	Message construction and model invocations reformatted to multi-line styles for improved readability; control flow and logic remain unchanged.
Integration & Utility Refactoring `src/miot_harness/integrations/nexo/{boot,introspect,tool_factory}.py`, `src/miot_harness/tools/dashboard.py`, `src/miot_harness/utils/truncation.py`, `src/miot_harness/evals/run_golden.py`	Import normalization, log message formatting, string slice spacing adjustments, and intermediate variable assignments for clarity; no behavioral changes.
Package Initialization `src/miot_harness/__init__.py`, `src/miot_harness/agents/__init__.py`, `src/miot_harness/api/__init__.py`	Module-level docstrings added or retained; no exported declarations changed.
Test Expectations Tightening `tests/test_chat_models.py`, `tests/test_config.py`, `tests/test_events.py`, `tests/integrations/nexo/test_tool_factory.py`	Tests now expect `pydantic.ValidationError` instead of generic `Exception` for validation failures; autouse fixture clears settings cache to prevent test cross-contamination.
Test Formatting & Data Updates `tests/{test_critic,test_data_fetcher,test_domain_analyst,test_filter_expert,test_freshness_judge,test_nexo_graph,test_nexo_supervisor,test_plan,test_router,test_server_lifespan,test_supervisor_nexo_branch}.py`, `tests/integrations/nexo/{test_boot,test_introspect,test_introspect_pg,test_pool,test_primer,test_tool_factory}.py`	Multi-line formatting for function calls, fixture construction, and assertions; `test_tool_registry.py` updates expected tool list to include `"get_delivery_compliance_metrics"` and `"get_workflow_bottlenecks"`; `# noqa` directives added for import-ordering exceptions.
Bugfix: Storytelling Module `src/miot_harness/storytelling/module.py`	Closes incomplete `progress(HarnessEvent(...))` call in `create_delivery_compliance_story` before return.
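
The `_make_lifespan` return type quoted in the "Type Annotations & Runtime Typing" row can be sketched as follows. This is an illustration only: `App` is a hypothetical stand-in for `fastapi.FastAPI` so the block runs without dependencies, and `make_lifespan` with its `name` parameter is an assumed shape, not the signature in `api/server.py`.

```python
import asyncio
from collections.abc import AsyncIterator, Callable
from contextlib import AbstractAsyncContextManager, asynccontextmanager


class App:
    """Hypothetical stand-in for fastapi.FastAPI."""

    def __init__(self) -> None:
        self.events: list[str] = []


def make_lifespan(name: str) -> Callable[[App], AbstractAsyncContextManager[None]]:
    # The factory returns a callable that, given the app, yields an async
    # context manager; annotating both layers is what strict mode asks for.
    @asynccontextmanager
    async def lifespan(app: App) -> AsyncIterator[None]:
        app.events.append(f"{name}:startup")
        try:
            yield
        finally:
            app.events.append(f"{name}:shutdown")

    return lifespan


async def run() -> list[str]:
    app = App()
    async with make_lifespan("harness")(app):
        app.events.append("serving")
    return app.events


events = asyncio.run(run())
```

The inner function is annotated as returning `AsyncIterator[None]` because `@asynccontextmanager` wraps an async generator; the decorated result is what matches `AbstractAsyncContextManager[None]`.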

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through code so neat,
With types now strong and secrets sweet,
New CLI trails and cleaner views,
No logic changed, just styled anew!
hops 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 29.07% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'harness(cleanup): unblock T00 pre-flight (format + py.typed + strict mypy + tests)' clearly and specifically summarizes the main objective of the PR: code cleanup to enable pre-flight testing with formatting, type-checking, and test improvements.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

korutx merged commit 1fd0068 into microboxlabs:trunk May 10, 2026
1 check passed

korutx mentioned this pull request May 10, 2026

harness: deploy stack — Dockerfile, CI workflow, deploy evals (T10b verify) #447

Merged

5 tasks

coderabbitai Bot mentioned this pull request May 15, 2026

harness(phase-13): per-agent telemetry + agentic search foundation #462

Merged

8 tasks

coderabbitai Bot mentioned this pull request May 23, 2026

Feat/harness sse rich events #516

Merged

korutx merged 1 commit into microboxlabs:trunk from odtorres:harness-cleanup-preflight
