harness(cleanup): unblock T00 pre-flight (format + py.typed + strict mypy + tests)#446

Merged
korutx merged 1 commit into microboxlabs:trunk from odtorres:harness-cleanup-preflight
May 10, 2026

Conversation

@korutx korutx commented May 9, 2026

Contributor

Why

Pre-existing failures on freshly-merged trunk (#445) would block any future
pre-flight gate that runs ruff / mypy / pytest -q as hard gates — for
example the upcoming RALPH-LOOP T00 step before the deploy work in plan
`13-server-deployment`.

Failing checks observed on trunk before this PR:

  • `ruff check src tests` → 52 errors (12 auto-fixable)
  • `uv run mypy` → fails at the package marker check (`Package 'miot_harness' cannot be type checked due to missing py.typed`)
  • `uv run pytest -q` → 1 failed (`test_get_chat_model_returns_openai_for_gpt` — `get_settings` was `@lru_cache`'d, so monkeypatch'd env vars after cache-warming had no effect)
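
The third failure is easier to see in isolation. The following is a minimal sketch, assuming a settings loader that snapshots the environment on first call; the `get_settings` body and the `DEMO_MODEL` variable are hypothetical stand-ins, not the harness's real code.

```python
import os
from functools import lru_cache


@lru_cache
def get_settings() -> dict[str, str]:
    """Hypothetical stand-in: snapshot one env var on the first call."""
    return {"model": os.environ.get("DEMO_MODEL", "claude")}


os.environ["DEMO_MODEL"] = "claude"
first = get_settings()            # warms the cache

os.environ["DEMO_MODEL"] = "gpt"  # roughly what monkeypatch.setenv does
stale = get_settings()            # served from the cache, env change ignored

get_settings.cache_clear()
fresh = get_settings()            # cache is cold again, so the env is re-read

print(first["model"], stale["model"], fresh["model"])
```

The second call returns the very same cached object as the first, which is why a test that sets the variable after any earlier call sees the old value.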

What

Single-commit cleanup. No behavior changes. Scoped to `miot-harness/`.

  1. `ruff format` across the package (39 files, pure whitespace).
  2. Lint residue — manually reflowed 8 long lines, tightened 4 broad `pytest.raises(Exception)` to specific types (`RuntimeError`, `pydantic.ValidationError`), `# noqa: E402` on 2 intentionally-late conditional imports.
  3. `py.typed` marker added so mypy actually inspects the package. This surfaced 32 latent strict-mode errors, all fixed in the same commit:
    • `pyproject.toml`: `ignore_missing_imports` for `asyncpg` + `yaml`.
    • `chat_models.py`: wrap api keys in `pydantic.SecretStr` for `ChatAnthropic` / `ChatOpenAI`.
    • `api/server.py`: type the lifespan factory + inner async lifespan.
    • `evals/run_golden.py`: annotate the `_check` / `_call` closures.
    • `integrations/nexo/tool_factory.py`: widen `common` to `dict[str, Any]`, cast `create_model()` returns to `type[BaseModel]`, parameterize `HarnessTool[BaseModel, BaseModel]`, drop one stale `# type: ignore`.
    • `runtime/nexo_graph.py`: every node closure now `(NexoState) -> dict[str, Any]`, `cast` to `dict` at the seam to downstream node functions; route map coerced to `dict[Hashable, str]`.
  4. Test fixture for `get_settings.lru_cache` in `test_chat_models.py` (autouse `cache_clear` before/after each test). Removed redundant manual `cache_clear()` calls. Tightened the missing-key test to `raises(RuntimeError)`.

Verification

All on this branch:

  • `uv run ruff check src tests` → All checks passed.
  • `uv run mypy` → Success: no issues found in 51 source files.
  • `uv run pytest -q` → 139 passed, 1 skipped.

Out of scope

  • No Dockerfile, no CI workflow YAML, no deploy work — those land in follow-ups per `13-server-deployment/`.
  • No new dependencies. `uv.lock` is unchanged.
  • No public API changes.

Reviewer notes

Diff is large (49 files) but ~80% is pure `ruff format` whitespace. The semantic substance is in 15 files:

  • `pyproject.toml`, `src/miot_harness/py.typed`, `agents/chat_models.py`, `api/server.py`, `evals/run_golden.py`, `integrations/nexo/tool_factory.py`, `runtime/nexo_graph.py`
  • `tests/test_chat_models.py`, `tests/test_config.py`, `tests/test_events.py`, `tests/integrations/nexo/test_tool_factory.py`, `tests/integrations/nexo/test_introspect_pg.py`, `tests/test_filter_expert.py`
  • plus minor reflows in `agents/freshness_judge.py` and `integrations/nexo/boot.py`

Suggested review order: read the commit message (it's structured by the four categories above), then look at each semantic file. Skip the format-only ones unless the formatter output looks suspicious.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added CLI entrypoint with demo subcommand for testing.
    • Added widget_drafts and approval_proposals fields to story artifacts.
    • Added get_delivery_compliance_metrics and get_workflow_bottlenecks tools.
    • Added JsonRunStore.load() method to retrieve run records.
    • Added PermissionResult.deny() class method for permission handling.
  • Bug Fixes

    • Improved API key handling with secure string wrapping.
  • Improvements

    • Enhanced type annotations throughout the codebase.

…mypy + test fixes)

Pre-existing failures on freshly-merged trunk would block any future
pre-flight gate (e.g. RALPH-LOOP T00) that runs ruff/mypy/pytest as
hard gates. This PR makes all three green, scoped to miot-harness/
only, no behavior changes.

What changed (categorized):

1. ruff format applied across the package
   - 39 files reformatted; pure whitespace/wrapping, no semantic change.

2. ruff lint — manual fixes for residue after format
   - 8 long lines reflowed (E501) in chat_models.py, freshness_judge.py,
     api/server.py, evals/run_golden.py, integrations/nexo/boot.py,
     tests/test_filter_expert.py.
   - 4 over-broad pytest.raises(Exception) tightened to specific types
     (RuntimeError, ValidationError) in test_chat_models.py,
     test_config.py, test_events.py, test_tool_factory.py.
   - 2 conditional imports in test_introspect_pg.py marked # noqa: E402
     (intentional — they sit after pytest.skip(allow_module_level=True)).

3. mypy strict mode — was silently bailing at the package marker check
   - Added src/miot_harness/py.typed so mypy actually inspects the
     package. This surfaced 32 latent strict-mode errors which are all
     fixed in this commit.
   - pyproject.toml: ignore_missing_imports for asyncpg + yaml
     (third-party, no stubs).
   - chat_models.py: wrap api_key strings in pydantic.SecretStr to
     satisfy ChatAnthropic/ChatOpenAI signatures.
   - api/server.py: typed _make_lifespan and the inner async lifespan.
   - evals/run_golden.py: annotated _check/_call closures with the
     HarnessTool callback signatures.
   - integrations/nexo/tool_factory.py: widened common dict to
     dict[str, Any] (was over-narrowed by literal-inference); cast
     create_model() return to type[BaseModel]; parameterised
     HarnessTool[BaseModel, BaseModel]; removed an unused
     # type: ignore.
   - runtime/nexo_graph.py: typed every node closure as
     (NexoState) -> dict[str, Any], casting to dict at the seam to
     downstream node functions; typed route(); coerced _ROUTE_MAP into
     dict[Hashable, str] for StateGraph.add_conditional_edges.

4. test fixture — settings.lru_cache vs monkeypatch
   - test_chat_models.py: get_settings() is @lru_cache, so monkeypatch'd
     env vars after the cache is warm have no effect. Added an autouse
     fixture that clears the cache before/after each test in this file.
     Removed redundant manual cache_clear() calls. Tightened the missing-
     ANTHROPIC_API_KEY test from raises(Exception) to raises(RuntimeError).
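
A fixture along the lines described in item 4 might look like the sketch below. It assumes pytest is installed; the `get_settings` body, the fixture name, and the test function are hypothetical, and only the `@lru_cache` behaviour and the `ANTHROPIC_API_KEY` variable name come from the text above.

```python
import os
from collections.abc import Iterator
from functools import lru_cache

import pytest


@lru_cache
def get_settings() -> dict[str, str]:
    """Hypothetical stand-in for the harness settings loader."""
    return {"anthropic_api_key": os.environ.get("ANTHROPIC_API_KEY", "")}


@pytest.fixture(autouse=True)
def _clear_settings_cache() -> Iterator[None]:
    """Run every test in this file against a cold settings cache."""
    get_settings.cache_clear()  # before: drop anything warmed by earlier tests
    yield
    get_settings.cache_clear()  # after: do not leak state into the next test


def test_settings_follow_monkeypatched_env(monkeypatch: pytest.MonkeyPatch) -> None:
    # Works only because the autouse fixture cleared the cache first.
    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test")
    assert get_settings()["anthropic_api_key"] == "sk-test"
```

Because the fixture is `autouse`, individual tests no longer need their own `cache_clear()` calls, which matches the removal of the manual calls noted above.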

Verification (all on this branch):
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 139 passed, 1 skipped.

Out of scope: Dockerfile, CI workflow, deploy plan — those land in
follow-up PRs per .cursor/plans/ai-first/13-server-deployment/.

coderabbitai Bot commented May 9, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fa2eb50-8d67-450a-9428-e18bae0db204

📥 Commits

Reviewing files that changed from the base of the PR and between 787375d and a8b62f1.

📒 Files selected for processing (48)
  • miot-harness/pyproject.toml
  • miot-harness/src/miot_harness/__init__.py
  • miot-harness/src/miot_harness/agents/__init__.py
  • miot-harness/src/miot_harness/agents/chat_models.py
  • miot-harness/src/miot_harness/agents/critic.py
  • miot-harness/src/miot_harness/agents/deepagents_adapter.py
  • miot-harness/src/miot_harness/agents/domain_analyst.py
  • miot-harness/src/miot_harness/agents/filter_expert.py
  • miot-harness/src/miot_harness/agents/freshness_judge.py
  • miot-harness/src/miot_harness/agents/summarizer.py
  • miot-harness/src/miot_harness/agents/synthesizer.py
  • miot-harness/src/miot_harness/api/__init__.py
  • miot-harness/src/miot_harness/api/server.py
  • miot-harness/src/miot_harness/cli.py
  • miot-harness/src/miot_harness/evals/run_golden.py
  • miot-harness/src/miot_harness/integrations/nexo/boot.py
  • miot-harness/src/miot_harness/integrations/nexo/introspect.py
  • miot-harness/src/miot_harness/integrations/nexo/tool_factory.py
  • miot-harness/src/miot_harness/py.typed
  • miot-harness/src/miot_harness/runtime/context.py
  • miot-harness/src/miot_harness/runtime/nexo_graph.py
  • miot-harness/src/miot_harness/runtime/permissions.py
  • miot-harness/src/miot_harness/runtime/run_store.py
  • miot-harness/src/miot_harness/storytelling/contracts.py
  • miot-harness/src/miot_harness/storytelling/module.py
  • miot-harness/src/miot_harness/tools/dashboard.py
  • miot-harness/src/miot_harness/utils/truncation.py
  • miot-harness/tests/integrations/nexo/test_boot.py
  • miot-harness/tests/integrations/nexo/test_introspect.py
  • miot-harness/tests/integrations/nexo/test_introspect_pg.py
  • miot-harness/tests/integrations/nexo/test_pool.py
  • miot-harness/tests/integrations/nexo/test_primer.py
  • miot-harness/tests/integrations/nexo/test_tool_factory.py
  • miot-harness/tests/test_chat_models.py
  • miot-harness/tests/test_config.py
  • miot-harness/tests/test_critic.py
  • miot-harness/tests/test_data_fetcher.py
  • miot-harness/tests/test_domain_analyst.py
  • miot-harness/tests/test_events.py
  • miot-harness/tests/test_filter_expert.py
  • miot-harness/tests/test_freshness_judge.py
  • miot-harness/tests/test_nexo_graph.py
  • miot-harness/tests/test_nexo_supervisor.py
  • miot-harness/tests/test_plan.py
  • miot-harness/tests/test_router.py
  • miot-harness/tests/test_server_lifespan.py
  • miot-harness/tests/test_supervisor_nexo_branch.py
  • miot-harness/tests/test_tool_registry.py
💤 Files with no reviewable changes (11)
  • miot-harness/src/miot_harness/__init__.py
  • miot-harness/src/miot_harness/agents/__init__.py
  • miot-harness/src/miot_harness/api/__init__.py
  • miot-harness/src/miot_harness/runtime/run_store.py
  • miot-harness/src/miot_harness/storytelling/module.py
  • miot-harness/tests/test_server_lifespan.py
  • miot-harness/tests/test_tool_registry.py
  • miot-harness/src/miot_harness/runtime/permissions.py
  • miot-harness/src/miot_harness/runtime/context.py
  • miot-harness/src/miot_harness/cli.py
  • miot-harness/src/miot_harness/storytelling/contracts.py

📝 Walkthrough

This pull request applies comprehensive refactoring and enhancements across the miot-harness codebase, including type-annotation improvements for runtime safety, API-key security wrapping with SecretStr, new methods on core classes (PermissionResult.deny(), JsonRunStore.load()), a new CLI demo entrypoint, expanded storytelling data structures, and widespread formatting/readability improvements across agent nodes and tests.

Changes

Type Safety, Runtime Enhancements, and Code Refactoring

| Layer | File(s) | Summary |
|---|---|---|
| Type-Checking Configuration | `pyproject.toml` | Adds `tool.mypy.overrides` to suppress missing-import warnings for `asyncpg` and `yaml` modules. |
| Type Annotations & Runtime Typing | `src/miot_harness/api/server.py`, `src/miot_harness/integrations/nexo/tool_factory.py`, `src/miot_harness/runtime/nexo_graph.py` | Strengthens type annotations: `_make_lifespan` now declares explicit return type `Callable[[FastAPI], AbstractAsyncContextManager[None]]`; `build_nexo_tool` return type tightened to `HarnessTool[BaseModel, BaseModel]`; `build_nexo_graph` explicitly returns `Any`; node wrappers in `nexo_graph` use typed signatures with `cast()` at call sites. |
| Security Enhancement | `src/miot_harness/agents/chat_models.py` | Wraps Anthropic and OpenAI API keys with `pydantic.SecretStr` when constructing chat model clients; error messages reformatted to single-line strings. |
| Runtime API Additions | `src/miot_harness/runtime/permissions.py`, `src/miot_harness/runtime/run_store.py` | `PermissionResult.deny(reason: str)` class method creates deny decisions; `JsonRunStore.load(run_id: str)` loads and validates run records from disk. |
| Data Model Expansion | `src/miot_harness/storytelling/contracts.py` | `StoryArtifact` adds two new optional list fields: `widget_drafts` and `approval_proposals`. |
| CLI Entrypoint | `src/miot_harness/cli.py` | New `main()` CLI with `demo` subcommand that builds a harness, executes it with a `UserRequest`, and prints the result as JSON; includes `if __name__ == "__main__"` guard. |
| Agent Node Refactoring | `src/miot_harness/agents/{chat_models,critic,deepagents_adapter,domain_analyst,filter_expert,freshness_judge,summarizer,synthesizer}.py` | Message construction and model invocations reformatted to multi-line styles for improved readability; control flow and logic remain unchanged. |
| Integration & Utility Refactoring | `src/miot_harness/integrations/nexo/{boot,introspect,tool_factory}.py`, `src/miot_harness/tools/dashboard.py`, `src/miot_harness/utils/truncation.py`, `src/miot_harness/evals/run_golden.py` | Import normalization, log message formatting, string slice spacing adjustments, and intermediate variable assignments for clarity; no behavioral changes. |
| Package Initialization | `src/miot_harness/__init__.py`, `src/miot_harness/agents/__init__.py`, `src/miot_harness/api/__init__.py` | Module-level docstrings added or retained; no exported declarations changed. |
| Test Expectations Tightening | `tests/test_chat_models.py`, `tests/test_config.py`, `tests/test_events.py`, `tests/integrations/nexo/test_tool_factory.py` | Tests now expect `pydantic.ValidationError` instead of generic `Exception` for validation failures; autouse fixture clears settings cache to prevent test cross-contamination. |
| Test Formatting & Data Updates | `tests/{test_critic,test_data_fetcher,test_domain_analyst,test_filter_expert,test_freshness_judge,test_nexo_graph,test_nexo_supervisor,test_plan,test_router,test_server_lifespan,test_supervisor_nexo_branch}.py`, `tests/integrations/nexo/{test_boot,test_introspect,test_introspect_pg,test_pool,test_primer,test_tool_factory}.py` | Multi-line formatting for function calls, fixture construction, and assertions; `test_tool_registry.py` updates expected tool list to include `"get_delivery_compliance_metrics"` and `"get_workflow_bottlenecks"`; `# noqa` directives added for import-ordering exceptions. |
| Bugfix: Storytelling Module | `src/miot_harness/storytelling/module.py` | Closes incomplete `progress(HarnessEvent(...))` call in `create_delivery_compliance_story` before return. |
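
The `_make_lifespan` return type quoted in the walkthrough, `Callable[[FastAPI], AbstractAsyncContextManager[None]]`, can be illustrated with a dependency-free sketch. The `App` class below is a hypothetical stand-in for `fastapi.FastAPI`, and `make_lifespan`, its `pool_name` parameter, and the `state` dict are assumptions made for the example, not the code in `api/server.py`.

```python
import asyncio
from collections.abc import AsyncIterator, Callable
from contextlib import AbstractAsyncContextManager, asynccontextmanager


class App:
    """Hypothetical stand-in for fastapi.FastAPI so the sketch has no dependencies."""

    def __init__(self) -> None:
        self.state: dict[str, object] = {}


def make_lifespan(pool_name: str) -> Callable[[App], AbstractAsyncContextManager[None]]:
    """Factory and inner lifespan are both annotated, as strict mode requires."""

    @asynccontextmanager
    async def lifespan(app: App) -> AsyncIterator[None]:
        app.state["pool"] = pool_name  # startup work
        try:
            yield
        finally:
            app.state.pop("pool")  # shutdown work

    return lifespan


async def demo() -> tuple[object, bool]:
    app = App()
    async with make_lifespan("demo-pool")(app):
        during = app.state.get("pool")
    return during, "pool" in app.state


result = asyncio.run(demo())
print(result)
```

The inner function is annotated as returning `AsyncIterator[None]` because `asynccontextmanager` wraps an async generator; the decorated result is what satisfies the `AbstractAsyncContextManager[None]` part of the factory's return type.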

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through code so neat,
With types now strong and secrets sweet,
New CLI trails and cleaner views,
No logic changed, just styled anew!
hops 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 29.07% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'harness(cleanup): unblock T00 pre-flight (format + py.typed + strict mypy + tests)' clearly and specifically summarizes the main objective of the PR: code cleanup to enable pre-flight testing with formatting, type-checking, and test improvements. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@korutx korutx merged commit 1fd0068 into microboxlabs:trunk May 10, 2026
1 check passed