harness(cleanup): unblock T00 pre-flight (format + py.typed + strict mypy + tests) #446
…mypy + test fixes)
Pre-existing failures on freshly-merged trunk would block any future
pre-flight gate (e.g. RALPH-LOOP T00) that runs ruff/mypy/pytest as
hard gates. This PR makes all three green, scoped to miot-harness/
only, no behavior changes.
What changed (categorized):
1. ruff format applied across the package
- 39 files reformatted; pure whitespace/wrapping, no semantic change.
2. ruff lint — manual fixes for residue after format
- 8 long lines reflowed (E501) in chat_models.py, freshness_judge.py,
api/server.py, evals/run_golden.py, integrations/nexo/boot.py,
tests/test_filter_expert.py.
- 4 over-broad pytest.raises(Exception) tightened to specific types
(RuntimeError, ValidationError) in test_chat_models.py,
test_config.py, test_events.py, test_tool_factory.py.
- 2 conditional imports in test_introspect_pg.py marked # noqa: E402
(intentional — they sit after pytest.skip(allow_module_level=True)).
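To illustrate the pytest.raises tightening, here is a minimal sketch. The `connect` helper and its error message are hypothetical and not taken from this package; only the pattern of naming the expected exception type reflects the change described above.

```python
import pytest


def connect(dsn: str) -> str:
    """Hypothetical helper that fails loudly on an empty DSN."""
    if not dsn:
        raise RuntimeError("empty DSN")
    return f"connected:{dsn}"


def test_connect_rejects_empty_dsn() -> None:
    # pytest.raises(Exception) would also pass if the body raised a
    # TypeError or NameError caused by a mistake in the test itself.
    # Naming the type (and optionally the message) pins the intended failure.
    with pytest.raises(RuntimeError, match="empty DSN"):
        connect("")
```

With the specific type in place, an unrelated exception in the test body surfaces as a test failure instead of a silent pass.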
3. mypy strict mode — was silently bailing at the package marker check
- Added src/miot_harness/py.typed so mypy actually inspects the
package. This surfaced 32 latent strict-mode errors which are all
fixed in this commit.
- pyproject.toml: ignore_missing_imports for asyncpg + yaml
(third-party, no stubs).
- chat_models.py: wrap api_key strings in pydantic.SecretStr to
satisfy ChatAnthropic/ChatOpenAI signatures.
- api/server.py: typed _make_lifespan and the inner async lifespan.
- evals/run_golden.py: annotated _check/_call closures with the
HarnessTool callback signatures.
- integrations/nexo/tool_factory.py: widened common dict to
dict[str, Any] (was over-narrowed by literal-inference); cast
create_model() return to type[BaseModel]; parameterised
HarnessTool[BaseModel, BaseModel]; removed an unused
# type: ignore.
- runtime/nexo_graph.py: typed every node closure as
(NexoState) -> dict[str, Any], casting to dict at the seam to
downstream node functions; typed route(); coerced _ROUTE_MAP into
dict[Hashable, str] for StateGraph.add_conditional_edges.
4. test fixture — settings.lru_cache vs monkeypatch
- test_chat_models.py: get_settings() is @lru_cache, so monkeypatch'd
env vars after the cache is warm have no effect. Added an autouse
fixture that clears the cache before/after each test in this file.
Removed redundant manual cache_clear() calls. Tightened the missing-
ANTHROPIC_API_KEY test from raises(Exception) to raises(RuntimeError).
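The cache/monkeypatch interaction in item 4 can be sketched as follows. The `get_settings` body and the returned dict shape are stand-ins assumed for illustration, and the fixture name is hypothetical; only the `@lru_cache` plus autouse `cache_clear()` pattern comes from the description above.

```python
import os
from collections.abc import Iterator
from functools import lru_cache

import pytest


@lru_cache(maxsize=1)
def get_settings() -> dict[str, str]:
    """Stand-in for the real settings loader (assumed shape)."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return {"anthropic_api_key": key}


@pytest.fixture(autouse=True)
def _fresh_settings_cache() -> Iterator[None]:
    # Clear before the test so an earlier test cannot leak a cached value,
    # and again afterwards so this test cannot leak into later ones.
    get_settings.cache_clear()
    yield
    get_settings.cache_clear()


def test_missing_key_raises(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
    with pytest.raises(RuntimeError):
        get_settings()


def test_key_is_read_from_env(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test")
    assert get_settings()["anthropic_api_key"] == "sk-test"
```

Without the fixture, whichever test calls `get_settings()` first with a key set would fix the result for the rest of the session, and later `monkeypatch.setenv` or `delenv` calls would appear to do nothing.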
Verification (all on this branch):
- uv run ruff check src tests → All checks passed.
- uv run mypy → Success: no issues found in 51 source files.
- uv run pytest -q → 139 passed, 1 skipped.
Out of scope: Dockerfile, CI workflow, deploy plan — those land in
follow-up PRs per .cursor/plans/ai-first/13-server-deployment/.
Why
Pre-existing failures on freshly-merged trunk (#445) would block any future
pre-flight gate that runs `ruff`/`mypy`/`pytest -q` as hard gates, for
example the upcoming RALPH-LOOP T00 step before the deploy work in plan
`13-server-deployment`.
Failing checks observed on trunk before this PR:
What
Single-commit cleanup. No behavior changes. Scoped to `miot-harness/`.
Verification
All on this branch:
Out of scope
Reviewer notes
Diff is large (49 files) but ~80% is pure `ruff format` whitespace. The semantic substance is in 15 files:
Suggested review order: read the commit message (it's structured by the four categories above), then look at each semantic file. Skip the format-only ones unless the formatter output looks suspicious.
Summary by CodeRabbit
Release Notes
New Features
- `demo` subcommand for testing.
- `widget_drafts` and `approval_proposals` fields to story artifacts.
- `get_delivery_compliance_metrics` and `get_workflow_bottlenecks` tools.
- `JsonRunStore.load()` method to retrieve run records.
- `PermissionResult.deny()` class method for permission handling.
Bug Fixes
Improvements