Skip to content

Factor out the mdtest crate#24616

Merged
ntBre merged 17 commits intomainfrom
brent/mdtest-crate
Apr 16, 2026
Merged

Factor out the mdtest crate#24616
ntBre merged 17 commits intomainfrom
brent/mdtest-crate

Conversation

@ntBre
Copy link
Copy Markdown
Contributor

@ntBre ntBre commented Apr 13, 2026

Summary

This is a first step toward adding mdtests for Ruff. I actually wrote the code
in the opposite order, first copy-pasting ty_test to a ruff_test crate, and then
factoring out the shared code, but I figured it would be easier to review in
this order. I also opened a stacked PR with the ruff_test changes (#24617)
to show that the API works well for that too.

The main change here is moving several of the modules from ty_test to a new
mdtest crate:

  • assertion
  • diagnostic
  • matcher
  • parser

Beyond moving these files to the new crate, I made Matcher functions take a
&dyn Db to support passing a different concrete type from ruff_test, and I
also made the parser generic over an MdtestConfig trait to allow Ruff to use a
separate config struct. I also introduced new TestConfig and TestDb types to allow
testing the matcher and parser within the mdtest crate without depending
on either the real ty Db or ty_test config type.

The lib.rs file from ty_test was essentially split in half, with the shared
code moved to the mdtest crate and the ty-specific parts kept in ty_test.

Test Plan

All existing mdtests and the unit tests from ty_test should still pass, and
the stacked branch with the ruff_test crate tests the split API

Summary
--

This is a first step toward adding mdtests for Ruff. I actually wrote the code
in the opposite order, first copy-pasting `ty_test` to a `ruff_test` crate, and then
factoring out the shared code, but I figured it would be easier to review in
this order. I'll open a stacked PR with the `ruff_test` changes shortly after
this one to show that the API works well for that too.

The main change here is moving several of the modules from `ty_test` to a new
`mdtest` crate:
- assertion.rs
- diagnostic.rs
- matcher.rs
- parser.rs

These files required few changes, with a couple of exceptions noted below.
Unfortunately, this also required moving the `config` and `db` modules to
support the `Matcher` tests. Ideally these would live in `ty_test` instead since
`ruff_test` uses a slightly different `Db` and configuration schema, but again
they are used by the current `Matcher` tests, and that seemed like a bigger
refactor that we could defer to later.

Beyond moving these files to the new crate, I made `Matcher` functions take a
`&dyn Db` to support passing a different concrete type from `ruff_test`, and I
also made the parser generic over an `MdtestConfig` trait to allow Ruff to use a
separate config struct.

The lib.rs file from `ty_test` was essentially split in half, with the shared
code moved to the `mdtest` crate and the ty-specific parts kept in `ty_test`.

Test Plan
--

All existing mdtests and the unit tests from `ty_test` should still pass, and
the stacked branch with the `ruff_test` crate tests the split API
@ntBre ntBre added the testing Related to testing Ruff itself label Apr 13, 2026
@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented Apr 13, 2026

Typing conformance results

No changes detected ✅

Current numbers
The percentage of diagnostics emitted that were expected errors held steady at 87.94%. The percentage of expected errors that received a diagnostic held steady at 83.36%. The number of fully passing files held steady at 79/133.

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented Apr 13, 2026

Memory usage report

Memory usage unchanged ✅

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented Apr 13, 2026

ecosystem-analyzer results

No diagnostic changes detected ✅

Full report with detailed diff (timing results)

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented Apr 13, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

ntBre added a commit that referenced this pull request Apr 13, 2026
Summary
--

This PR adds the `ruff_test` crate, a parallel crate to `ty_test` for Ruff, to
enable the new mdtests in `ruff_linter`. I opted to follow the `Db`-based
structure of `ty_test` to simplify the integration, but we end up basically just
unpacking the files from the `Db` to call the `test_contents` function from the
linter.

Currently stacked on #24616

Test Plan
--

I copied over UP046_0.py into a new mdtest. This was selected basically at
random, and I should probably either pick a shorter first test to port, or also
port over the other UP046 tests and take better advantage of the mdtest format.
Currently the whole file is just in a single code block, but it demonstrates the
basic functionality, including config loading and diagnostic snapshotting.
ntBre added a commit that referenced this pull request Apr 13, 2026
Summary
--

This PR adds the `ruff_test` crate, a parallel crate to `ty_test` for Ruff, to
enable the new mdtests in `ruff_linter`. I opted to follow the `Db`-based
structure of `ty_test` to simplify the integration, but we end up basically just
unpacking the files from the `Db` to call the `test_contents` function from the
linter.

Currently stacked on #24616

Test Plan
--

I copied over UP046_0.py into a new mdtest. This was selected basically at
random, and I should probably either pick a shorter first test to port, or also
port over the other UP046 tests and take better advantage of the mdtest format.
Currently the whole file is just in a single code block, but it demonstrates the
basic functionality, including config loading and diagnostic snapshotting.
@ntBre ntBre mentioned this pull request Apr 13, 2026
@ntBre ntBre marked this pull request as ready for review April 13, 2026 22:08
Copy link
Copy Markdown
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I think we should try to reduce the mdtest's dependencies and make it less dependent on ty_python_semantic.

It's also unclear to me whether the Db concept and abstractions make sense, given that ruff doesn't use a db at all.

Comment thread crates/mdtest/src/config.rs Outdated
#[derive(Deserialize, Debug, Default, Clone)]
#[serde(rename_all = "kebab-case", deny_unknown_fields)]
pub(crate) struct MarkdownTestConfig {
pub struct MarkdownTestConfig {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to have a shared markdown configuration. I consider the supported configuration options to be very specific to a specific test runner. E.g. the analysis options and most environment options are specific to ty and not relevant for ruff. That's why I think that the concrete option type should live in ty_test, but there might be shared options, like the external dependencies that lives in the shared mdtest crate.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably chat more about this in our 1:1, but including this config type and the Db and thus the ty_* dependencies were mostly motivated by the tests in the matcher and parser. The matcher tests need a concrete Db type, and the parser tests need a concrete configuration type, so the shortest path to that seemed like just continuing to share them with ty_test, at least for now. The Ruff version has its own configuration and Db types, which would be preferable for ty_test too, as you say.

I'll think more about how to separate these items from the tests.

Comment thread crates/mdtest/Cargo.toml Outdated
Comment on lines +25 to +28
ty_module_resolver = { workspace = true }
ty_python_semantic = { workspace = true, features = ["serde", "testing"] }
ty_vendored = { workspace = true }
ty_python_core = { workspace = true }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to make the generic mdtest crate independent of most ty crates. Not only does this help with faster compile times (compiling ty_python_semantic takes forever), but it also makes it more reusable without introducing cyclic dependencies.

Comment thread crates/mdtest/src/db.rs Outdated
#[salsa::db]
#[derive(Clone)]
pub(crate) struct Db {
pub struct Db {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of defining the db struct here, I think it's better to define a Db trait with the accessors you need. Downstream crates can then define their own Db struct.

I haven't looked at Ruff's implementation but does it even use the Db struct?

Comment thread crates/ty_test/src/lib.rs
/// Run `path` as a markdown test suite with given `title`.
///
/// Panic on test failure, and print failure details.
pub fn run(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could find a way to share some of the run and run_test logic. There's a lot happening in those methods that isn't specific to ty_test. It's probably sufficient if we have a few trait methods like check_file that, given a file, return a list of Diagnostics that are sufficient abstractions. But we can also decide to leave this for a later PR.

@ntBre ntBre marked this pull request as draft April 14, 2026 13:05
@ntBre
Copy link
Copy Markdown
Contributor Author

ntBre commented Apr 14, 2026

It turned out to be a lot easier to break the ty dependency that I had feared/expected. I still need to do another cleanup pass, but I introduced a very minimal TestDb and TestConfig and removed the ty dependencies.

ntBre added a commit that referenced this pull request Apr 14, 2026
Summary
--

This PR adds the `ruff_test` crate, a parallel crate to `ty_test` for Ruff, to
enable the new mdtests in `ruff_linter`. I opted to follow the `Db`-based
structure of `ty_test` to simplify the integration, but we end up basically just
unpacking the files from the `Db` to call the `test_contents` function from the
linter.

Currently stacked on #24616

Test Plan
--

I copied over UP046_0.py into a new mdtest. This was selected basically at
random, and I should probably either pick a shorter first test to port, or also
port over the other UP046 tests and take better advantage of the mdtest format.
Currently the whole file is just in a single code block, but it demonstrates the
basic functionality, including config loading and diagnostic snapshotting.
@ntBre
Copy link
Copy Markdown
Contributor Author

ntBre commented Apr 14, 2026

Thanks for the review and for the conversation in our 1:1 earlier too! I think I've mostly addressed your earlier comments by adding the TestDb and TestConfig mentioned above, which allowed removing the dependencies on ty crates.

It's also unclear to me whether the Db concept and abstractions make sense, given that ruff doesn't use a db at all.

Codex and I are still poking at this part. It also seems more promising to switch to a UnifiedFile/FileResolver setup than I initially thought, so I'll leave this in draft for now.

ntBre added a commit that referenced this pull request Apr 15, 2026
Summary
--

This PR adds the `ruff_test` crate, a parallel crate to `ty_test` for Ruff, to
enable the new mdtests in `ruff_linter`. I opted to follow the `Db`-based
structure of `ty_test` to simplify the integration, but we end up basically just
unpacking the files from the `Db` to call the `test_contents` function from the
linter.

Currently stacked on #24616

Test Plan
--

I copied over UP046_0.py into a new mdtest. This was selected basically at
random, and I should probably either pick a shorter first test to port, or also
port over the other UP046 tests and take better advantage of the mdtest format.
Currently the whole file is just in a single code block, but it demonstrates the
basic functionality, including config loading and diagnostic snapshotting.
@ntBre
Copy link
Copy Markdown
Contributor Author

ntBre commented Apr 15, 2026

Alright, I got something working without a Db on the Ruff side: brent/ruff-mdtests...brent/nodb

I kind of don't mind the Db approach in comparison, but I don't feel strongly either way yet. The changes are easy to stack on top of the other two PRs, so I'll just reopen this one for now.

@ntBre ntBre marked this pull request as ready for review April 15, 2026 13:51
Copy link
Copy Markdown
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This split looks reasonable for a first iteration. It does have the downside that new mdtest feature need to be added to each test runner. For example, the code for capturing and updating snapshot still lives in ty_test. This is probably fine for an initial version but it might be interesting to explore how much work it is to lift more of run and run_test out of ty_test and into the mdtest crate

Comment thread crates/mdtest/src/parser.rs
Comment thread crates/mdtest/src/parser.rs Outdated
Comment on lines 930 to 943
Copy link
Copy Markdown
Member

@MichaReiser MichaReiser Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind this too much, but it feels a bit funny that this happens within the parser. Is this something we could validate within run_test (or run?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this felt funny to me too and required the config trait. I think one of the parser tests asserts on this error, which is one of the reasons I left it here, but I can try to drop or move that test too. Otherwise this would feel a lot better in ty_test since Ruff doesn't have dependencies.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually kind of tricky to validate outside the parser because of the config inheritance from parent Sections. A loop like this over suite.tests(), which I tried first, has false positives because each child section inherits the parent config:

    let mut file_has_dependencies = false;
    for test in suite.tests() {
        if test.configuration().dependencies().is_some() {
            if file_has_dependencies {
                bail!(
                    "Multiple sections with `[project]` dependencies in the same file are not allowed. \
                     External dependencies must be specified in a single top-level configuration block."
                );
            }
            file_has_dependencies = true;
        }
    }

That failed for this test, for example.

Instead of looping over tests, I think we could loop over Sections and add an inherited_config flag, but then we have to expose that type and an iterator method. I opted for passing a callback to the parser to validate each parsed Config instead, but that's also kind of awkward, so I'd be curious to get your thoughts, especially in case I'm missing something obvious.

The callback at least keeps the check directly in ty_test and removes the trait method.

Comment thread crates/mdtest/Cargo.toml
Comment thread crates/ty_test/src/lib.rs

let mut db = db::Db::setup();
let mut db = Db::setup();
let mut markdown_edits = vec![];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine if we leave this to a separate PR, but it would be nice if this run loop could be shared between crates. It would probably allow us to keep many types pub(crate) again

Comment thread crates/ty_test/src/lib.rs
let failure = match matcher::match_file(db, test_file.file, &diagnostics).and_then(
|inline_diagnostics| {
validate_inline_snapshot(
mdtest::validate_inline_snapshot(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I feel like the code iterating over test_files and what comes below is mostly agnostic to what's being tested. It would be nice if some of it coudl be shared.

Comment thread crates/ty_test/src/lib.rs
@@ -744,287 +652,6 @@ impl std::fmt::Display for ModuleInconsistency<'_> {
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to bring attempt_test to mdtest. It gracefully handles the case where checking a file panics (or linting). It ensurs that other tests in the same file run to completion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good idea. I think I'll try to group this with the run/run_test refactor that I'll follow up on. I agree with your points on that front too and have some stashed changes locally that I hope will work.

@MichaReiser MichaReiser assigned MichaReiser and unassigned sharkdp Apr 15, 2026
Comment thread crates/mdtest/src/parser.rs Outdated
@ntBre ntBre merged commit 08c56c8 into main Apr 16, 2026
56 checks passed
@ntBre ntBre deleted the brent/mdtest-crate branch April 16, 2026 14:32
ntBre added a commit that referenced this pull request Apr 16, 2026
Summary
--

This PR adds the `ruff_test` crate, a parallel crate to `ty_test` for Ruff, to
enable the new mdtests in `ruff_linter`. I opted to follow the `Db`-based
structure of `ty_test` to simplify the integration, but we end up basically just
unpacking the files from the `Db` to call the `test_contents` function from the
linter.

Currently stacked on #24616

Test Plan
--

I copied over UP046_0.py into a new mdtest. This was selected basically at
random, and I should probably either pick a shorter first test to port, or also
port over the other UP046 tests and take better advantage of the mdtest format.
Currently the whole file is just in a single code block, but it demonstrates the
basic functionality, including config loading and diagnostic snapshotting.
ntBre added a commit that referenced this pull request Apr 16, 2026
Summary
--

This PR adds the `ruff_test` crate, a parallel crate to `ty_test` for Ruff, to
enable the new mdtests in `ruff_linter`. I opted to follow the `Db`-based
structure of `ty_test` to simplify the integration, but we end up basically just
unpacking the files from the `Db` to call the `test_contents` function from the
linter.

Currently stacked on #24616

Test Plan
--

I copied over UP046_0.py into a new mdtest. This was selected basically at
random, and I should probably either pick a shorter first test to port, or also
port over the other UP046 tests and take better advantage of the mdtest format.
Currently the whole file is just in a single code block, but it demonstrates the
basic functionality, including config loading and diagnostic snapshotting.
ntBre added a commit that referenced this pull request Apr 20, 2026
Summary
--

This PR adds the `ruff_test` crate, a parallel crate to `ty_test` for Ruff, to
enable the new mdtests in `ruff_linter`. I opted to follow the `Db`-based
structure of `ty_test` to simplify the integration, but we end up basically just
unpacking the files from the `Db` to call the `test_contents` function from the
linter.

Currently stacked on #24616

Test Plan
--

I copied over UP046_0.py into a new mdtest. This was selected basically at
random, and I should probably either pick a shorter first test to port, or also
port over the other UP046 tests and take better advantage of the mdtest format.
Currently the whole file is just in a single code block, but it demonstrates the
basic functionality, including config loading and diagnostic snapshotting.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Related to testing Ruff itself

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants