Parser recursion limit by samuelcolvin · Pull Request #24810 · astral-sh/ruff

samuelcolvin · 2026-04-24T04:23:47Z

Summary

Partial fix for #22930.

Without this, malicious or machine-generated code could cause a stack overflow with something as simple as '(' * 5000 + '1' + ')' * 5000.

I decided to do the simplest thing and have a limit that's always applied, with a reasonable default, since:

* the overhead of this check will be tiny
* it seems inconceivable that anyone will want to have no limit
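
For readers unfamiliar with the technique, here is a minimal sketch of a depth-limited recursive-descent parser. It is a toy for nested parentheses only, not the code in this PR: `parse_atom`, `RecursionLimitExceeded`, and the default of 200 are names and values assumed for illustration.

```python
class RecursionLimitExceeded(Exception):
    """Hypothetical error type; the name is an assumption for this sketch."""


def parse_atom(src, pos=0, depth_remaining=200):
    """Parse nested parentheses around one digit; return (value, next_pos).

    `depth_remaining` is decremented on every recursive call, so the number
    of stack frames is bounded no matter how deeply the input nests.
    """
    if depth_remaining == 0:
        raise RecursionLimitExceeded(f"nesting too deep at offset {pos}")
    if pos >= len(src):
        raise SyntaxError("unexpected end of input")
    ch = src[pos]
    if ch == "(":
        value, pos = parse_atom(src, pos + 1, depth_remaining - 1)
        if pos >= len(src) or src[pos] != ")":
            raise SyntaxError(f"expected ')' at offset {pos}")
        return value, pos + 1
    if ch.isdigit():
        return int(ch), pos + 1
    raise SyntaxError(f"unexpected {ch!r} at offset {pos}")


print(parse_atom("((7))"))  # (7, 5)
try:
    parse_atom("(" * 5000 + "1" + ")" * 5000)
except RecursionLimitExceeded as err:
    print("rejected:", err)
```

The check is a single integer comparison per recursive call, which is the sense in which the overhead is expected to be small.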

Test Plan

PR includes tests.

samuelcolvin · 2026-04-24T09:49:10Z

Hey, could someone please kick off CI for this?

Also, FWIW I have this working with monty and avoiding stack overflows both in AST parsing for the bytecode compiler and type checking in pydantic/monty#391.

astral-sh-bot · 2026-04-24T13:24:02Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser · 2026-04-24T15:54:20Z

Thank you.

This is an improvement, but I'm not convinced it is the proper fix; it only shifts the threshold at which the parser aborts. It isn't sufficient, e.g., to protect against allocation failures when the program is too large.

I also checked, and neither TypeScript nor Rust implements the same treatment. Instead, the common approach across parsers is to:

* Rewrite the recursion to a loop. This achieves the same degree of protection as what's proposed in this PR, but without arbitrarily truncating the AST.
* Use a library to dynamically grow the stack by spilling to the heap, in places where rewriting to a loop isn't possible.
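
As an illustration of the first option, the sketch below parses nested lists with an explicit, heap-allocated stack instead of recursive calls. It is a toy under stated assumptions (single-digit elements, commas skipped leniently, the hypothetical name `parse_nested_lists`), not a proposal for Ruff's parser.

```python
def parse_nested_lists(src):
    """Parse text such as "[[1, 2], [3]]" into nested Python lists.

    Open lists live on `stack`, an ordinary Python list on the heap, so
    nesting depth is limited by memory rather than by the call stack.
    """
    stack = []   # lists that have been opened but not yet closed
    root = None  # set once the outermost list is closed
    for pos, ch in enumerate(src):
        if ch.isspace() or ch == ",":
            continue  # simplification: separators are not validated
        if root is not None:
            raise SyntaxError(f"unexpected {ch!r} after the list at offset {pos}")
        if ch == "[":
            new = []
            if stack:
                stack[-1].append(new)
            stack.append(new)
        elif ch == "]":
            if not stack:
                raise SyntaxError(f"unmatched ']' at offset {pos}")
            closed = stack.pop()
            if not stack:
                root = closed
        elif ch.isdigit():
            if not stack:
                raise SyntaxError(f"digit outside a list at offset {pos}")
            stack[-1].append(int(ch))
        else:
            raise SyntaxError(f"unexpected {ch!r} at offset {pos}")
    if root is None:
        raise SyntaxError("unclosed '[' or empty input")
    return root


print(parse_nested_lists("[[1, 2], [3]]"))  # [[1, 2], [3]]
```

Note that the resulting tree can still be arbitrarily deep, so any later pass that walks it recursively would need the same treatment.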

In the end, protecting against denial-of-service attacks isn't specific to stack overflows. The same protection must be in place to handle the exploitation of bugs (in the parser or elsewhere). This is why I wouldn't consider this a security bug (the change certainly adds a few more guardrails, but it doesn't prevent such attacks).

samuelcolvin · 2026-04-24T17:03:21Z

I get where you're coming from, but the fact is if you limit the code length, stack overflow is one of the only DOS risks in the parser.

@zanieb suggested you don't have the bandwidth to rewrite the recursion to a loop, and I certainly don't - so the choice is between adding this improvement, and not adding this improvement.

I'd therefore really appreciate it if you accepted this improvement. But I get it if you're not willing to merge; I'll just use ruff crates from my branch and attempt to keep it up to date.

(If you are considering rewriting the parser to a loop, please consider making it available as an iterator so we can avoid the overhead of allocating before the first IR.)

MichaReiser · 2026-04-24T18:28:29Z

> (If you are considering rewriting the parser to a loop, please consider making it available as an iterator so we can avoid the overhead of allocating before the first IR)

I think there was some misunderstanding of what "rewriting to a loop" means. I'm not suggesting that we rewrite the whole parser as a loop. Instead, the idea is to unroll the recursion by using a loop, similar to what we do in parse_binary_expression_or_higher_recursive. We should be able to rewrite them one by one, starting with expression_lhs, which is probably the most important in terms of handling "real world code". However, we'd have to rewrite all of them to mitigate the DOS concerns (although there's no guarantee that the parser won't OOM when parsing a 4GB file that mainly consists of statements).
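
A rough sketch of what "unrolling with a loop" can look like for binary expressions, using precedence climbing: operators at the same level are consumed by a `while` loop, so a long chain such as `1 + 1 + 1 + ...` does not add a stack frame per operator. This is a generic illustration with assumed names (`parse_binary`, `PRECEDENCE`), not the body of parse_binary_expression_or_higher_recursive.

```python
import re

PRECEDENCE = {"+": 1, "*": 2}  # left-associative operators only


def tokenize(src):
    # Simplification: anything that is not a number or operator is ignored.
    return re.findall(r"\d+|[+*]", src)


def parse_binary(tokens, pos=0, min_prec=1):
    """Return (tree, next_pos); tree is an int or an (op, left, right) tuple."""
    left = int(tokens[pos])
    pos += 1
    # The loop handles chains of operators; recursion only happens when
    # moving to a tighter-binding level, so its depth is bounded by the
    # number of precedence levels, not by the length of the expression.
    while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_prec:
        op = tokens[pos]
        right, pos = parse_binary(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos


tree, _ = parse_binary(tokenize("1 + 2 * 3 + 4"))
print(tree)  # ('+', ('+', 1, ('*', 2, 3)), 4)
```

A chain of many thousands of `+` operators parses with at most a few frames on the stack, although the resulting left-nested tree is still deep.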

MichaReiser · 2026-04-27T10:15:34Z

I'm fine going ahead with this if we address the following issues:

* @dhruvmanila mentioned that CPython has a similar limit for binary expressions. We should align our cut-off point with CPython's, or at least ensure it's not lower than CPython's.
* Instead of using ..., we should use our normal error recovery node. For expressions, it's an empty identifier with the Invalid context.
* We should safeguard against failing to restore the recursion depth. What I'd do is to change enter_recursion to return a RecursionScope struct that holds a DropBomb (a debug-only bomb seems fine?). The bomb needs to be defused by explicitly calling RecursionScope::exit(parser) (consumes self).
* This PR does not fix "Handle parser stack overflows more gracefully" (#22930). Instead, we should document that the recursion limit is temporary and the proper solution is to unroll the recursion by using a loop.

samuelcolvin · 2026-04-27T10:30:59Z

great, I'll get those things fixed as soon as I have time.

samuelcolvin · 2026-04-30T13:24:21Z

Claude says CPython uses 201, so I'll use that.

CPython 3.14 nesting cutoffs

| Pattern | First failing depth | Error |
| --- | --- | --- |
| `parens` | 201 | `SyntaxError: too many nested parentheses` |
| `lists` | 201 | `SyntaxError: too many nested parentheses` |
| `binary_paren` | 201 | `SyntaxError: too many nested parentheses` |
| `match_pattern` | 201 | `SyntaxError: too many nested parentheses` |
| `fstring` | 150 | `SyntaxError: too many nested f-strings or t-strings` |
| `nested_def` | 100 | `IndentationError: too many levels of indentation` |

How this was measured

scripts/parse_recursion_check.py generates a Python file with the chosen
pattern at the requested depth and invokes CPython to compile it:

#!/usr/bin/env python3
"""Generate deeply nested Python source and check how CPython handles it.

Mirrors the patterns covered by the parser-recursion-limit tests so we can
compare Ruff's behaviour with what CPython's own parser/compiler accepts.

Usage:
    python scripts/parse_recursion_check.py <pattern> <depth> [--run]

Patterns:
    parens          ((((1))))
    lists           [[[[1]]]]
    binary_paren    1+(1+(1+(1)))
    nested_def      def f(): def f(): ... pass
    match_pattern   match x: case ((((y)))): pass
    fstring         f"{f"{f"{1}"}"}"

By default the generated source is written to a temp file and CPython is
invoked with ``python -c "compile(open(path).read(), path, 'exec')"`` so we
exercise the parser without executing the code. Pass ``--run`` to actually
``exec`` it.
"""

from __future__ import annotations

import argparse
import subprocess
import sys
import tempfile
from pathlib import Path


def gen_parens(depth: int) -> str:
    return "(" * depth + "1" + ")" * depth + "\n"


def gen_lists(depth: int) -> str:
    return "[" * depth + "1" + "]" * depth + "\n"


def gen_binary_paren(depth: int) -> str:
    return "1+(" * depth + "1" + ")" * depth + "\n"


def gen_nested_def(depth: int) -> str:
    lines = []
    for i in range(depth):
        lines.append("\t" * i + "def f():")
    lines.append("\t" * depth + "pass")
    return "\n".join(lines) + "\n"


def gen_match_pattern(depth: int) -> str:
    return "match x:\n case " + "(" * depth + "y" + ")" * depth + ": pass\n"


def gen_fstring(depth: int) -> str:
    # f"{ f"{ ... f"{1}" ... }" }"
    return 'f"' + '{f"' * (depth - 1) + '{1}' + '"}' * (depth - 1) + '"\n'


GENERATORS = {
    "parens": gen_parens,
    "lists": gen_lists,
    "binary_paren": gen_binary_paren,
    "nested_def": gen_nested_def,
    "match_pattern": gen_match_pattern,
    "fstring": gen_fstring,
}


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("pattern", choices=sorted(GENERATORS))
    parser.add_argument("depth", type=int)
    parser.add_argument(
        "--run",
        action="store_true",
        help="exec the file instead of just compiling it",
    )
    parser.add_argument(
        "--out",
        type=Path,
        help="write the generated source to this path (default: a temp file)",
    )
    parser.add_argument(
        "--print-only",
        action="store_true",
        help="just write the file and print its path; don't invoke Python",
    )
    parser.add_argument(
        "--python",
        default=sys.executable,
        help="python executable to invoke (default: the current one)",
    )
    args = parser.parse_args()

    src = GENERATORS[args.pattern](args.depth)

    if args.out is not None:
        path = args.out
        path.write_text(src)
    else:
        tmp = tempfile.NamedTemporaryFile(
            mode="w", suffix=".py", delete=False, prefix=f"recur_{args.pattern}_"
        )
        tmp.write(src)
        tmp.close()
        path = Path(tmp.name)

    print(f"wrote {len(src)} bytes to {path}", file=sys.stderr)

    if args.print_only:
        print(path)
        return 0

    if args.run:
        cmd = [args.python, str(path)]
    else:
        # Compile-only: parses + compiles but does not execute the body.
        cmd = [
            args.python,
            "-c",
            f"import sys; "
            f"src = open({str(path)!r}).read(); "
            f"compile(src, {str(path)!r}, 'exec'); "
            f"print('ok', file=sys.stderr)",
        ]

    print(f"running: {' '.join(cmd)}", file=sys.stderr)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.stdout:
        sys.stdout.write(proc.stdout)
    if proc.stderr:
        sys.stderr.write(proc.stderr)
    print(f"exit code: {proc.returncode}", file=sys.stderr)
    return proc.returncode


if __name__ == "__main__":
    sys.exit(main())

A bash binary search over each pattern finds the boundary:

for pat in parens lists binary_paren match_pattern fstring nested_def; do
  lo=1; hi=300
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    res=$(python3 scripts/parse_recursion_check.py "$pat" "$mid" 2>&1 \
      | grep -E "^(ok|SyntaxError|RecursionError|IndentationError|MemoryError)" \
      | head -1)
    if [ "$res" = "ok" ]; then lo=$mid; else hi=$mid; fi
  done
  echo "$pat: max ok=$lo, first fail=$hi"
done
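
The same boundary search can be sketched in pure Python by calling the built-in `compile()` in-process rather than spawning a subprocess. The helper names `compiles` and `first_failing_depth` are made up for this sketch, and it assumes the predicate is monotone (every depth below the boundary succeeds, every depth at or above it fails).

```python
def gen_parens(depth):
    return "(" * depth + "1" + ")" * depth + "\n"


def compiles(src):
    """True if this interpreter can compile `src` without executing it."""
    try:
        compile(src, "<generated>", "exec")
    except (SyntaxError, RecursionError, MemoryError):
        # IndentationError is a subclass of SyntaxError, so it is covered.
        return False
    return True


def first_failing_depth(ok, lo=1, hi=300):
    """Binary search for the smallest failing depth.

    Assumes ok(lo) is True and ok(hi) is False.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return hi


# The result depends on the interpreter version running this snippet.
print(first_failing_depth(lambda d: compiles(gen_parens(d))))
```

Running in-process keeps the depth range small on purpose: with `hi=300` the interpreter itself is never pushed near a real stack overflow.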

samuelcolvin · 2026-04-30T13:56:26Z

@MichaReiser I think I've covered all your points.

MichaReiser · 2026-04-30T14:07:14Z

Thank you, and thanks for the table. Do you know if CPython tracks separate counts, e.g. allows 200 nested parentheses and 200 nested lists but fails on the 201st nested parenthesis? Or is it what you implemented in this PR, a global recursion limit?

samuelcolvin · 2026-04-30T14:48:03Z

CPython has a single shared counter for all open delimiters ((), [], {}), not separate per-construct counts.

So CPython is basically the same as Ruff now, except Ruff's counter also covers statement nesting (def f(): def f(): …), match patterns, and f-string format specs, whereas CPython has a separate counter for indentation depth.
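
To make the difference between a shared counter and per-construct counters concrete, here is a small model. It only counts bracket characters and ignores strings and comments, so it is a simplification of the behaviour described above, not CPython's tokenizer; the function names are invented for the example.

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSER_TO_OPENER = {close: open_ for open_, close in OPENERS.items()}


def peak_shared(src):
    """Peak nesting depth with one counter for all bracket kinds."""
    depth = peak = 0
    for ch in src:
        if ch in OPENERS:
            depth += 1
            peak = max(peak, depth)
        elif ch in CLOSER_TO_OPENER:
            depth -= 1
    return peak


def peak_per_kind(src):
    """Peak nesting depth tracked separately for each bracket kind."""
    depth = {open_: 0 for open_ in OPENERS}
    peak = dict(depth)
    for ch in src:
        if ch in OPENERS:
            depth[ch] += 1
            peak[ch] = max(peak[ch], depth[ch])
        elif ch in CLOSER_TO_OPENER:
            depth[CLOSER_TO_OPENER[ch]] -= 1
    return peak


src = "(" * 150 + "[" * 100 + "1" + "]" * 100 + ")" * 150
print(peak_shared(src))    # 250
print(peak_per_kind(src))  # {'(': 150, '[': 100, '{': 0}
```

With a limit of 200, this input would be rejected under a shared counter (250 open delimiters at the deepest point) but accepted if each kind were counted separately.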

samuelcolvin · 2026-05-06T00:19:33Z

Hey, @MichaReiser anything else you need here?

MichaReiser · 2026-05-06T12:43:30Z

There's nothing that I need from you. I just need to find time to review the change (the entire team is traveling this week)

codspeed-hq · 2026-05-12T09:32:41Z

Merging this PR will not alter performance

✅ 117 untouched benchmarks

Comparing samuelcolvin:parse-recursion-limit (6f33fce) with main (c189569)

MichaReiser · 2026-05-12T11:21:53Z

Hmm, the performance regression is an issue, and I consider it a blocker. The PR also doesn't correctly handle right-recursive expressions like x = 1**1**1**...
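
To illustrate why right-recursive operators matter, the toy parser below handles `**` the naive way: each operator triggers one nested call, and the source contains no brackets at all. Whether this matches the exact gap in the PR is an assumption; `gen_power_chain` and `parse_power` are hypothetical names.

```python
import re


def gen_power_chain(operands):
    """Source such as "1**1**1" with the given number of operands."""
    return "**".join(["1"] * operands)


def parse_power(tokens, pos=0, depth=1):
    """Right-associative parse: power := atom ['**' power].

    Returns (tree, next_pos, deepest_call_depth).
    """
    base = int(tokens[pos])
    pos += 1
    if pos < len(tokens) and tokens[pos] == "**":
        # One extra stack frame per operator in the chain.
        exponent, pos, deepest = parse_power(tokens, pos + 1, depth + 1)
        return ("**", base, exponent), pos, deepest
    return base, pos, depth


src = gen_power_chain(100)
tokens = re.findall(r"\d+|\*\*", src)
_, _, deepest = parse_power(tokens)
print(deepest)  # 100
```

The call depth grows linearly with the number of operands, so a guard that only fires on bracketed constructs would never see this chain.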

I'm leaning towards leaving it as is.

astral-sh-bot · 2026-05-18T20:54:07Z

Typing conformance results

No changes detected ✅

Current numbers

The percentage of diagnostics emitted that were expected errors held steady at 89.36%. The percentage of expected errors that received a diagnostic held steady at 85.49%. The number of fully passing files held steady at 88/134.

astral-sh-bot · 2026-05-18T20:55:30Z

Memory usage report

Memory usage unchanged ✅

astral-sh-bot · 2026-05-18T20:57:08Z

`ecosystem-analyzer` results

No diagnostic changes detected ✅

Full report with detailed diff (timing results)

MichaReiser · 2026-05-18T21:48:11Z

Thank you. This change still regresses performance by 2-3%, and I'm not sure this rare edge case is worth the performance cost.

zanieb · 2026-05-18T21:58:09Z

I think it'd be a loss if they need to run on a fork to prevent stack overflows. I'm not sure what the solution for the performance is though.

charliermarsh · 2026-05-18T22:10:39Z

I'll ask Codex to give it a try.

samuelcolvin · 2026-05-18T22:32:35Z

> Thank you. This change still regresses performance by 2-3%, and I'm not sure this rare edge case is worth the performance cost.

I don't think that makes much sense, given you can generate a stack overflow as easily as

uvx python -c "open('stack_overflow.py', 'w').write('[' * 5000)" && uvx ty check stack_overflow.py

or

uvx python -c "open('stack_overflow.py', 'w').write('[' * 5000)" && uvx ruff check stack_overflow.py

samuelcolvin · 2026-05-19T12:50:36Z

If you really want to avoid the overhead in your tools, I can re-implement the depth check as a generic, where you can use a no-op variant.

Let me know if you'd prefer that?

charliermarsh · 2026-05-19T13:12:34Z

Na I think that would be a worse outcome. We should find a way to fix this at zero cost. I'm working on it in the background.

MichaReiser · 2026-05-20T10:12:51Z

I think it's worth fixing, but I don't think it's a priority for us to fix this.

samuelcolvin · 2026-05-20T10:48:41Z

How can this not be a priority? A stack overflow and hard crash of the process are trivial to cause with one (long) line of code.

Cloud/managed execution of ruff and ty is only going to increase, so solving these sorts of vulnerabilities should surely be a priority.

Apart from that, for my use case of the crate, stack overflow is a major DOS risk.

MichaReiser · 2026-05-20T10:53:50Z

Only fixing this in the parser will not be sufficient for Ruff or ty. All visitor code can run into stack overflows, even after limiting the depth in the parser. ty has a lot of recursive calls; again, fixing those requires an approach other than the change here in the parser. It is also simply not as high a priority as a stable ty release. I'm not saying it isn't important, it's just not as important as some other work.

samuelcolvin · 2026-05-20T11:37:34Z

Okay, we can agree to disagree on the priority.

Who (whose AI) should fix the comments above, me or @charliermarsh? I'm a bit unclear who this sits with now.

We're intending to relaunch hackmonty.com next week, so we need to get something into monty by then, but it could be on a branch.

charliermarsh · 2026-05-20T11:53:24Z

I discussed with Micha. I will address the comments and see through to merging, then look into some of the longer-term alternatives.

samuelcolvin · 2026-05-20T12:00:44Z

great, thank you.

Swapped the EllipsisLiteral placeholder in parse_lhs_expression for the standard expression-recovery node - an ExprName with id: Name::empty() and ctx: ExprContext::Invalid - built inline rather than via parse_missing_name (which would add a redundant "Expected an identifier" error on top of the RecursionLimitExceeded that enter_recursion already records).

samuelcolvin · 2026-05-21T08:58:30Z

Awesome. Thanks everyone for helping on this!

## Summary

Partial fix for astral-sh#22930.

Without this malicious or machine generated code could cause a stack overflow with something as simple as `'(' * 5000 + '1' + ')' * 5000`.

I decided to do the simplest thing and have a limit that's always applied with a reasonable default. Since:

* the overhead of this check will be tiny
* it seems inconceivable that anyone will want to have no limit

## Test Plan

PR includes tests.

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>

samuelcolvin requested review from MichaReiser and dhruvmanila as code owners April 24, 2026 04:23

samuelcolvin mentioned this pull request Apr 24, 2026

Ruff depth limit pydantic/monty#391

Merged

MichaReiser added the parser Related to the parser label Apr 24, 2026

samuelcolvin force-pushed the parse-recursion-limit branch from af72bbf to 7b61191 Compare April 30, 2026 13:52

charliermarsh force-pushed the parse-recursion-limit branch from 538ed32 to b02bcaa Compare May 19, 2026 22:08

charliermarsh self-assigned this May 20, 2026

samuelcolvin and others added 15 commits May 20, 2026 14:08

add recursion limit for parsing, fix astral-sh#22930

4a0551e

f-string recursion and switch to depth_remaining

3bf675d

fix comments and tests

c9fe5ce

remove public export

497ce75

reduce DEFAULT_MAX_RECURSION_DEPTH to 200

012c454

improve error message

2ef5477

default limit matches cpython

7058d5a

drop bomb

c327644

fix for recursion in parse_binary_expression_or_higher

8115ac5

protect from lambda recursion and orelse recursion

29549f7

Fix performance

2a4fe2b

Use correct invalid nodes

4b42dbd

Unroll list

a11ddf6

Add test and TODO

6f33fce

charliermarsh force-pushed the parse-recursion-limit branch from b02bcaa to 6f33fce Compare May 20, 2026 13:15

charliermarsh merged commit c423054 into astral-sh:main May 21, 2026
52 checks passed

BrewTestBot mentioned this pull request May 21, 2026

ruff 0.15.14 Homebrew/homebrew-core#284070

Merged

samuelcolvin mentioned this pull request May 29, 2026

[parser] Bound iterative expression chains to avoid stack overflow #25462

Closed

MichaReiser mentioned this pull request May 29, 2026

Spill stack to the heap to support arbitrarily deep nested programs #25464

Open
