
XML: harden parser against malformed input crashes #7555

Merged
timtebeek merged 2 commits into main from tim/xml-parser-fixes on May 5, 2026

timtebeek commented May 3, 2026

Summary

Fixes two crashes in XmlParserVisitor when ANTLR's default error strategy synthesizes closing-tag tokens for malformed XML at EOF:

  • advanceCursor no longer throws IndexOutOfBoundsException when a synthesized token would advance past the end of the source — it clamps to source.length() instead.

  • visitElement tolerates any combination of null OPEN(1) / Name(1) / CLOSE(1) tokens so it no longer NPEs when error recovery couldn't produce them.

  • After these changes, all three remaining rewrite-xml parse failures reported in issue #7554 ("There were problems parsing" should show line number and/or excerpt of source code; see comment) parse to a ParseError that preserves the original source text instead of throwing. The affected files are MarcXmlParserTestSummaryAndKeywords.xml, MarcXmlParserTestArticle.xml, and MedlineImporterTestMalformedEntry.xml.
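
The clamping described in the first bullet can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual `XmlParserVisitor` code: the class `ClampedCursor` and its method names are hypothetical, and the extra guard against moving backwards is an assumption added so the sketch is safe on its own.

```java
// Hypothetical stand-in for the visitor's cursor handling; not the real OpenRewrite class.
final class ClampedCursor {
    private final String source;
    private int cursor;

    ClampedCursor(String source) {
        this.source = source;
    }

    // Advance to newIndex and return the text that was skipped over.
    // A synthesized token may report an index past the end of the input,
    // so the target is clamped to source.length() instead of letting
    // substring() throw. The Math.max guard (an assumption of this sketch)
    // keeps the cursor from moving backwards.
    String advanceTo(int newIndex) {
        int target = Math.max(cursor, Math.min(newIndex, source.length()));
        String skipped = source.substring(cursor, target);
        cursor = target;
        return skipped;
    }

    int position() {
        return cursor;
    }
}
```

With a three-character source such as `<a>`, a request to advance to index 10 stops at index 3 and returns only the remaining text.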

The bare-ampersand and unterminated end-tag cases are intentionally malformed test fixtures — preserving them as a ParseError (which round-trips the original text) is the documented fallback.

Test plan

  • ./gradlew :rewrite-xml:test — 3 new tests added in XmlParserTest, all rewrite-xml suites green

ANTLR error recovery synthesizes closing-tag tokens when input ends with
an unclosed element, which caused `XmlParserVisitor` to throw
`IndexOutOfBoundsException` from `advanceCursor` and `NullPointerException`
when accessing `Name(1)`. Clamp `advanceCursor` to the source length and
tolerate null `OPEN(1)`/`Name(1)`/`CLOSE(1)` so these inputs fall back to
a `ParseError` (preserving original text) instead of crashing.
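
The null tolerance mentioned above might look something like the sketch below. It assumes the usual ANTLR behavior where an indexed terminal accessor such as `Name(1)` returns null when that occurrence is absent. The class `NullTolerantTokens`, its helper names, and the use of a plain list of strings in place of ANTLR terminal nodes are all hypothetical simplifications, not the code in this PR.

```java
import java.util.List;

// Hypothetical illustration; plain strings stand in for ANTLR terminal nodes.
final class NullTolerantTokens {
    private NullTolerantTokens() {
    }

    // Mirrors an indexed accessor: returns the i-th occurrence, or null
    // when error recovery produced fewer occurrences than expected.
    static String nth(List<String> tokens, int i) {
        return i >= 0 && i < tokens.size() ? tokens.get(i) : null;
    }

    // Reads the closing-tag name (the second Name occurrence) without
    // dereferencing a missing token; callers treat null as "no closing tag".
    static String closingName(List<String> names) {
        return nth(names, 1);
    }

    // True only when both the opening and closing names are present and equal.
    static boolean hasMatchingClose(List<String> names) {
        String open = nth(names, 0);
        String close = closingName(names);
        return open != null && open.equals(close);
    }
}
```

In this sketch a caller would check `hasMatchingClose` before building the closing tag and otherwise take the fallback path, which in the PR is the `ParseError` that keeps the original text.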

See #7554 (comment)
timtebeek merged commit ec4fb3d into main on May 5, 2026
1 check passed
github-project-automation (bot) moved this from In Progress to Done in OpenRewrite on May 5, 2026
timtebeek deleted the tim/xml-parser-fixes branch on May 5, 2026 09:46
