Fix HCL parser for comments with CR-only line endings #6532

Merged: timtebeek merged 6 commits into main from fix/hcl-comment-parsing-edge-cases on Jan 14, 2026

@timtebeek
Member

Summary

  • Fix HCL lexer to support CR-only (\r) line endings in line comments, in addition to LF and CRLF
  • Add comprehensive test coverage for HCL comments at file boundaries (60+ new tests)

Bug Fixed

The HCL lexer's LINE_COMMENT rule only supported LF (\n) or CRLF (\r\n) line endings. Files with CR-only line endings (classic Mac-style) after comments would fail to parse with errors like:

token recognition error at: '# comment\rlocals'
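
To make the failure mode concrete, here is a minimal sketch that approximates the old and new LINE_COMMENT rules with Java regular expressions. This is not how the ANTLR lexer is implemented: the class name `LineCommentRules`, the helper `consumed`, and the use of `\z` as a stand-in for EOF are assumptions made for illustration only, and a real lexer reports a token recognition error instead of returning -1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the lexer rules; not part of rewrite-hcl.
final class LineCommentRules {
    // Old rule: ('//' | '#') ~[\r\n]* '\r'? ('\n' | EOF)
    static final Pattern OLD_RULE =
            Pattern.compile("(?://|#)[^\\r\\n]*\\r?(?:\\n|\\z)");

    // New rule: ('//' | '#') ~[\r\n]* ('\r\n' | '\r' | '\n' | EOF)
    static final Pattern NEW_RULE =
            Pattern.compile("(?://|#)[^\\r\\n]*(?:\\r\\n|\\r|\\n|\\z)");

    /** Number of chars the rule consumes at the start of input, or -1 if it cannot match there. */
    static int consumed(Pattern rule, String input) {
        Matcher m = rule.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String crOnly = "# comment\rlocals { a = 1 }";
        System.out.println(consumed(OLD_RULE, crOnly)); // -1: a lone \r must be followed by \n or EOF
        System.out.println(consumed(NEW_RULE, crOnly)); // 10: "# comment" plus the lone \r
    }
}
```

In this model the old rule still accepts a comment that ends in `\r` at the very end of the input, which is why the failure only shows up when more content follows the CR.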

Changes

  • HCLLexer.g4: Updated LINE_COMMENT rule to accept \r as a standalone line ending
  • HclCommentTest.java: Added comprehensive tests for:
    • Comments as the only file content
    • Comments at file start/end (with and without trailing newlines)
    • CR-only, LF, and CRLF line endings
    • Block comments at file boundaries
    • Unicode in comments
    • Realistic Terraform patterns with header/footer comments

Known Issue (Deferred)

One test is disabled (emojiInComment): an emoji (a surrogate pair) in a comment at the start of a file causes cursor tracking issues due to a mismatch between ANTLR character indices and the parser visitor's code point tracking. Fixing this requires deeper refactoring and is tracked for future investigation.
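
For readers unfamiliar with this class of bug, the sketch below shows the general mismatch between UTF-16 char indices and code point offsets in Java. It does not reproduce the parser visitor's actual cursor logic; the class `SurrogateOffsets` and the helper `charIndexOf` are hypothetical names used only for illustration.

```java
// Hypothetical illustration; not code from rewrite-hcl.
final class SurrogateOffsets {
    /** Converts a code point offset from the start of s into a UTF-16 char index. */
    static int charIndexOf(String s, int codePointOffset) {
        return s.offsetByCodePoints(0, codePointOffset);
    }

    public static void main(String[] args) {
        // "# " + U+1F680 (encoded as the surrogate pair D83D DE80) + "\nlocals {}"
        String src = "# \uD83D\uDE80\nlocals {}";

        System.out.println(src.length());                        // 14 UTF-16 code units
        System.out.println(src.codePointCount(0, src.length())); // 13 code points

        // "locals" begins after 4 code points but at char index 5.
        System.out.println(charIndexOf(src, 4));                 // 5
        System.out.println(src.indexOf("locals"));               // 5

        // Using the code point offset directly as a char index lands one position early.
        System.out.println(src.substring(4).startsWith("\n"));   // true
    }
}
```

Any component that counts in code points while another counts in chars drifts by one position per supplementary character, which is consistent with the symptom described above.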

Test plan

  • All existing HCL tests pass
  • New CR-only line ending tests pass
  • New comment boundary tests pass
  • Unicode test (HclParserTest.unicode) continues to pass

🤖 Generated with Claude Code

timtebeek and others added 2 commits January 14, 2026 14:44
The HCL lexer's LINE_COMMENT rule only supported LF (\n) or CRLF (\r\n)
line endings, but not CR-only (\r) line endings (classic Mac-style).

This caused parsing failures for files with CR-only line endings after
comments, such as "# comment\rlocals { a = 1 }".

The fix changes the lexer rule from:
  LINE_COMMENT : ('//' | '#') ~[\r\n]* '\r'? ('\n' | EOF) -> channel(HIDDEN);
to:
  LINE_COMMENT : ('//' | '#') ~[\r\n]* ('\r\n' | '\r' | '\n' | EOF) -> channel(HIDDEN);

This now accepts all three line ending styles: CRLF, LF, and CR.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add extensive test coverage for HCL comments at the start and end of
Terraform/HCL files, including:

- Comments as the only content in a file
- Comments at the very first character of files
- Comments at the end without trailing newlines
- Multiple comments at start/end
- Block comments at file boundaries
- CRLF and CR-only line endings
- Unicode and emoji in comments
- Realistic Terraform patterns with header/footer comments
- Edge cases with heredocs and comments

The CR-only line ending tests pass after the lexer grammar fix.

One test is disabled pending further investigation:
- emojiInComment: Emoji (surrogate pair) in comments at the start of
  the file causes cursor tracking issues due to a mismatch between
  ANTLR character indices and the parser visitor's code point tracking.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@timtebeek timtebeek requested a review from mtthwcmpbll January 14, 2026 14:09
@timtebeek timtebeek added labels: enhancement (New feature or request), hcl parser. Jan 14, 2026
Comment thread rewrite-hcl/src/main/antlr/HCLLexer.g4 Outdated
  COMMENT : '/*' .*? '*/' -> channel(HIDDEN);
- LINE_COMMENT : ('//' | '#') ~[\r\n]* '\r'? ('\n' | EOF) -> channel(HIDDEN);
+ LINE_COMMENT : ('//' | '#') ~[\r\n]* ('\r\n' | '\r' | '\n' | EOF) -> channel(HIDDEN);
  NEWLINE : '\n' -> channel(HIDDEN);
@mtthwcmpbll mtthwcmpbll Jan 14, 2026

I'm curious why we handle a variety of end-of-line characters for comments but not more broadly as part of the NEWLINE pattern here. The linked doc on HCL syntax doesn't tie newline handling only to comments, and it refers to newlines as "Newline sequences (either U+000A or U+000D followed by U+000A)", so we shouldn't actually see a solitary \r.

Does it make sense to update the NEWLINE pattern to match the new flexible one ('\r\n' | '\r' | '\n' | EOF) and then use NEWLINE in our LINE_COMMENT pattern?
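
A rough way to picture that suggestion, modelled with Java regular expressions and not with the actual grammar: define the newline alternatives once and build the comment rule on top of them. Everything here is an assumption for illustration (the class `SharedNewline`, both helpers, and in particular the choice to keep the end-of-input alternative on the comment rule instead of inside NEWLINE); the thread did not settle on a design.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a shared NEWLINE fragment; not part of rewrite-hcl.
final class SharedNewline {
    // CRLF, lone CR, or LF. Java regex alternation is ordered, so "\r\n" comes
    // first to keep CRLF as a single newline sequence in this model.
    static final String NEWLINE = "(?:\\r\\n|\\r|\\n)";

    static final Pattern NEWLINE_PATTERN = Pattern.compile(NEWLINE);

    // Comment rule reusing NEWLINE; \z stands in for EOF.
    static final Pattern LINE_COMMENT =
            Pattern.compile("(?://|#)[^\\r\\n]*(?:" + NEWLINE + "|\\z)");

    /** Counts newline sequences, treating CRLF as one. */
    static int countNewlines(String input) {
        Matcher m = NEWLINE_PATTERN.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** True if the input begins with a complete line comment. */
    static boolean startsWithLineComment(String input) {
        return LINE_COMMENT.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(countNewlines("a\r\nb\rc\nd"));          // 3
        System.out.println(startsWithLineComment("# c\rlocals {}")); // true
    }
}
```

With a single definition, comments and ordinary line breaks cannot disagree about what counts as a newline, which is the consistency concern raised above.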

@github-project-automation github-project-automation Bot moved this from In Progress to Ready to Review in OpenRewrite Jan 14, 2026
@timtebeek timtebeek merged commit a36416d into main Jan 14, 2026
2 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Review to Done in OpenRewrite Jan 14, 2026
@timtebeek timtebeek deleted the fix/hcl-comment-parsing-edge-cases branch January 14, 2026 17:20
2 participants