Fix HCL parser for comments with CR-only line endings #6532
Merged
The HCL lexer's LINE_COMMENT rule only supported LF (\n) or CRLF (\r\n)
line endings, but not CR-only (\r) line endings (classic Mac-style).
This caused parsing failures for files with CR-only line endings after
comments, such as "# comment\rlocals { a = 1 }".
The fix changes the lexer rule from:
LINE_COMMENT : ('//' | '#') ~[\r\n]* '\r'? ('\n' | EOF) -> channel(HIDDEN);
to:
LINE_COMMENT : ('//' | '#') ~[\r\n]* ('\r\n' | '\r' | '\n' | EOF) -> channel(HIDDEN);
This now accepts all three line ending styles: CRLF, LF, and CR.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
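The difference between the two rules can be checked outside ANTLR with a rough regex approximation. This is only a sketch: the names `OLD_LINE_COMMENT`, `NEW_LINE_COMMENT`, and `comment_token` are made up for illustration, ANTLR's `EOF` is approximated with `\Z`, and regex matching is not identical to ANTLR's lexer algorithm, although it agrees on these inputs.

```python
import re

# Regex approximations of the two ANTLR rules.
# Assumption: ANTLR's EOF is modelled here with \Z (end of input).
OLD_LINE_COMMENT = re.compile(r"(?://|#)[^\r\n]*\r?(?:\n|\Z)")
NEW_LINE_COMMENT = re.compile(r"(?://|#)[^\r\n]*(?:\r\n|\r|\n|\Z)")


def comment_token(rule, source):
    """Return the text the rule consumes at the start of source, or None."""
    match = rule.match(source)
    return match.group(0) if match else None


# One sample per line-ending style, plus a comment that runs to end of input.
samples = {
    "LF": "# comment\nlocals { a = 1 }",
    "CRLF": "# comment\r\nlocals { a = 1 }",
    "CR": "# comment\rlocals { a = 1 }",
    "EOF": "# comment",
}

for name, text in samples.items():
    old = comment_token(OLD_LINE_COMMENT, text)
    new = comment_token(NEW_LINE_COMMENT, text)
    print(f"{name}: old={old!r} new={new!r}")
```

Under this approximation only the CR-only sample behaves differently: the old pattern finds no comment token at all, while the new one consumes `# comment\r`.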
Add extensive test coverage for HCL comments at the start and end of Terraform/HCL files, including:
- Comments as the only content in a file
- Comments at the very first character of files
- Comments at the end without trailing newlines
- Multiple comments at start/end
- Block comments at file boundaries
- CRLF and CR-only line endings
- Unicode and emoji in comments
- Realistic Terraform patterns with header/footer comments
- Edge cases with heredocs and comments

The CR-only line ending tests pass after the lexer grammar fix.

One test is disabled pending further investigation:
- emojiInComment: Emoji (surrogate pair) in comments at the start of the file causes cursor tracking issues due to a mismatch between ANTLR character indices and the parser visitor's code point tracking.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
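The emojiInComment issue involves two ways of counting positions in the same text. The sketch below shows one way such a mismatch can arise, assuming one side counts UTF-16 code units and the other counts code points. Which component counts which in the actual parser is not shown in this PR, and the helper names are hypothetical.

```python
def utf16_length(text):
    """Number of UTF-16 code units needed to represent text."""
    return len(text.encode("utf-16-le")) // 2


def code_point_length(text):
    """Number of Unicode code points in text (Python's native string length)."""
    return len(text)


def utf16_index_to_code_point_index(text, utf16_index):
    """Convert an offset counted in UTF-16 code units to one counted in code points."""
    units = 0
    for i, ch in enumerate(text):
        if units >= utf16_index:
            return i
        # Characters outside the Basic Multilingual Plane take a surrogate pair.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


# A comment containing an emoji (U+1F680) at the start of the file.
source = "# \U0001F680\nlocals { a = 1 }"

print(code_point_length(source), utf16_length(source))
print(source.index("locals"), utf16_index_to_code_point_index(source, 5))
```

Every position after the emoji differs by one between the two counting schemes, so a cursor advanced in one unit and compared against offsets in the other would drift for the rest of the file.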
mtthwcmpbll
reviewed
Jan 14, 2026
  COMMENT : '/*' .*? '*/' -> channel(HIDDEN);
- LINE_COMMENT : ('//' | '#') ~[\r\n]* '\r'? ('\n' | EOF) -> channel(HIDDEN);
+ LINE_COMMENT : ('//' | '#') ~[\r\n]* ('\r\n' | '\r' | '\n' | EOF) -> channel(HIDDEN);
  NEWLINE : '\n' -> channel(HIDDEN);
I'm curious why we handle a variety of end of line characters for comments but not broadly as part of the NEWLINE pattern here. The linked doc on HCL syntax doesn't tie handling newlines only to comments (and it refers to newlines as "Newline sequences (either U+000A or U+000D followed by U+000A)" so we shouldn't actually see a solitary \r)
Does it make sense to update the NEWLINE pattern to match the new flexible one ('\r\n' | '\r' | '\n' | EOF) and then use NEWLINE in our LINE_COMMENT pattern?
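The idea of defining the newline sequence once and reusing it in the comment rule can be sketched with a toy regex tokenizer. This is not the ANTLR grammar. `TOKEN_RULES`, `OTHER`, and `tokenize` are hypothetical names, `\Z` stands in for `EOF`, and the end-of-input alternative is kept in the comment rule instead of inside `NEWLINE` as a choice made for this sketch. It also ignores strings and heredocs, where `#` or `//` would not start a comment.

```python
import re

# Hypothetical shared newline pattern; CRLF is listed first so that it is
# consumed as a single line ending instead of a CR followed by an LF.
NEWLINE = r"(?:\r\n|\r|\n)"

TOKEN_RULES = [
    ("COMMENT", r"/\*(?s:.*?)\*/"),
    # The comment rule reuses NEWLINE and adds end-of-input as a terminator.
    ("LINE_COMMENT", r"(?://|#)[^\r\n]*(?:" + NEWLINE + r"|\Z)"),
    ("NEWLINE", NEWLINE),
    # Catch-all for everything else, just enough to drive the example.
    ("OTHER", r"[^\r\n#/]+|/"),
]

MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(source):
    """Split source into (rule name, text) pairs using the rules above."""
    return [(m.lastgroup, m.group(0)) for m in MASTER.finditer(source)]


for token in tokenize("# a\rx = 1\r\n// b"):
    print(token)
```

With a single newline definition, a bare `\r` ends a line both after a comment and after ordinary content, so the two rules cannot drift apart.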
mtthwcmpbll
approved these changes
Jan 14, 2026
Summary
- Adds support for CR-only (\r) line endings in line comments, in addition to LF and CRLF

Bug Fixed
The HCL lexer's LINE_COMMENT rule only supported LF (\n) or CRLF (\r\n) line endings. Files with CR-only line endings (classic Mac-style) after comments would fail to parse.

Changes
- Changed the LINE_COMMENT rule to accept \r as a standalone line ending

Known Issue (Deferred)
One test is disabled (emojiInComment): emoji (surrogate pairs) in comments at the start of a file causes cursor tracking issues due to a mismatch between ANTLR character indices and the parser visitor's code point tracking. This requires a deeper refactoring and is tracked for future investigation.

Test plan
- Existing test (HclParserTest.unicode) continues to pass

🤖 Generated with Claude Code