Skip to content

Fix bug where parser misidentifies variables as keywords if the keyword is a prefix of the variable name#1639

Open
OlivierBBB wants to merge 2 commits intomainfrom
1612-fix-return-is-not-allowed-to-be-contained-in-the-name-of-a-variable
Open

Fix bug where parser misidentifies variables as keywords if the keyword is a prefix of the variable name#1639
OlivierBBB wants to merge 2 commits intomainfrom
1612-fix-return-is-not-allowed-to-be-contained-in-the-name-of-a-variable

Conversation

@OlivierBBB
Copy link
Copy Markdown
Collaborator

@OlivierBBB OlivierBBB commented Apr 8, 2026

Initial attempt (adding a whitespace to the keyword pattern) was flawed: it would have misidentified e.g. if(condition == 1). Claude made it robust. The approach: parse all tokens without looking for keywords. In a second pass look for exact matches in the keywords map


Note

Medium Risk
Adjusts core lexer behavior for all ZKC source files; while the change is targeted, any keyword/identifier edge-case could affect parsing across the language.

Overview
Fixes a lexer bug where identifiers could be mis-tokenized as keywords when a keyword appears as a prefix of the identifier.

Keyword matching is removed from the primary lexing rules and replaced with a post-pass in Lex() that reclassifies IDENTIFIER tokens to keyword kinds only on exact text match via a keywords map.

Adds new unit tests (basic_17basic_19) covering keywords at the beginning/end/middle of variable names to prevent regressions.

Reviewed by Cursor Bugbot for commit 4577fbe. Bugbot is set up for automated code reviews on this repo. Configure here.

Approach: parse all tokens without looking for keywords. In a second
pass look for exact matches in the keywords map
Copilot AI review requested due to automatic review settings April 8, 2026 13:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a lexer bug where identifiers were incorrectly tokenized as keywords when a keyword was a prefix of the identifier (e.g., as_X). It does this by lexing keywords as identifiers first, then reclassifying only exact keyword matches in a post-pass.

Changes:

  • Remove explicit keyword lexing rules and introduce a keyword reclassification pass for IDENTIFIER tokens based on exact text match.
  • Add new ZKC unit test sources covering keywords at the beginning/end/middle of identifiers.
  • Register the new unit tests in the Go test suite.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/zkc/compiler/parser/lexer.go Switches keyword recognition to a post-lexing reclassification step to avoid prefix-matching issues.
testdata/zkc/unit/basic_17.zkc Adds coverage for identifiers starting with keywords.
testdata/zkc/unit/basic_18.zkc Adds coverage for identifiers ending with keywords.
testdata/zkc/unit/basic_19.zkc Adds coverage for keywords appearing in the middle of identifiers.
pkg/test/zkc_unit_test.go Adds Go test entries to run the new unit fixtures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +281 to +289
// Reclassify identifiers whose full text is an exact keyword match.
contents := srcfile.Contents()

for i, tok := range tokens {
if tok.Kind == IDENTIFIER {
text := string(contents[tok.Span.Start():tok.Span.End()])
if kind, ok := keywords[text]; ok {
tokens[i].Kind = kind
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reclassifying identifiers currently converts every IDENTIFIER token’s rune slice into a string (string(contents[start:end])). This allocates and UTF-8 encodes even for non-keywords, which can be a noticeable lexer performance regression on large sources. Consider avoiding the conversion unless the span length could match a keyword (e.g., early-continue when span length > max keyword length), or compare against keywords without allocating (e.g., rune-slice comparisons or a small switch).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: return is not allowed to be contained in the name of a variable

2 participants