Rationale
Implementing keywords properly has proven confusing over the years, and it is the primary reason that lexers take so long to merge. While cleaning up a large chunk of poorly implemented keyword detection, I realized that instead of expecting contributors to follow a specific pattern, it might be better to reify that pattern in the DSL.
Proposal
Add a keywords method alongside rule and mixin that explicitly maps keyword sets to token types, given a covering regex:
# The covering regex (/\w+/ here) should match the general "form" of keywords.
# It should match every keyword and also stop at the appropriate place.
# This is possible in the vast majority of languages - the only exception I am aware of is Gherkin.
keywords %r/\w+/ do
  # optionally transform the match before checking
  transform(&:downcase)

  # map sets to token types. Syntax is similar to #rule
  rule KEYWORDS, Keyword
  rule BUILTINS, Name::Builtin, :some_state
  rule OPERATORS do |m|
    token Operator
    pop! if m == "end"
  end

  # can also re-match if needed
  rule %r/.../, Some::Token

  # optional default action. if not given, we fall through in the default case.
  default Name
end
I have a prototype implementation, and am experimenting with how this affects existing lexers. I will have to run benchmarks as well. The DSL block is only ever eval'd once, creating a single object which then handles the match-time behaviour.
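To illustrate the "eval'd once" point, here is a minimal sketch of how such an object might work. It is not the prototype itself: KeywordTable, lookup, and the use of plain symbols as token types are hypothetical names chosen for this example, and regex re-matching, state transitions and block actions are left out.

```ruby
require 'set'

# Hypothetical sketch only; not the prototype implementation.
class KeywordTable
  def initialize(&block)
    @transform = nil
    @entries = []   # pairs of [Set of words, token type]
    @default = nil
    # The DSL block runs exactly once, here, at definition time.
    instance_eval(&block)
  end

  # DSL: optional transform applied to the match before lookup
  def transform(&block)
    @transform = block
  end

  # DSL: map a set of words to a token type
  def rule(words, token)
    @entries << [Set.new(words), token]
  end

  # DSL: token type to use when no set contains the match
  def default(token)
    @default = token
  end

  # Match-time behaviour: no DSL evaluation, only set lookups.
  # Returns nil when nothing matches and no default was given,
  # which a caller could treat as "fall through".
  def lookup(text)
    key = @transform ? @transform.call(text) : text
    _, token = @entries.find { |set, _| set.include?(key) }
    token || @default
  end
end

table = KeywordTable.new do
  transform(&:downcase)
  rule %w[if else end], :Keyword
  rule %w[puts print], :Builtin
  default :Name
end

table.lookup("IF")    # found in the first set after downcasing
table.lookup("foo")   # in no set, so the default applies
```

Because the sets are built when the block is evaluated, the per-match cost in this sketch is one optional transform plus a hash-based membership check per rule.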