PROPOSAL: Explicit Keywords DSL #2263

@jneen

Rationale

Implementing keywords correctly has confused contributors for years, and it is the primary reason that lexers take so long to merge. While cleaning up a large chunk of poorly implemented keyword detection, I realized that, instead of expecting contributors to follow a specific pattern, it would be better to reify that pattern in the DSL.

Proposal

Add a keywords method alongside rule and mixin that explicitly maps keyword sets to token types, given a covering regex:

# The covering regex (/\w+/ here) should match the general "form" of keywords.
# It should match every keyword and also stop at the appropriate place.
# This is possible in the vast majority of languages - the only exception I am aware of is Gherkin.
keywords %r/\w+/ do
  # optionally transform the match before checking
  transform(&:downcase)

  # map sets to token types. Syntax is similar to #rule
  rule KEYWORDS, Keyword
  rule BUILTINS, Name::Builtin, :some_state
  rule OPERATORS do |m|
    token Operator
    pop! if m == "end"
  end

  # can also re-match if needed
  rule %r/.../, Some::Token

  # optional default action. if not given, we fall through in the default case.
  default Name
end

I have a prototype implementation and am experimenting with how it affects existing lexers; benchmarks still need to be run. The DSL block is eval'd only once, creating a single object that then handles the match-time behaviour.
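
To make the "eval'd once, one object handles match time" idea concrete, here is a minimal standalone sketch. It is not the prototype and does not use Rouge's internals: the class name KeywordTable, its lex method, and the symbol token names are all hypothetical, and it only covers the two-argument form of rule plus transform and default.

```ruby
require 'set'
require 'strscan'

# Hypothetical stand-in for the object a `keywords` block might build.
# Not Rouge's API; names and behaviour are assumptions for illustration.
class KeywordTable
  def initialize(covering, &block)
    @covering = covering   # regex matching the general "form" of a keyword
    @transform = nil       # optional normalisation applied before lookup
    @rules = []            # [matcher, token] pairs, checked in order
    @default = nil         # token used when no rule matches (nil = fall through)
    instance_eval(&block)  # the DSL block runs exactly once, here
  end

  # --- DSL methods, only called while the block is being evaluated ---
  def transform(&block)
    @transform = block
  end

  def rule(matcher, token)
    @rules << [matcher, token]
  end

  def default(token)
    @default = token
  end

  # --- match-time behaviour ---
  # Returns [token, text] and advances the scanner, or returns nil and
  # leaves the scanner untouched so later rules can try.
  def lex(scanner)
    text = scanner.check(@covering)
    return nil unless text

    key = @transform ? @transform.call(text) : text
    _, token = @rules.find do |matcher, _|
      matcher.is_a?(Regexp) ? matcher.match?(key) : matcher.include?(key)
    end
    token ||= @default
    return nil unless token

    scanner.pos += text.bytesize
    [token, text]
  end
end

KEYWORDS = Set.new(%w[if else end])
BUILTINS = Set.new(%w[print len])

TABLE = KeywordTable.new(/\w+/) do
  transform(&:downcase)
  rule KEYWORDS, :Keyword
  rule BUILTINS, :Builtin
  default :Name
end

scanner = StringScanner.new("IF foo")
TABLE.lex(scanner)  # => [:Keyword, "IF"]
```

Because the sets are looked up by the transformed text, the covering regex is run once per candidate word regardless of how many keyword sets are registered. Whether that holds for the real implementation is something the benchmarks would have to show.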
