implement lenient parser by trinity-1686a · Pull Request #2129 · quickwit-oss/tantivy

trinity-1686a · 2023-07-19T09:09:00Z

fix #5

Most of the grammar part is done, however it's not reporting hints just yet (this isn't really required, but would be a nice addition, to help someone understand how their query was understood, and why the parser think it's somewhat wrong).

At the moment, only the part in query-grammar is done. More work is required to convert the AST to a query without failing. This part hasn't been explored yet, but should be before the ticket is considered fixed.

I'm sending this as a draft to get initial comments on the parser part

some tests are still failing

codecov-commenter · 2023-07-19T10:17:46Z

Codecov Report

Merging #2129 (f7ed0b7) into main (ebc7812) will decrease coverage by 0.03%.
Report is 32 commits behind head on main.
The diff coverage is 93.66%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##             main    #2129      +/-   ##
==========================================
- Coverage   94.38%   94.35%   -0.03%     
==========================================
  Files         321      320       -1     
  Lines       60583    61527     +944     
==========================================
+ Hits        57179    58053     +874     
- Misses       3404     3474      +70

Files Changed	Coverage Δ
src/schema/json_object_options.rs	`92.66% <ø> (ø)`
src/schema/mod.rs	`100.00% <ø> (ø)`
query-grammar/src/lib.rs	`57.14% <25.00%> (-42.86%)`	⬇️
src/query/query_parser/query_parser.rs	`92.82% <66.94%> (-2.61%)`	⬇️
query-grammar/src/infallible.rs	`92.99% <92.99%> (ø)`
query-grammar/src/user_input_ast.rs	`98.42% <94.73%> (+1.12%)`	⬆️
query-grammar/src/query_grammar.rs	`97.81% <97.33%> (-1.92%)`	⬇️

... and 51 files with indirect coverage changes

adamreichold · 2023-07-20T13:17:39Z

The main question that came to mind reading this was whether the lenient parser should fully replace the existing parser. I think as an API between systems, the existing - let's say "strict" - parser is still useful. Of course, as a user interface, a lenient parser is very much preferable.

How do people think about having both parsers and choosing between them at compile time or via a runtime flag? I understand that maintaining two parsers is a significant commitment but I also feel that replacing the existing behaviour in a "flag day" type of change could trigger a certain amount of user backlash.

trinity-1686a · 2023-07-20T13:21:42Z

the goal isn't to remove the strict parser. It was rewritten so both could share some code, but it will definitely be kept for the foreseeable future.

adamreichold · 2023-07-20T15:27:32Z

the goal isn't to remove the strict parser. It was rewritten so both could share some code, but it will definitely be kept for the foreseeable future.

Ah sorry, then I misunderstood the PR in my first reading and the concern does not apply at all. Thanks for clarifying!

fulmicoton

Nice approach.

But you really should put more effort in making your code readable.
Here are a bunch of concrete things you can do to improve this.

Have proper variable names

This is short and well contained, so i and o might be ok variable names.
I would have preferred input and output but I guess this can be argued.
i1, i2 however, is really not helpful.

Stop abusing iterator style.

For instance, if you do it to mutate stuff, that's probably a bad idea.
If you start using rare iterator functions, that's probably a bad idea too.
If you end up struggling with dealing with error handling, using more than one trick like transpose and collect::<Result<T,_>>, it is probably a bad idea.
If your code seems to suggest a variable is a vec but it is in fact an Option or a Result, this is a bad idea.

Consider using struct instead of tuple

type Error = (usize, String);
That kind of tuple makes code easy to write, but hard to understand.
You actually changed the semantics of the first argument, which is making things even worse.

If possible, I would have preferred two structs.
This however comes with a verbosity cost, so this may not be pertinent: I leave you judge of whether the change should be done or not.

Comment more

Something like
type Error = (usize, String);
is really hard for the reader.

Show types in your code when it is not easy to guess.

We sometimes have to read rust without type hints.
For instance in github or in an editor for which rust analyzer has not finished crunching a
freshly checked out branch.
It is useful to add explicit types here in strategic places.

trinity-1686a added 5 commits June 29, 2023 17:49

move query parser to nom

42e35c5

add suupport for term grouping

bac9236

initial work on infallible parser

3fdcc04

some tests are still failing

fmt

a50f6e6

add tests and fix minor parsing bugs

f0fb438