Skip to content
This repository was archived by the owner on Aug 15, 2025. It is now read-only.

Searching the documentation with filters#131

Open
polibb wants to merge 37 commits intoleanprover-community:masterfrom
polibb:master
Open

Searching the documentation with filters#131
polibb wants to merge 37 commits intoleanprover-community:masterfrom
polibb:master

Conversation

@polibb
Copy link
Copy Markdown
Collaborator

@polibb polibb commented May 14, 2021

Searching the mathlib library contents in the documentation and filtering of the resulting set of declarations.
One is now able to:

  • search through the declaration names, the module names and the description texts that belong to any declarations of theorems, axioms, definitions, etc.
  • filter the search results before or after submitting the search query by choosing values for attributes and kind of the resulting declarations
  • selected attributes are combined with an OR. If an attribute in the list of attributes for any given declaration is found in the list submitted in the filters, then the declaration is legitimate.
  • selected kind options are combined with an OR.
  • if both any attributes are chosen and any kind options are selected, then the two sets of OR-connected values are combined with an AND. Hence, if a declaration fits the selection of one, but does not fit the possibilities allowed by the other filter, the declaration is considered illegitimate.

More information:

  1. Results are shown in prioritized sequence from most matches per declaration contents to least.
  2. Highest priority is given to the declaration name. Following is its descriptive text added as a comment in the mathlib library itself. Lastly, the module name the declaration is found in (file name) carries the least priority.
  3. Submitting the filters form at any point also re-submits the search query, if there is any.
  4. Maximum of 10 results is shown, but an option to expand and review all results is present in the UI.
  5. While the user is on any given page and searches for a term there for the first time the results will take awhile to load, but any subsequent search takes no extra time whatsoever. Thus, we recommend loading a search result once on any page and designating it for further searches, if needed, because it will save you time.
  6. Navigation with the keyboard is not possible at this point, please expect it in the near future.

polibb and others added 30 commits November 10, 2020 09:41
This way pylint will notice if you accidentally use the wrong key.
I'm not sure if this is officially "pythonic".

It seems like more could be done translating the json to actual data structures with named fields; then these lookup tables wouldn't be necessary!
But this might be a lot of work for little gain.
use class instead of dict for structure field names
@polibb
Copy link
Copy Markdown
Collaborator Author

polibb commented May 15, 2021

UPDATE: Fixes pushed, please pull to see them.

  • handled console errors for missing elements and trying to access methods on them
  • loading added when going from basic (max 10) results set to "Show all" when rendering the results on the DOM itself takes too long and might crash the site is abused

The script that has been used for almost a year and is running in the background for the search to function in this PR too is being ran by a SharedWorker which is not compatible with most browsers on a phone and Safari and IE on a desktop. This is not ideal, I understand, but will be improved in the future as it is not part of the scope at this time for this PR.

@bryangingechen
Copy link
Copy Markdown
Collaborator

#deploy

@github-actions
Copy link
Copy Markdown

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

Copy link
Copy Markdown
Contributor

@RaitoBezarius RaitoBezarius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's super cool :)

Comment thread mathlib_data_structures.py Outdated
@@ -0,0 +1,53 @@
class mathlibStructures:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, that would make sense to make those inherit from enum.Enum also?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try using Enum at first, yes, but there was some issue - not sure what it was since I haven't focused on that in a while. Actually none of the classes here are used at the moment, this is going to be included in another PR which focuses on organizational structure of the python script itself, so that it's easier to read and update. I should probably remove this file from this PR and will look into using Enums again for the next one where those will be needed.

Comment thread searchWorker.js
}
}

return hasKindFilter && hasAttrFilter ?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to split these 3 ternary expressions into 3-ifs or early return?

This comment was marked as off-topic.

@gebner
Copy link
Copy Markdown
Member

gebner commented May 17, 2021

#deploy

@github-actions
Copy link
Copy Markdown

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

@bryangingechen
Copy link
Copy Markdown
Collaborator

#deploy

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jun 8, 2021

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

Comment thread searchWorker.js Outdated
// Adapted from the default tokenizer and
// https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BZ%7D&abb=on&c=on&esc=on&g=&i=
const SEPARATOR = /[._\n\r \u00A0\u1680\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+/u
req.open('GET', 'searchable_data.json', false /* blocking */);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is 7-8x larger than the decl.bmp we currently load- I worry this will substantially increase load time for every search.

@bryangingechen
Copy link
Copy Markdown
Collaborator

One other thing that would be great to have before this goes live is an easily accessible / easy-to-find page describing all the search features. Maybe an extra section on the mostly-blank index page https://leanprover-community.github.io/mathlib_docs_demo/ will do?

@robertylewis
Copy link
Copy Markdown
Member

I think the only blocking thing here is the size of the json file as Eric points out. Was the reason for using a bmp before compression or caching? If we zip the json file when we generate it and unzip locally, I guess browsers won't cache the unzipped version, right?

For @bryangingechen 's suggestion, I think that's a good place to describe the search features and we could probably use some text from this PR.

@gebner
Copy link
Copy Markdown
Member

gebner commented Jun 29, 2021

The reason for the .bmp extension instead of .json is that github will then automatically gzip it. See #125

@gebner
Copy link
Copy Markdown
Member

gebner commented Jun 29, 2021

#deploy

@github-actions
Copy link
Copy Markdown

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

@eric-wieser
Copy link
Copy Markdown
Member

The bmp is now 1660854 bytes, and loads in half a second on my machine - I think it was 15mB before compression before, so the file extension change definitely helps.

However, the call to miniSearch.addAll(indexedData) takes 28 seconds for me, vs 6 seconds it takes on the live site.

@robertylewis
Copy link
Copy Markdown
Member

I've created polibb#4 to briefly document the search.

I agree with Eric that there's still a slowdown, but it's not near 28 seconds for me. To me the new performance is acceptable, and I'm okay merging, what do others think?

@robertylewis
Copy link
Copy Markdown
Member

#deploy

@github-actions
Copy link
Copy Markdown

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

@robertylewis
Copy link
Copy Markdown
Member

Hmm, the 404 page needs to be updated as well:
https://leanprover-community.github.io/mathlib_docs_demo/find/nat.addo

@robertylewis
Copy link
Copy Markdown
Member

#deploy

@github-actions
Copy link
Copy Markdown

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

@gebner
Copy link
Copy Markdown
Member

gebner commented Jan 28, 2022

#deploy

@github-actions
Copy link
Copy Markdown

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants