fn add_token(&mut self, doc_ref: &str, token: &str, term_freq: f64) {
    let mut iter = token.chars();
    if let Some(character) = iter.next() {
During index building, elasticlunr-rs iterates over the content of the token (a &str) as Unicode scalar values, whereas the JS library does it this way:
elasticlunr.InvertedIndex.prototype.addToken = function (token, tokenInfo, root) {
  var root = root || this.root,
      idx = 0;
  while (idx <= token.length - 1) {
    var key = token[idx];
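The JS string is actually iterated in UTF-16 code units. A single code unit is a whole character for English, most alphabetic text, and common Chinese characters, but not for emoji and rare Chinese characters: those lie outside the Basic Multilingual Plane and are encoded as two code units (a surrogate pair). For such characters the JS library therefore sees two keys where elasticlunr-rs sees one.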
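The difference can be reproduced with the Rust standard library alone. The following is a minimal sketch, not code from either library: scalar_keys and utf16_keys are hypothetical helper names introduced here for illustration. It assumes that walking a JS string with token[idx] corresponds to str::encode_utf16 in Rust.

```rust
/// Hypothetical helper: the keys seen when a token is walked by
/// Unicode scalar values, as `str::chars` does.
fn scalar_keys(token: &str) -> Vec<char> {
    token.chars().collect()
}

/// Hypothetical helper: the keys seen when a token is walked by
/// UTF-16 code units, assumed here to match `token[idx]` in JS.
fn utf16_keys(token: &str) -> Vec<u16> {
    token.encode_utf16().collect()
}

/// True when both iteration schemes yield the same number of keys,
/// i.e. the token contains no character outside the Basic Multilingual Plane.
fn same_key_count(token: &str) -> bool {
    scalar_keys(token).len() == utf16_keys(token).len()
}
```

For "abc" and "中文" both helpers return the same number of keys, so same_key_count is true. For "😀" (U+1F600), scalar_keys returns one key while utf16_keys returns the surrogate pair 0xD83D, 0xDE00, so the two schemes disagree.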
Related issue with mdBook.
The Rust excerpt above is from elasticlunr-rs/src/inverted_index.rs, lines 40 to 42 at commit 29d97e4.