Commit b5c3d03

Merge pull request #817 from krisk/feat/token-search
feat: token search — per-term fuzzy matching with IDF scoring
2 parents 460eb5b + a9b476b commit b5c3d03

27 files changed

Lines changed: 2358 additions & 129 deletions

README.md

Lines changed: 175 additions & 38 deletions
@@ -7,48 +7,185 @@
7 7
[![Contributors](https://img.shields.io/github/contributors/krisk/fuse.svg)](https://github.com/krisk/Fuse/graphs/contributors)
8 8
![License](https://img.shields.io/npm/l/fuse.js.svg)
9 9

10-
## Supporting Fuse.js
10+
Fuse.js is a lightweight, zero-dependency fuzzy-search library written in TypeScript. It works in the browser and on the server, and is designed for searching small-to-medium datasets on the client side where you can't rely on a dedicated search backend.
11+
12+
## ✨ What's New: Token Search
13+
14+
Multi-word fuzzy search with relevance ranking. Type `"javascrpt paterns"` and find `"JavaScript Patterns"` — typo tolerance, multiple words, and smart ranking all at once.
15+
16+
```js
17+
const fuse = new Fuse(docs, {
18+
useTokenSearch: true,
19+
keys: ['title', 'author', 'description']
20+
})
21+
22+
fuse.search('javascrpt paterns')
23+
// → [{ item: { title: 'JavaScript Patterns', ... } }]
24+
```
25+
26+
See [Token Search](#token-search) below for details.
27+
28+
## Installation
29+
30+
```bash
31+
npm install fuse.js
32+
```
33+
34+
```bash
35+
yarn add fuse.js
36+
```
37+
38+
Or include directly via CDN:
39+
40+
```html
41+
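<script type="module">
  // The .mjs file is an ES module, so it is loaded with an import rather than a plain src attribute.
  import Fuse from 'https://cdn.jsdelivr.net/npm/fuse.js/dist/fuse.min.mjs'
</script>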
42+
```
43+
44+
## Quick Start
45+
46+
```js
47+
import Fuse from 'fuse.js'
48+
49+
const books = [
50+
{ title: "Old Man's War", author: 'John Scalzi' },
51+
{ title: 'The Lock Artist', author: 'Steve Hamilton' },
52+
{ title: 'HTML5', author: 'Remy Sharp' },
53+
{ title: 'JavaScript: The Good Parts', author: 'Douglas Crockford' }
54+
]
55+
56+
const fuse = new Fuse(books, {
57+
keys: ['title', 'author']
58+
})
59+
60+
fuse.search('javscript')
61+
// → [{ item: { title: 'JavaScript: The Good Parts', ... }, ... }]
62+
```
63+
64+
## Features
65+
66+
### Fuzzy Search
67+
68+
The core of Fuse.js. Uses the [Bitap algorithm](https://en.wikipedia.org/wiki/Bitap_algorithm) for approximate string matching — handles typos, misspellings, and partial matches out of the box.
69+
70+
```js
71+
fuse.search('javscript')
72+
// → [{ item: { title: 'JavaScript: The Good Parts', author: 'Douglas Crockford' } }]
73+
```
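
To give a feel for the bit-parallel idea behind Bitap, here is a minimal sketch of its exact-match (shift-and) variant. It is not the Fuse.js implementation, which also tolerates errors and computes a score; `bitapExact` is a hypothetical name used only for illustration.

```js
// Illustrative shift-and Bitap for exact substring search (not Fuse.js internals).
// Returns the start index of the first match, or -1 when there is none.
function bitapExact(text, pattern) {
  const m = pattern.length
  if (m === 0) return 0
  // One bit per pattern character, so the pattern must fit in a 32-bit integer.
  if (m > 32) throw new RangeError('pattern longer than 32 characters')

  // masks.get(ch) has bit i set when pattern[i] === ch
  const masks = new Map()
  for (let i = 0; i < m; i++) {
    const ch = pattern[i]
    masks.set(ch, (masks.get(ch) || 0) | (1 << i))
  }

  const accept = 1 << (m - 1) // bit that marks a complete match
  let state = 0
  for (let i = 0; i < text.length; i++) {
    // Extend every partial match by one character and start a new one at bit 0.
    state = ((state << 1) | 1) & (masks.get(text[i]) || 0)
    if ((state & accept) !== 0) return i - m + 1
  }
  return -1
}

bitapExact('javascript patterns', 'script') // → 4
```

Because the whole match state lives in one integer, the pattern length is bounded by the integer width, which is where a 32-character cap on a single pattern comes from.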
74+
75+
### Weighted Keys
76+
77+
Search across multiple fields with different importance levels. Title matches can rank higher than description matches.
78+
79+
```js
80+
const fuse = new Fuse(docs, {
81+
keys: [
82+
{ name: 'title', weight: 2 },
83+
{ name: 'description', weight: 1 }
84+
]
85+
})
86+
```
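
One way to read these weights is as relative proportions. The sketch below assumes they are rescaled to sum to 1 before scoring; `normalizeWeights` is a hypothetical helper for illustration, not part of the Fuse.js API.

```js
// Rescale key weights so they sum to 1 (assumption: only the ratios matter).
function normalizeWeights(keys) {
  const total = keys.reduce((sum, key) => sum + key.weight, 0)
  return keys.map((key) => ({ ...key, weight: key.weight / total }))
}

const normalized = normalizeWeights([
  { name: 'title', weight: 2 },
  { name: 'description', weight: 1 }
])
// title ends up with two thirds of the total weight, description with one third
```

Under this reading, `{ title: 2, description: 1 }` and `{ title: 0.5, description: 0.25 }` would describe the same relative importance.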
87+
88+
### Extended Search
89+
90+
Use operators for precise control: exact match (`=`), prefix (`^`), negation (`!`), and more. Enable with `useExtendedSearch: true`.
91+
92+
```js
93+
const fuse = new Fuse(list, {
94+
useExtendedSearch: true,
95+
keys: ['title']
96+
})
11 97

12-
Through contributions, donations, and sponsorship, you allow Fuse.js to thrive. Also, you will be recognized as a beacon of support to open-source developers.
13-
14-
- [Become a backer or sponsor on **GitHub**.](https://github.com/sponsors/krisk)
15-
- [Become a backer or sponsor on **Patreon**.](https://patreon.com/fusejs)
16-
- [One-time donation via **PayPal**.](https://www.paypal.me/kirorisk)
17-
18-
---
19-
20-
<h3 align="center">Sponsors</h3>
21-
<table>
22-
<tbody>
23-
<tr>
24-
<td align="center" valign="middle">
25-
<a href="https://www.worksome.com" target="_blank">
26-
<img width="222px" src="https://raw.githubusercontent.com/krisk/Fuse/7a0d77d85ac90063575613b6a738f418b624357f/docs/.vuepress/public/assets/img/sponsors/worksome.svg">
27-
</a>
28-
</td>
29-
<td align="center" valign="middle">
30-
<a href="https://www.bairesdev.com/sponsoring-open-source-projects/" target="_blank">
31-
<img width="222px" src="https://github.com/krisk/Fuse/blob/gh-pages/assets/img/sponsors/bairesdev.png?raw=true">
32-
</a>
33-
</td>
34-
<td align="center" valign="middle">
35-
<a href="https://litslink.com/" target="_blank">
36-
<img width="222px" src="https://github.com/krisk/Fuse/blob/gh-pages/assets/img/sponsors/litslink.svg?raw=true">
37-
</a>
38-
</td>
39-
</tr>
40-
</body>
41-
</table>
42-
43-
---
44-
45-
## Introduction
46-
47-
Fuse.js is a lightweight fuzzy-search library, written in TypeScript, with zero dependencies.
98+
fuse.search('=exact match') // exact match
99+
fuse.search('^prefix') // starts with
100+
fuse.search('!term') // does not include
101+
```
102+
103+
### Token Search
104+
105+
Splits multi-word queries into individual terms, fuzzy-matches each independently, and ranks results using BM25-style IDF weighting. Enable with `useTokenSearch: true`.
106+
107+
```js
108+
const fuse = new Fuse(docs, {
109+
useTokenSearch: true,
110+
keys: ['title', 'body']
111+
})
112+
113+
fuse.search('express midleware rout')
114+
// Finds "Express Middleware" and "Express Routing Guide" despite typos
115+
```
116+
117+
- **Typo tolerance per word** — each term is fuzzy-matched independently
118+
- **Relevance ranking** — rare terms are weighted higher than common ones
119+
- **Word order independent** — `"patterns javascript"` and `"javascript patterns"` return identical results
120+
- **No query length limit** — long multi-word queries work naturally since each term is searched separately
121+
122+
Available in the full build. See [TOKEN_SEARCH.md](TOKEN_SEARCH.md) for details and performance benchmarks.
123+
124+
### Logical Search
125+
126+
Combine conditions with `$and` and `$or` for complex queries. Available in the full build.
127+
128+
```js
129+
fuse.search({
130+
$and: [
131+
{ title: 'javascript' },
132+
{ author: 'crockford' }
133+
]
134+
})
135+
```
136+
137+
### Match Highlighting
138+
139+
Get character-level match indices for highlighting search results in your UI.
140+
141+
```js
142+
const fuse = new Fuse(list, {
143+
includeMatches: true,
144+
keys: ['title']
145+
})
146+
147+
const result = fuse.search('javscript')
148+
// result[0].matches[0].indices → [[0, 9]]
149+
```
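
A small helper can turn those indices into highlighted markup. This sketch assumes `indices` is a sorted list of non-overlapping `[start, end]` pairs with an inclusive end, as in the example above; `highlight` and `escapeHtml` are hypothetical names, not part of Fuse.js.

```js
// Escape text so it can be placed safely inside HTML.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
}

// Wrap each matched range in <mark> tags; ranges are [start, end], end inclusive.
function highlight(text, indices) {
  let out = ''
  let cursor = 0
  for (const [start, end] of indices) {
    out += escapeHtml(text.slice(cursor, start))
    out += '<mark>' + escapeHtml(text.slice(start, end + 1)) + '</mark>'
    cursor = end + 1
  }
  return out + escapeHtml(text.slice(cursor))
}

highlight('JavaScript: The Good Parts', [[0, 9]])
// → '<mark>JavaScript</mark>: The Good Parts'
```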
150+
151+
### Single String Matching
152+
153+
Use `Fuse.match()` to fuzzy-match a pattern against a single string without creating an index. Useful for one-off comparisons or custom filtering.
154+
155+
```js
156+
const result = Fuse.match('javscript', 'JavaScript: The Good Parts')
157+
// → { isMatch: true, score: 0.04, indices: [[0, 9]] }
158+
```
159+
160+
### Dynamic Collections
161+
162+
Add and remove documents from a live index without rebuilding.
163+
164+
```js
165+
fuse.add({ title: 'New Book', author: 'New Author' })
166+
fuse.remove((doc) => doc.title === 'Old Book')
167+
```
168+
169+
## Builds
170+
171+
Fuse.js ships in two variants:
172+
173+
| Build | Includes | Min + gzip |
174+
|---|---|---|
175+
| **Full** | Fuzzy + Extended + Logical + Token search | ~8 kB |
176+
| **Basic** | Fuzzy search only | ~6.5 kB |
177+
178+
Use the basic build if you only need fuzzy search and want the smallest bundle size.
48 179

49 180
## Documentation
50 181

51-
To check out a [live demo](https://fusejs.io/demo.html) and docs, visit [fusejs.io](https://fusejs.io).
182+
For the full API reference, configuration options, scoring theory, and interactive demos, visit **[fusejs.io](https://fusejs.io)**.
183+
184+
## Supporting Fuse.js
185+
186+
- [Become a backer or sponsor on **GitHub**](https://github.com/sponsors/krisk)
187+
- [Become a backer or sponsor on **Patreon**](https://patreon.com/fusejs)
188+
- [One-time donation via **PayPal**](https://www.paypal.me/kirorisk)
52 189

53 190
## Develop
54 191

TOKEN_SEARCH.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
# Token Search
2+
3+
Token search splits multi-word queries into individual terms, fuzzy-matches each term independently using the Bitap algorithm, and ranks results using BM25-style IDF weighting. This combines Fuse.js's typo tolerance with relevance ranking — a query like `"javascrpt paterns"` will find `"JavaScript Patterns"`.
4+
5+
## Usage
6+
7+
```js
8+
const fuse = new Fuse(docs, {
9+
useTokenSearch: true,
10+
keys: ['title', 'author', 'description']
11+
})
12+
13+
fuse.search('javascrpt paterns')
14+
// → [{ item: { title: 'JavaScript Patterns', ... }, score: 0.12 }]
15+
```
16+
17+
All existing options work as before: `includeScore`, `includeMatches`, `keys` with weights, `threshold`, `limit`, `shouldSort`, etc.
18+
19+
## How it works
20+
21+
1. **Tokenization** — The query is split into individual words. Each word becomes a separate fuzzy search.
22+
23+
2. **Per-term fuzzy matching** — Each term is matched against each field using the Bitap algorithm with `ignoreLocation: true`, so terms can appear anywhere in the field. This means multi-word queries are no longer limited by the 32-character Bitap pattern cap.
24+
25+
3. **IDF weighting** — An inverted index is built at construction time. Rare terms (appearing in fewer documents) are weighted higher than common terms. This means a match on a distinctive word contributes more to the score than a match on a word that appears everywhere.
26+
27+
4. **Score combination** — Per-term scores are combined additively with IDF weights, then normalized to Fuse's 0–1 range (0 = perfect match).
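
The four steps above can be sketched with a toy model. Exact term matching stands in for Bitap here, and the IDF formula and the normalization are assumptions chosen for illustration; none of these functions are Fuse.js internals.

```js
// Step 1: split text into lowercase terms.
const tokenize = (s) => s.toLowerCase().split(/\s+/).filter(Boolean)

// Step 3 (built up front): term -> Set of ids of the documents that contain it.
function buildInvertedIndex(docs) {
  const index = new Map()
  docs.forEach((doc, id) => {
    for (const term of tokenize(doc)) {
      if (!index.has(term)) index.set(term, new Set())
      index.get(term).add(id)
    }
  })
  return index
}

// BM25-style IDF (assumed formula): rarer terms get larger weights.
function idf(index, term, docCount) {
  const df = index.has(term) ? index.get(term).size : 0
  return Math.log(1 + (docCount - df + 0.5) / (df + 0.5))
}

// Steps 2 and 4: match each term, add up IDF weights, normalize to 0..1.
function tokenSearch(docs, index, query) {
  const terms = tokenize(query)
  const weights = terms.map((t) => idf(index, t, docs.length))
  const total = weights.reduce((a, b) => a + b, 0)
  const results = []
  docs.forEach((doc, id) => {
    let matched = 0
    terms.forEach((t, i) => {
      if (index.has(t) && index.get(t).has(id)) matched += weights[i]
    })
    // 0 = every query term matched; documents matching nothing are dropped
    if (matched > 0) results.push({ item: doc, score: 1 - matched / total })
  })
  return results.sort((a, b) => a.score - b.score)
}

const docs = ['javascript patterns', 'javascript basics', 'python patterns', 'javascript testing']
const index = buildInvertedIndex(docs)
const results = tokenSearch(docs, index, 'patterns javascript')
// results[0] is 'javascript patterns'; 'python patterns' ranks next because
// "patterns" appears in fewer documents than "javascript"
```

In this model a document that matches only the rarer term still outranks one that matches only the common term, which is the effect IDF weighting is meant to produce.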
28+
29+
## Key behaviors
30+
31+
- **Partial matching** — A document matching 2 of 3 query terms is still returned, but ranks lower than one matching all 3.
32+
- **Word order independence** — `"patterns javascript"` and `"javascript patterns"` produce identical results.
33+
- **Typo tolerance per term** — Each term is fuzzy-matched independently, so typos in any word are tolerated.
34+
- **Long queries work** — A 6-word query runs 6 independent Bitap searches, each well within the 32-char limit.
35+
36+
## Tips
37+
38+
- Use `threshold` to control fuzziness. The default `0.6` is permissive — for tighter matching, try `0.3` or `0.4`.
39+
- Use `limit` when you only need the top N results. This also improves performance via heap-based selection.
40+
- Key weights still apply on top of token search scoring, so you can boost title matches over body matches.
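
To illustrate why `limit` can help, here is one way heap-based top-N selection could work: keep at most N results in a max-heap keyed on score, so the worst kept result is always at the root and can be evicted cheaply. `topN` is a hypothetical helper written for this example and is not taken from the Fuse.js source.

```js
// Keep the n lowest-scoring (best) results without sorting the full list.
function topN(results, n) {
  const heap = [] // max-heap on score: heap[0] is the worst result kept so far
  const swap = (i, j) => { [heap[i], heap[j]] = [heap[j], heap[i]] }

  const siftUp = (i) => {
    while (i > 0) {
      const parent = (i - 1) >> 1
      if (heap[parent].score >= heap[i].score) break
      swap(parent, i)
      i = parent
    }
  }

  const siftDown = (i) => {
    for (;;) {
      const left = 2 * i + 1
      const right = left + 1
      let largest = i
      if (left < heap.length && heap[left].score > heap[largest].score) largest = left
      if (right < heap.length && heap[right].score > heap[largest].score) largest = right
      if (largest === i) break
      swap(i, largest)
      i = largest
    }
  }

  for (const r of results) {
    if (heap.length < n) {
      heap.push(r)
      siftUp(heap.length - 1)
    } else if (n > 0 && r.score < heap[0].score) {
      heap[0] = r // evict the current worst result
      siftDown(0)
    }
  }
  return heap.sort((a, b) => a.score - b.score)
}

const scored = [{ score: 0.5 }, { score: 0.1 }, { score: 0.9 }, { score: 0.3 }, { score: 0.2 }]
topN(scored, 3).map((r) => r.score) // → [0.1, 0.2, 0.3]
```

With M candidates this does O(M log N) work instead of the O(M log M) of a full sort, which matters most when N is much smaller than M.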
41+
42+
## Availability
43+
44+
Token search is available in the **full build** (`fuse.js` / `fuse.mjs`). It is not included in the basic build to keep bundle size small. If you use the basic build with `useTokenSearch: true`, an error is thrown.
45+
46+
## Performance
47+
48+
Benchmarked on a corpus of randomly generated documents with 2 keys (title + body):
49+
50+
| Metric | 100 docs | 1,000 docs | 5,000 docs |
51+
|---|---|---|---|
52+
| Index creation overhead | 2.5x | 5.2x | 5.5x |
53+
| Single-term search | 1.8x | 1.8x | 1.7x |
54+
| Multi-term search | 1.3x | 1.3x | 1.2x |
55+
56+
Index creation is a one-time cost (46 ms for 5,000 docs). Token search takes roughly 1.2–1.8x as long as the default search, depending on the query, primarily because each query term runs its own Bitap search. The inverted index lookup itself is O(1) per term.

bench/token-search.mjs

Lines changed: 108 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,108 @@
1+
/**
2+
* Benchmark: Token search vs default Bitap search
3+
*
4+
* Measures:
5+
* 1. Index creation time (with and without inverted index)
6+
* 2. Single-term query (both modes — ensures no regression)
7+
* 3. Multi-term query (token search vs Bitap on full string)
8+
* 4. Scaling: 100, 1000, 5000 documents
9+
*/
10+
11+
import Fuse from '../dist/fuse.mjs'
12+
13+
// --- Helpers ---
14+
15+
function time(fn, iterations = 100) {
16+
// Warmup
17+
for (let i = 0; i < 5; i++) fn()
18+
19+
const start = performance.now()
20+
for (let i = 0; i < iterations; i++) fn()
21+
const elapsed = performance.now() - start
22+
return elapsed / iterations
23+
}
24+
25+
function generateDocs(n) {
26+
const words = [
27+
'javascript', 'python', 'rust', 'golang', 'typescript',
28+
'programming', 'algorithms', 'patterns', 'design', 'systems',
29+
'database', 'network', 'security', 'performance', 'testing',
30+
'framework', 'library', 'compiler', 'runtime', 'deployment',
31+
'functional', 'reactive', 'concurrent', 'distributed', 'embedded',
32+
'machine', 'learning', 'neural', 'optimization', 'architecture'
33+
]
34+
35+
const docs = []
36+
for (let i = 0; i < n; i++) {
37+
const titleLen = 2 + Math.floor(Math.random() * 3)
38+
const bodyLen = 8 + Math.floor(Math.random() * 15)
39+
const pick = (len) => Array.from({ length: len }, () => words[Math.floor(Math.random() * words.length)]).join(' ')
40+
docs.push({ title: pick(titleLen), body: pick(bodyLen) })
41+
}
42+
return docs
43+
}
44+
45+
function fmt(ms) {
46+
if (ms < 1) return `${(ms * 1000).toFixed(0)}µs`
47+
return `${ms.toFixed(2)}ms`
48+
}
49+
50+
// --- Benchmarks ---
51+
52+
const sizes = [100, 1000, 5000]
53+
const keys = ['title', 'body']
54+
55+
console.log('Token Search Benchmark')
56+
console.log('='.repeat(60))
57+
58+
for (const n of sizes) {
59+
const docs = generateDocs(n)
60+
61+
console.log(`\n--- ${n} documents ---\n`)
62+
63+
// Index creation
64+
const defaultCreateTime = time(() => new Fuse(docs, { keys }), 20)
65+
const tokenCreateTime = time(() => new Fuse(docs, { keys, useTokenSearch: true }), 20)
66+
console.log(`Index creation (default): ${fmt(defaultCreateTime)}`)
67+
console.log(`Index creation (tokenSearch): ${fmt(tokenCreateTime)}`)
68+
console.log(` overhead: ${((tokenCreateTime / defaultCreateTime - 1) * 100).toFixed(0)}%`)
69+
70+
// Create instances once for search benchmarks
71+
const fuseDefault = new Fuse(docs, { keys })
72+
const fuseToken = new Fuse(docs, { keys, useTokenSearch: true })
73+
74+
// Single-term query
75+
const singleTerm = 'javascript'
76+
const defaultSingleTime = time(() => fuseDefault.search(singleTerm), 200)
77+
const tokenSingleTime = time(() => fuseToken.search(singleTerm), 200)
78+
console.log(`\nSingle-term search (default): ${fmt(defaultSingleTime)}`)
79+
console.log(`Single-term search (token): ${fmt(tokenSingleTime)}`)
80+
console.log(` overhead: ${((tokenSingleTime / defaultSingleTime - 1) * 100).toFixed(0)}%`)
81+
82+
// Multi-term query
83+
const multiTerm = 'javascript design patterns'
84+
const defaultMultiTime = time(() => fuseDefault.search(multiTerm), 200)
85+
const tokenMultiTime = time(() => fuseToken.search(multiTerm), 200)
86+
console.log(`\nMulti-term search (default): ${fmt(defaultMultiTime)}`)
87+
console.log(`Multi-term search (token): ${fmt(tokenMultiTime)}`)
88+
console.log(` ratio: ${(tokenMultiTime / defaultMultiTime).toFixed(2)}x`)
89+
90+
// Multi-term with typos
91+
const typoTerm = 'javascrpt desgn paterns'
92+
const defaultTypoTime = time(() => fuseDefault.search(typoTerm), 200)
93+
const tokenTypoTime = time(() => fuseToken.search(typoTerm), 200)
94+
console.log(`\nTypo query search (default): ${fmt(defaultTypoTime)}`)
95+
console.log(`Typo query search (token): ${fmt(tokenTypoTime)}`)
96+
console.log(` ratio: ${(tokenTypoTime / defaultTypoTime).toFixed(2)}x`)
97+
98+
// Result quality comparison
99+
const fuseDefaultQ = new Fuse(docs, { keys, includeScore: true })
100+
const fuseTokenQ = new Fuse(docs, { keys, useTokenSearch: true, includeScore: true })
101+
const qResults = fuseDefaultQ.search(typoTerm)
102+
const tResults = fuseTokenQ.search(typoTerm)
103+
console.log(`\nResult count (default): ${qResults.length}`)
104+
console.log(`Result count (token): ${tResults.length}`)
105+
}
106+
107+
console.log('\n' + '='.repeat(60))
108+
console.log('Done.')
