Commit 8c6183d

fix(workers): reject useTokenSearch in FuseWorker
Token search depends on corpus-level statistics (df, fieldCount). Each shard would build those stats from its own slice of the dataset, so ranking would diverge from single-thread Fuse — quietly, with no visible error.

Reject useTokenSearch at construction time with a clear, dedicated error and document the limitation alongside the other unsupported options.

Sharing global token stats across shards is a bigger change that belongs with the inverted-index work tracked in Plan 008.

Plan 007 (post-review HIGH).
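
The divergence can be shown with document frequency (`df`) alone. The TypeScript sketch below is only an illustration. It assumes a smoothed IDF weight, `log((N + 1) / (df + 1)) + 1`, where a higher weight means more relevant. The commit does not say which formula Fuse's token search uses, and `idf` and `docFrequency` are hypothetical helpers. In the sketch, two documents that match different query terms swap places once each is weighted with its own shard's statistics.

```typescript
// Illustration only: a smoothed IDF weight, NOT Fuse's actual token-search formula.
function idf(totalDocs: number, docFreq: number): number {
  return Math.log((totalDocs + 1) / (docFreq + 1)) + 1
}

// Hypothetical helper: how many documents in `docs` contain `term` as a whole word.
function docFrequency(docs: string[], term: string): number {
  return docs.filter((doc) => doc.split(' ').indexOf(term) !== -1).length
}

// Two shards of three documents each; the full corpus is their concatenation.
const shard0 = ['the sea wolf', 'the old man and the sea', 'wolf hall']
const shard1 = ['white fang', 'never cry wolf', 'the call of the wild']
const corpus = shard0.concat(shard1)

// Single-thread view: "the old man and the sea" matches "sea" and
// "never cry wolf" matches "wolf"; both weights use corpus-wide df.
const globalSea = idf(corpus.length, docFrequency(corpus, 'sea'))
const globalWolf = idf(corpus.length, docFrequency(corpus, 'wolf'))

// Sharded view: each document is weighted with its own shard's df.
const shardSea = idf(shard0.length, docFrequency(shard0, 'sea'))
const shardWolf = idf(shard1.length, docFrequency(shard1, 'wolf'))

console.log(globalSea > globalWolf) // true: "sea" is rarer across the whole corpus
console.log(shardSea > shardWolf) // false: per-shard weights flip the order
```

Nothing throws here. The weights are simply different, and that silent divergence is what the commit guards against.
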
1 parent 3959d91 commit 8c6183d

4 files changed

Lines changed: 21 additions & 0 deletions

docs/web-workers.md

Lines changed: 1 addition & 0 deletions
@@ -175,6 +175,7 @@ The unsupported options are:
 - **`sortFn`** — `FuseWorker` always sorts results by Fuse's default `(score, refIndex)` tie-break. If you need a custom sort, run `Fuse` on the main thread or sort the returned array yourself.
 - **`getFn`** (top-level) — Fall back to dotted key paths (`'a.b.c'`) or array paths (`['a', 'b', 'c']`).
 - **`keys[].getFn`** — Same as above. Use a string or array path on the key.
+- **`useTokenSearch`** — Token search depends on corpus-level statistics (`df`, `fieldCount`) that would be computed independently inside each worker shard, producing scores that don't match single-thread `Fuse`. Use `Fuse` directly on the main thread for token search.
 
 Default ordering is preserved: `FuseWorker` returns the same order as `Fuse` for the same inputs (with or without `includeScore`), and `shouldSort: false` returns results in global collection order.
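
The default ordering described in these docs can be pictured as a merge of per-shard hit lists. What follows is a minimal TypeScript sketch, not `FuseWorker`'s actual code. `ShardHit` and `mergeShardResults` are hypothetical. The sketch assumes each shard reports the item's global `refIndex` and a Fuse-style `score`, where lower is better.

```typescript
// Hypothetical shape of one hit returned by a worker shard. `refIndex` is assumed
// to be the item's index in the full collection, not its index within the shard.
interface ShardHit<T> {
  item: T
  refIndex: number
  score: number // Fuse convention: 0 is a perfect match, 1 is a complete mismatch
}

// Merge per-shard hit lists into one global ordering.
// shouldSort = true: ascending score, ties broken by ascending refIndex.
// shouldSort = false: global collection order (ascending refIndex only).
function mergeShardResults<T>(shards: ShardHit<T>[][], shouldSort = true): ShardHit<T>[] {
  // concat builds a new array, so sorting it leaves the shard arrays untouched.
  const all = ([] as ShardHit<T>[]).concat(...shards)
  return all.sort((a, b) =>
    shouldSort && a.score !== b.score ? a.score - b.score : a.refIndex - b.refIndex
  )
}

const merged = mergeShardResults([
  [
    { item: 'b', refIndex: 1, score: 0.2 },
    { item: 'a', refIndex: 0, score: 0.5 },
  ],
  [
    { item: 'd', refIndex: 3, score: 0.2 },
    { item: 'c', refIndex: 2, score: 0.1 },
  ],
])

console.log(merged.map((hit) => hit.item)) // [ 'c', 'b', 'd', 'a' ]
```

This kind of merge reproduces single-thread ordering only if each shard scores an item exactly as single-thread `Fuse` would. Corpus-level statistics such as `df` break that assumption.
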

src/core/errorMessages.ts

Lines changed: 6 additions & 0 deletions
@@ -22,6 +22,12 @@ export const FUSE_WORKER_UNSUPPORTED_FN_OPTION = (option: string): string =>
   `functions cannot be transferred to Web Workers via postMessage. ` +
   `Remove this option or fall back to Fuse.`
 
+export const FUSE_WORKER_TOKEN_SEARCH_UNSUPPORTED =
+  `FuseWorker does not support useTokenSearch: token search depends on ` +
+  `corpus-level statistics (df, fieldCount) that are computed per shard, ` +
+  `so per-shard scores would diverge from single-thread Fuse. Use Fuse on ` +
+  `the main thread for token search.`
+
 export const FUSE_MATCH_TOKEN_SEARCH_UNSUPPORTED =
   `Fuse.match does not support useTokenSearch: token search requires ` +
   `corpus-level statistics (df, fieldCount) that a one-off string ` +

src/workers/FuseWorker.ts

Lines changed: 6 additions & 0 deletions
@@ -57,6 +57,12 @@ export default class FuseWorker<T> {
     // Reject function-valued options eagerly. Without this check, postMessage
     // throws DataCloneError on first search() rather than at construction.
     FuseWorker._assertNoFunctionOptions(this._options)
+    // Token search needs global corpus statistics, but each shard would build
+    // its own — scores would diverge from single-thread Fuse. Refuse upfront
+    // rather than silently returning ordering that doesn't match.
+    if (this._options.useTokenSearch) {
+      throw new Error(ErrorMsg.FUSE_WORKER_TOKEN_SEARCH_UNSUPPORTED)
+    }
     // eslint-disable-next-line @typescript-eslint/ban-ts-comment
     // @ts-ignore -- import.meta.url is resolved by Rollup at build time
     this._workerUrl = this._workerOptions.workerUrl
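
The fail-fast pattern in this hunk rejects options in the constructor rather than on the first `search()`. It can be sketched on its own as follows. `ShardedSearch`, `ShardedSearchOptions`, and `TOKEN_SEARCH_UNSUPPORTED` are hypothetical stand-ins, not `FuseWorker`'s real types. The sketch only shows the order of the two checks: function-valued options first, then `useTokenSearch` with its own dedicated message.

```typescript
// Hypothetical options type for this sketch; not FuseWorker's real option type.
interface ShardedSearchOptions {
  keys: string[]
  useTokenSearch?: boolean
  sortFn?: (a: unknown, b: unknown) => number
}

// Hypothetical dedicated error message for the token-search case.
const TOKEN_SEARCH_UNSUPPORTED =
  'ShardedSearch does not support useTokenSearch: corpus-level statistics ' +
  'would be computed per shard, so scores would not match a single-thread search.'

// Hypothetical sharded searcher that validates options before any worker exists.
class ShardedSearch<T> {
  private readonly options: ShardedSearchOptions

  constructor(private readonly docs: T[], options: ShardedSearchOptions) {
    // 1. Functions cannot be structured-cloned into a worker, so reject them
    //    here instead of letting postMessage throw on the first search.
    for (const name of Object.keys(options) as (keyof ShardedSearchOptions)[]) {
      if (typeof options[name] === 'function') {
        throw new Error(`ShardedSearch does not support function-valued option "${name}"`)
      }
    }
    // 2. useTokenSearch is cloneable but gives wrong rankings when sharded,
    //    so it gets its own explicit check and message.
    if (options.useTokenSearch) {
      throw new Error(TOKEN_SEARCH_UNSUPPORTED)
    }
    this.options = options
  }

  size(): number {
    return this.docs.length
  }

  keys(): string[] {
    return this.options.keys
  }
}

// Fails at construction time, before any worker would be created.
try {
  new ShardedSearch(['The Old Man and the Sea'], { keys: ['title'], useTokenSearch: true })
} catch (err) {
  console.log((err as Error).message)
}
```

The two checks stay separate because the two problems differ. A function-valued option fails because it cannot be cloned. Token search would transfer fine and still return wrong rankings, so it needs a message that says why.
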

test/fuse-worker.test.js

Lines changed: 8 additions & 0 deletions
@@ -425,4 +425,12 @@ describe('FuseWorker rejects function-valued options', () => {
       })
     ).not.toThrow()
   })
+
+  test('throws when useTokenSearch is true', () => {
+    // Token search needs global corpus stats; per-shard stats would diverge
+    // from single-thread Fuse, so FuseWorker rejects it upfront.
+    expect(
+      () => new FuseWorker(Books, { keys: ['title'], useTokenSearch: true })
+    ).toThrowError(/useTokenSearch/)
+  })
 })
