feat(data): per-type quotas, rank threshold, NL-tolerant entity FTS by cpcloud · Pull Request #962 · micasa-dev/micasa

cpcloud · 2026-04-20T12:19:31Z

Summary

Hardens SearchEntities against real-world natural-language queries and against result sets flooded by a single entity type.

Ranking: a three-tier window-function query replaces the flat LIMIT 20.

Tier 1 takes exactly one row per matching entity type (guarantees cross-type representation).
Tier 2 raises each type up to ftsEntityKPerType rows, so a single noisy type can't dominate.
Tier 3 fills any remaining room up to ftsEntityTotalCap from whatever's left, globally ranked. This is how single-type searches still get the full cap.
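The tiering is easier to follow outside SQL. The following is a minimal in-memory Go sketch of the three tiers, not the PR's window-function query. `hit` and `selectTiered` are hypothetical names; the input is assumed to be already ordered best-first by rank and then entity_id, and returning the selected rows in that same global order is an assumption (the PR does not say how the final rows are ordered). Only the two constant values come from the PR.

```go
package main

import "fmt"

// Constant values are taken from the PR; the rest of this file is a
// hypothetical in-memory model of the tiering, not micasa's SQL.
const (
	ftsEntityKPerType = 5
	ftsEntityTotalCap = 20
)

// hit is a hypothetical stand-in for one FTS match.
type hit struct {
	EntityType string
	EntityID   int64
	Rank       float64 // bm25-style: lower is better
}

// selectTiered assumes hits is already ordered best-first (rank, then
// entity_id). Each tier walks that order and marks rows as taken until
// ftsEntityTotalCap is reached; taken rows are returned in input order.
func selectTiered(hits []hit) []hit {
	taken := make([]bool, len(hits))
	perType := map[string]int{}
	total := 0

	take := func(i int) {
		taken[i] = true
		perType[hits[i].EntityType]++
		total++
	}

	// Tier 1: the best row of every matching type.
	for i, h := range hits {
		if total >= ftsEntityTotalCap {
			break
		}
		if perType[h.EntityType] == 0 {
			take(i)
		}
	}
	// Tier 2: top each type up to ftsEntityKPerType rows.
	for i, h := range hits {
		if total >= ftsEntityTotalCap {
			break
		}
		if !taken[i] && perType[h.EntityType] < ftsEntityKPerType {
			take(i)
		}
	}
	// Tier 3: fill any remaining room from the leftovers, globally ranked.
	for i := range hits {
		if total >= ftsEntityTotalCap {
			break
		}
		if !taken[i] {
			take(i)
		}
	}

	out := make([]hit, 0, total)
	for i, h := range hits {
		if taken[i] {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	// 30 strong project matches, then one weaker vendor and appliance match.
	var hits []hit
	for i := 0; i < 30; i++ {
		hits = append(hits, hit{EntityType: "project", EntityID: int64(i + 1), Rank: -10 + float64(i)*0.1})
	}
	hits = append(hits,
		hit{EntityType: "vendor", EntityID: 100, Rank: -1},
		hit{EntityType: "appliance", EntityID: 200, Rank: -0.5},
	)

	counts := map[string]int{}
	got := selectTiered(hits)
	for _, h := range got {
		counts[h.EntityType]++
	}
	fmt.Println(len(got), counts["project"], counts["vendor"], counts["appliance"]) // 20 18 1 1
}
```

Even though the vendor and appliance rows rank below all 30 projects, tier 1 guarantees each of them a slot; with only projects in the input, tier 3 still fills all 20 slots.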

Package-level tuning constants (not user-configurable; the eval harness is the tuning channel):

ftsEntityKPerType    = 5
ftsEntityRankCeiling = 0.0   // permissive; eval will tighten
ftsEntityTotalCap    = 20
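A sketch of how the rank ceiling might be applied, assuming ranks come from SQLite FTS5's bm25(), which assigns numerically lower scores to better matches. `withinRankCeiling` and `filterByRank` are hypothetical helpers, and the inclusive `<=` comparison is an assumption. Under that reading, a ceiling of 0.0 lets essentially every match through, which fits its "permissive" label.

```go
package main

import "fmt"

// Values as listed in the PR.
const (
	ftsEntityKPerType    = 5
	ftsEntityRankCeiling = 0.0 // permissive; eval will tighten
	ftsEntityTotalCap    = 20
)

// withinRankCeiling is hypothetical. FTS5's bm25() gives better matches
// numerically lower scores, so a ceiling keeps ranks at or below it.
// The inclusive comparison is an assumption.
func withinRankCeiling(rank, ceiling float64) bool {
	return rank <= ceiling
}

// filterByRank (also hypothetical) drops ranks above the ceiling.
func filterByRank(ranks []float64, ceiling float64) []float64 {
	var kept []float64
	for _, r := range ranks {
		if withinRankCeiling(r, ceiling) {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	ranks := []float64{-7.2, -3.1, -0.4}
	fmt.Println(len(filterByRank(ranks, ftsEntityRankCeiling))) // 3
	fmt.Println(len(filterByRank(ranks, -2.0)))                 // 2: a tighter ceiling drops the weakest match
}
```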

entity_id breaks rank ties in every ORDER BY, so results are stable when BM25 produces identical ranks for similarly shaped rows.
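The tiebreak can be illustrated with Go's sort.Slice, which is not a stable sort. Once entity_id is a secondary key, no two rows compare equal (assuming entity_id values are unique among the rows being sorted), so the result is fully determined regardless of input order. `row` and `sortByRankThenID` are hypothetical names standing in for `ORDER BY rank, entity_id`.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical stand-in for one ranked FTS result.
type row struct {
	EntityID int64
	Rank     float64
}

// sortByRankThenID orders by rank, then entity_id. With the secondary key
// the comparator is a total order, so even an unstable sort is deterministic.
func sortByRankThenID(rows []row) {
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Rank != rows[j].Rank {
			return rows[i].Rank < rows[j].Rank
		}
		return rows[i].EntityID < rows[j].EntityID
	})
}

func main() {
	// Rows 9 and 3 tie on rank; entity_id decides between them.
	rows := []row{{EntityID: 9, Rank: -2.5}, {EntityID: 3, Rank: -2.5}, {EntityID: 7, Rank: -4.0}}
	sortByRankThenID(rows)
	ids := make([]int64, 0, len(rows))
	for _, r := range rows {
		ids = append(ids, r.EntityID)
	}
	fmt.Println(ids) // [7 3 9]
}
```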

Query tolerance:

prepareFTSEntityQuery lowercases the query, strips non-alphanumeric characters, drops short and stopword tokens, and OR-joins the survivors as quoted prefix phrases.
Returns early when no content words survive, so a pure-stopword question like "what is it?" doesn't hammer FTS with an empty MATCH.
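A minimal Go sketch of the query-preparation steps listed above. `buildEntityMatch`, `minTokenLen` (3) and the stopword set are all hypothetical, since the PR does not publish the real threshold or stopword list; turning punctuation into spaces (so "what's" splits into "what" and "s") is also an assumption. The output uses FTS5's `"token"*` prefix syntax and its uppercase `OR` operator.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Hypothetical values: the real minimum length and stopword list are not given in the PR.
const minTokenLen = 3

var stopwords = map[string]bool{
	"about": true, "and": true, "are": true, "for": true, "how": true,
	"the": true, "this": true, "what": true, "when": true, "where": true,
	"which": true, "who": true, "why": true, "with": true,
}

// buildEntityMatch is a hypothetical stand-in for prepareFTSEntityQuery.
// It returns "" when no content words survive, so callers can skip FTS.
func buildEntityMatch(question string) string {
	// Lowercase letters and digits; every other rune becomes a space.
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return unicode.ToLower(r)
		}
		return ' '
	}, question)

	var phrases []string
	for _, tok := range strings.Fields(cleaned) {
		if utf8.RuneCountInString(tok) < minTokenLen || stopwords[tok] {
			continue
		}
		// Tokens are purely alphanumeric, so quoting needs no escaping.
		// The trailing * turns the quoted phrase into an FTS5 prefix query.
		phrases = append(phrases, `"`+tok+`"*`)
	}
	return strings.Join(phrases, " OR ")
}

func main() {
	for _, q := range []string{"what's the status of the kitchen project?", "what is it?"} {
		match := buildEntityMatch(q)
		if match == "" {
			// Mirror the early return: no content words, so no MATCH at all.
			fmt.Printf("%q -> skip FTS\n", q)
			continue
		}
		fmt.Printf("%q -> %s\n", q, match)
	}
}
```

OR-joining prefix phrases keeps recall high for conversational input: "kitchen" alone is enough to reach a row titled Kitchen Remodel, and BM25 ranking then decides how much the other surviving words matter.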

Stacked on top of #961 (triggers), which is stacked on #960 (engine).

Refs #707

Tests cover per-type quota preservation under a flood of first-class matches, single-type searches using the full cap, every matching type surfacing when 5+ types share a token, total cap enforcement, rank threshold plumbing, stable ordering across runs, the query builder directly, and the end-to-end regression that "what's the status of the kitchen project?" now surfaces the Kitchen Remodel project.

cpcloud added the enhancement (New feature or request) and data (Data layer, models, database) labels on Apr 20, 2026

This was referenced on Apr 20, 2026:

feat(cli): add micasa eval fts subcommand #963 (Merged)

feat(data): FTS-powered context enrichment for LLM chat #933 (Closed)

cpcloud force-pushed the fts-query-hardening branch 5 times, most recently from 563be0f to ffd7cda, April 21, 2026 14:07

cpcloud force-pushed the fts-query-hardening branch from ffd7cda to fbd611e, April 21, 2026 14:12

cpcloud enabled auto-merge (squash) April 21, 2026 14:16

cpcloud merged commit 3db89fa into micasa-dev:main Apr 21, 2026
28 checks passed

cpcloud merged 1 commit into micasa-dev:main from cpcloud:fts-query-hardening
