feat(data): per-type quotas, rank threshold, NL-tolerant entity FTS #962
Merged
cpcloud merged 1 commit into micasa-dev:main on Apr 21, 2026
This was referenced Apr 20, 2026
Branch updated from 563be0f to ffd7cda.
Replaces the flat LIMIT 20 in SearchEntities with a three-tier
window-function query and adds natural-language query tolerance.
Ranking:
- Tier 1 takes exactly one row per matching entity type, guaranteeing
cross-type representation.
- Tier 2 raises each type up to ftsEntityKPerType rows so that a single
noisy type can't dominate.
- Tier 3 fills any remaining room up to ftsEntityTotalCap from the rows
left over, ranked globally. This is how single-type searches still use the full cap.
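The tier rules above can be emulated in memory to show how they interact. This is a sketch under stated assumptions, not the PR's SQL: it assumes the query numbers rows per type with something like ROW_NUMBER() OVER (PARTITION BY entity_type ORDER BY rank, entity_id), maps row 1 to tier 1 and rows up to K to tier 2, then orders by tier, rank and entity_id before applying the cap. The hit type, tierFor, selectTiered and demoHits are hypothetical names; only ftsEntityKPerType and ftsEntityTotalCap (and their values) come from this PR.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// Values copied from the tuning constants listed in this PR.
const (
	ftsEntityKPerType = 5
	ftsEntityTotalCap = 20
)

// hit is a hypothetical stand-in for one FTS match row.
type hit struct {
	EntityType string
	EntityID   int64
	Rank       float64 // BM25-style rank: lower (more negative) is better
}

// byRankThenID orders by rank, with entity_id as a deterministic tiebreak.
func byRankThenID(a, b hit) int {
	if c := cmp.Compare(a.Rank, b.Rank); c != 0 {
		return c
	}
	return cmp.Compare(a.EntityID, b.EntityID)
}

// tierFor maps a per-type row number to its tier:
// rn == 1 -> tier 1, rn <= K -> tier 2, anything else -> tier 3.
func tierFor(rn int) int {
	switch {
	case rn == 1:
		return 1
	case rn <= ftsEntityKPerType:
		return 2
	default:
		return 3
	}
}

// selectTiered emulates ROW_NUMBER() OVER (PARTITION BY entity_type
// ORDER BY rank, entity_id), then ORDER BY tier, rank, entity_id LIMIT cap.
func selectTiered(hits []hit) []hit {
	type tiered struct {
		h    hit
		tier int
	}
	sorted := slices.Clone(hits)
	slices.SortFunc(sorted, byRankThenID)

	// Walking rows in rank order numbers each type's rows 1, 2, 3, ...
	rowNum := make(map[string]int)
	all := make([]tiered, 0, len(sorted))
	for _, h := range sorted {
		rowNum[h.EntityType]++
		all = append(all, tiered{h, tierFor(rowNum[h.EntityType])})
	}

	// Lower tiers first; inside a tier, global rank order.
	slices.SortFunc(all, func(a, b tiered) int {
		if c := cmp.Compare(a.tier, b.tier); c != 0 {
			return c
		}
		return byRankThenID(a.h, b.h)
	})

	n := min(len(all), ftsEntityTotalCap)
	out := make([]hit, 0, n)
	for _, t := range all[:n] {
		out = append(out, t.h)
	}
	return out
}

// demoHits floods "project" with 30 strong matches next to a few weaker ones.
func demoHits() []hit {
	var hs []hit
	for i := 1; i <= 30; i++ {
		hs = append(hs, hit{"project", int64(i), float64(-100 + i)})
	}
	return append(hs,
		hit{"appliance", 101, -5},
		hit{"appliance", 102, -4},
		hit{"vendor", 201, -1},
	)
}

func main() {
	for _, h := range selectTiered(demoHits()) {
		fmt.Println(h.EntityType, h.EntityID, h.Rank)
	}
}
```

With a flat LIMIT 20 this data would return 20 project rows. Under the tiered sketch the result still has 20 rows, but both appliance rows and the vendor row survive alongside 17 project rows.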
Package-level tuning constants (not user-configurable -- the eval harness
is the tuning channel):
ftsEntityKPerType = 5
ftsEntityRankCeiling = 0.0 // permissive; eval will tighten
ftsEntityTotalCap = 20
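A rough sketch of how the rank ceiling could filter matches. It assumes the predicate is rank <= ftsEntityRankCeiling (the exact comparison is not shown in this PR) and that rank is SQLite FTS5's bm25(), which returns numerically smaller values for better matches. Under those assumptions a ceiling of 0.0 admits every match, and tightening means lowering the ceiling toward more negative values. applyRankCeiling is a hypothetical helper.

```go
package main

import "fmt"

// Value copied from the tuning constants above.
const ftsEntityRankCeiling = 0.0

// applyRankCeiling keeps only ranks at or below the ceiling. Hypothetical:
// in SQL this would presumably be a "WHERE rank <= ?" predicate.
func applyRankCeiling(ranks []float64, ceiling float64) []float64 {
	kept := make([]float64, 0, len(ranks))
	for _, r := range ranks {
		if r <= ceiling {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	ranks := []float64{-7.2, -3.1, -0.4}
	fmt.Println(applyRankCeiling(ranks, ftsEntityRankCeiling)) // [-7.2 -3.1 -0.4]
	fmt.Println(applyRankCeiling(ranks, -2.0))                  // [-7.2 -3.1]
}
```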
entity_id tiebreaks rank in every ORDER BY so results are stable when
BM25 produces identical ranks on similarly-shaped rows.
Query tolerance:
- prepareFTSEntityQuery lowercases, strips non-alphanumeric characters, drops short and
stopword tokens, and OR-joins the survivors as quoted prefix phrases.
- Returns early when no content words survive so a pure-stopword question
like "what is it?" doesn't hammer FTS with an empty MATCH.
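The query-preparation steps above can be approximated as follows. The minimum token length (3), the stopword list, and the choice to turn punctuation into spaces rather than delete it are assumptions made for illustration; prepareQuerySketch is a hypothetical stand-in, not prepareFTSEntityQuery itself. Each surviving token becomes a quoted phrase followed by *, which FTS5 treats as a prefix query, and an empty result tells the caller to skip the search.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// minTokenLen is a hypothetical cutoff: shorter tokens are dropped.
const minTokenLen = 3

// stopwords is a small, hypothetical stopword set for illustration only.
var stopwords = map[string]bool{
	"the": true, "and": true, "what": true, "which": true, "with": true,
	"for": true, "are": true, "was": true, "its": true, "this": true,
	"that": true, "how": true, "who": true, "when": true, "where": true,
}

// prepareQuerySketch returns an FTS5 MATCH expression, or "" when no content
// words survive (the caller should then skip the FTS query).
func prepareQuerySketch(input string) string {
	// Lowercase, then replace every non-letter, non-digit rune with a space.
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return r
		}
		return ' '
	}, strings.ToLower(input))

	var terms []string
	for _, tok := range strings.Fields(cleaned) {
		if len([]rune(tok)) < minTokenLen || stopwords[tok] {
			continue
		}
		// A quoted phrase followed by * is a prefix query in FTS5.
		terms = append(terms, `"`+tok+`"*`)
	}
	return strings.Join(terms, " OR ")
}

func main() {
	for _, q := range []string{"what's the status of the kitchen project?", "what is it?"} {
		match := prepareQuerySketch(q)
		if match == "" {
			fmt.Printf("%q -> skipped (no content words)\n", q)
			continue
		}
		fmt.Printf("%q -> MATCH %s\n", q, match)
	}
}
```

Under these assumptions, "what's the status of the kitchen project?" becomes "status"* OR "kitchen"* OR "project"*, while "what is it?" yields an empty string and never reaches FTS.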
Tests cover per-type quota preservation under a flood of first-class
matches, single-type searches using the full cap, every matching type
surfacing when 5+ types share a token, total cap enforcement, rank
threshold plumbing, stable ordering across runs, the query builder
directly, and the end-to-end regression that "what's the status of the
kitchen project?" now surfaces the Kitchen Remodel project.
Refs micasa-dev#707.
Branch updated from ffd7cda to fbd611e.
Summary
Hardens SearchEntities against real-world natural-language queries and against single-type result floods.
- Ranking: a three-tier window-function query replaces the flat LIMIT 20. It is tuned by package-level constants (not user-configurable; the eval harness is the tuning channel), and entity_id tiebreaks rank in every ORDER BY so results are stable when BM25 produces identical ranks.
- Query tolerance: prepareFTSEntityQuery lowercases, strips non-alphanumeric characters, drops short and stopword tokens, and OR-joins the survivors as quoted prefix phrases.

Stacked on top of #961 (triggers), which is stacked on #960 (engine).
Refs #707