
feat(data): per-type quotas, rank threshold, NL-tolerant entity FTS #962

Merged
cpcloud merged 1 commit into micasa-dev:main from cpcloud:fts-query-hardening
Apr 21, 2026

Conversation

Collaborator

@cpcloud cpcloud commented Apr 20, 2026

Summary

Hardens SearchEntities against real-world natural-language queries and against single-type result floods.

Ranking: three-tier window-function query replaces the flat LIMIT 20:

  • Tier 1 takes exactly one row per matching entity type (guarantees cross-type representation).
  • Tier 2 raises each type up to ftsEntityKPerType rows so single noisy types can't dominate.
  • Tier 3 fills the remaining room up to ftsEntityTotalCap from whatever's left, globally ranked. Single-type searches use the full cap this way.
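
The three tiers above can be sketched in memory as follows. This is only an illustration of the selection semantics, not the window-function SQL the PR ships: the `hit` type and `selectTiered` function are hypothetical names, lower rank is assumed to be better (BM25 style), and the tier-then-rank output order is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical stand-in for one FTS match row.
type hit struct {
	EntityType string
	EntityID   string
	Rank       float64 // assumed BM25-style: lower is better
}

// selectTiered mimics the three tiers in memory (illustrative only).
func selectTiered(hits []hit, kPerType, totalCap int) []hit {
	// Global order: rank first, entity_id as the tiebreak.
	sorted := append([]hit(nil), hits...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Rank != sorted[j].Rank {
			return sorted[i].Rank < sorted[j].Rank
		}
		return sorted[i].EntityID < sorted[j].EntityID
	})

	// A row's position within its own type decides its tier.
	seen := map[string]int{}
	tiers := make([]int, len(sorted))
	for i, h := range sorted {
		seen[h.EntityType]++
		switch n := seen[h.EntityType]; {
		case n == 1:
			tiers[i] = 1 // best row of each type
		case n <= kPerType:
			tiers[i] = 2 // up to kPerType rows per type
		default:
			tiers[i] = 3 // leftovers, globally ranked
		}
	}

	// Take tier 1, then tier 2, then tier 3, stopping at the total cap.
	var out []hit
	for tier := 1; tier <= 3; tier++ {
		for i, h := range sorted {
			if tiers[i] == tier && len(out) < totalCap {
				out = append(out, h)
			}
		}
	}
	return out
}

func main() {
	hits := []hit{
		{"project", "p1", -5}, {"project", "p2", -4},
		{"project", "p3", -3}, {"vendor", "v1", -1},
	}
	for _, h := range selectTiered(hits, 2, 3) {
		fmt.Println(h.EntityType, h.EntityID, h.Rank)
	}
}
```

With a per-type quota of 2 and a cap of 3, the weakly ranked vendor row still makes the cut ahead of the third project row, which is the cross-type guarantee Tier 1 describes.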

Package-level tuning constants (not user-configurable — the eval harness is the tuning channel):

ftsEntityKPerType    = 5
ftsEntityRankCeiling = 0.0   // permissive; eval will tighten
ftsEntityTotalCap    = 20
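
One possible reading of the ceiling, sketched below: SQLite FTS5's `bm25()` reports better matches as numerically lower (typically negative) scores, so a ceiling of 0.0 would admit essentially every match, and tightening means lowering it. The `passesCeiling` helper and the `rank <= ceiling` comparison are assumptions for illustration; the PR does not show how the constant is applied.

```go
package main

import "fmt"

// rankCeiling mirrors the ftsEntityRankCeiling value from the PR.
const rankCeiling = 0.0

// passesCeiling is a hypothetical helper: it keeps a row whose rank is at
// or below the ceiling (assumed semantics, lower rank meaning a better match).
func passesCeiling(rank, ceiling float64) bool {
	return rank <= ceiling
}

func main() {
	for _, rank := range []float64{-7.2, -0.4} {
		fmt.Println(rank, passesCeiling(rank, rankCeiling), passesCeiling(rank, -1.0))
	}
}
```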

entity_id tiebreaks rank in every ORDER BY so results are stable when BM25 produces identical ranks.

Query tolerance:

  • prepareFTSEntityQuery lowercases, strips non-alphanum, drops short and stopword tokens, and OR-joins the survivors as quoted prefix phrases.
  • Returns early when no content words survive so a pure-stopword question like "what is it?" doesn't hammer FTS with an empty MATCH.
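
A minimal sketch of the query preparation described above, under stated assumptions: `buildEntityQuery` is a hypothetical stand-in for `prepareFTSEntityQuery`, the stopword list and the three-character minimum are invented for illustration, and "quoted prefix phrase" is taken to mean the FTS5 form `"token"*`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// minTokenLen is an assumed cutoff; the PR does not state the real one.
const minTokenLen = 3

// stopwords is an illustrative list, not the one used in the PR.
var stopwords = map[string]bool{
	"what": true, "the": true, "and": true, "for": true, "are": true,
}

// buildEntityQuery lowercases the input, splits on anything that is not a
// letter or digit, drops short and stopword tokens, and OR-joins the rest
// as quoted prefix phrases. It returns "" when nothing survives, so the
// caller can skip the MATCH entirely.
func buildEntityQuery(input string) string {
	tokens := strings.FieldsFunc(strings.ToLower(input), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var terms []string
	for _, tok := range tokens {
		if len([]rune(tok)) < minTokenLen || stopwords[tok] {
			continue
		}
		terms = append(terms, `"`+tok+`"*`)
	}
	return strings.Join(terms, " OR ")
}

func main() {
	fmt.Println(buildEntityQuery("what's the status of the kitchen project?"))
	fmt.Printf("%q\n", buildEntityQuery("what is it?"))
}
```

Quoting each token keeps FTS5 from interpreting user text as query syntax, and the empty-string return gives the caller a cheap signal to return early.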

Stacked on top of #961 (triggers), which is stacked on #960 (engine).

Refs #707

@cpcloud cpcloud added the enhancement (New feature or request) and data (Data layer, models, database) labels Apr 20, 2026
@cpcloud cpcloud force-pushed the fts-query-hardening branch 5 times, most recently from 563be0f to ffd7cda, April 21, 2026 14:07
Replaces the flat LIMIT 20 in SearchEntities with a three-tier
window-function query and adds natural-language query tolerance.

Ranking:
- Tier 1 takes exactly one row per matching entity type (guarantees
  cross-type representation).
- Tier 2 raises each type up to ftsEntityKPerType rows so single noisy
  types can't dominate.
- Tier 3 fills any remaining room up to ftsEntityTotalCap from whatever's
  left, globally ranked. Single-type searches use the full cap this way.

Package-level tuning constants (not user-configurable -- the eval harness
is the tuning channel):

    ftsEntityKPerType    = 5
    ftsEntityRankCeiling = 0.0   // permissive; eval will tighten
    ftsEntityTotalCap    = 20

entity_id tiebreaks rank in every ORDER BY so results are stable when
BM25 produces identical ranks on similarly-shaped rows.

Query tolerance:
- prepareFTSEntityQuery lowercases, strips non-alphanum, drops short and
  stopword tokens, and OR-joins the survivors as quoted prefix phrases.
- Returns early when no content words survive so a pure-stopword question
  like "what is it?" doesn't hammer FTS with an empty MATCH.

Tests cover per-type quota preservation under a flood of first-class
matches, single-type searches using the full cap, every matching type
surfacing when 5+ types share a token, total cap enforcement, rank
threshold plumbing, stable ordering across runs, the query builder
directly, and the end-to-end regression that "what's the status of the
kitchen project?" now surfaces the Kitchen Remodel project.

Refs micasa-dev#707.
@cpcloud cpcloud force-pushed the fts-query-hardening branch from ffd7cda to fbd611e, April 21, 2026 14:12
@cpcloud cpcloud enabled auto-merge (squash) April 21, 2026 14:16
@cpcloud cpcloud merged commit 3db89fa into micasa-dev:main Apr 21, 2026
28 checks passed
