feat(data): FTS5 entity search engine #960
Merged
cpcloud merged 1 commit into micasa-dev:main on Apr 20, 2026
This was referenced Apr 20, 2026
Adds a SQLite FTS5 virtual table `entities_fts` indexing seven entity types (projects, vendors, appliances, incidents, quotes, maintenance items, service logs) and the Go engine to populate and query it.

- `entities_fts` virtual table with porter + unicode61 tokenization for stem-folded cross-language matching.
- `RebuildFTSIndex` rebuilds the index from scratch from the live entity tables; safe to call repeatedly.
- `populateEntitiesFTS` skips soft-deleted rows (including soft-deleted parents when JOINing) so the index never surfaces deleted data.
- `SearchEntities` runs a MATCH query, returns BM25-ranked results with a stable `entity_id` tiebreaker, capped at 20 rows.
- `EntitySummary` fetches a tri-state result (found / stale / missing) so callers can revalidate cached search hits before using them.
- `truncateField` clips indexed content by rune count to keep the FTS shadow table bounded without splitting UTF-8 sequences.

Test coverage: creation, rebuild, soft-delete exclusion (own rows and parents), cross-entity search, stemming, graceful degradation on corrupted schema, stale revalidation, Unicode truncation, tiebreaker determinism.

This is the index and query layer only. Context-formatting helpers and caller wiring land in follow-up PRs.

Refs micasa-dev#707.
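To make the rune-count clipping concrete, here is a minimal sketch of what a helper like `truncateField` could look like. The signature and the `maxRunes` parameter are assumptions for illustration; the PR text only states that truncation is by rune count and never splits a UTF-8 sequence.

```go
package main

import "fmt"

// truncateField clips s to at most maxRunes runes.
// Hypothetical signature: the real helper's parameters are not shown in the PR text.
func truncateField(s string, maxRunes int) string {
	if maxRunes <= 0 {
		return ""
	}
	n := 0
	// Ranging over a string yields the byte offset of each rune start,
	// so slicing at i always cuts on a rune boundary.
	for i := range s {
		if n == maxRunes {
			return s[:i]
		}
		n++
	}
	return s // s already fits within the limit
}

func main() {
	fmt.Println(truncateField("café résumé", 4)) // prints "café"
	fmt.Println(truncateField("日本語テキスト", 3))
}
```

Slicing by byte length instead (for example `s[:4]`) would cut the two-byte `é` in half, which is the failure mode the rune-based loop avoids.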
cpcloud added a commit that referenced this pull request on Apr 20, 2026
## Summary

Installs AFTER INSERT / UPDATE / DELETE triggers on every source table that contributes to `entities_fts` (projects, vendors, appliances, maintenance_items, incidents, service_log_entries, quotes) so the index stays current without `RebuildFTSIndex` on every app open.

- Parent tables whose text is embedded in a child's `entity_name` (project.title and vendor.name in quote, maintenance_item.name in service_log) get companion `_au_cascade` triggers that rebuild the child's FTS row when the parent is updated.
- Cascade JOINs filter on `parent.deleted_at IS NULL` so a parent soft-delete degrades the child's `entity_name` (project title disappears from the quote; vendor name disappears; SLE name blanks out) instead of leaving stale text in the index.
- The populate path carries the same filter so initial rebuilds match the trigger invariant.
- Trigger installation is idempotent (DROP IF EXISTS + CREATE), so schema drift heals on the next `Store.Open`.

FK constraints (RESTRICT on quote parents, CASCADE on SLE parents) keep the trigger semantics consistent with the rest of the domain.

Stacked on top of #960 (FTS engine). Diff will shrink to just the trigger additions once that merges.

Refs #707
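The idempotent DROP IF EXISTS + CREATE pattern described above can be sketched as a small statement generator. Everything schema-specific here is an assumption for illustration: the `_fts_ai` trigger name, the `entity_type` column, and the `id` primary key do not appear in the PR text, which only names `entities_fts`, `entity_id`, `entity_name`, and `deleted_at`.

```go
package main

import (
	"fmt"
	"strings"
)

// triggerStatements returns the DROP + CREATE pair for one AFTER INSERT
// trigger. Running the pair on every open replaces any drifted definition.
// Trigger name, entity_type column, and id column are hypothetical.
func triggerStatements(table, entityType, nameColumn string) []string {
	name := table + "_fts_ai" // hypothetical naming scheme
	drop := fmt.Sprintf("DROP TRIGGER IF EXISTS %s", name)
	create := fmt.Sprintf(`CREATE TRIGGER %s AFTER INSERT ON %s
WHEN new.deleted_at IS NULL
BEGIN
  INSERT INTO entities_fts (entity_type, entity_id, entity_name)
  VALUES ('%s', new.id, new.%s);
END`, name, table, entityType, nameColumn)
	return []string{drop, create}
}

func main() {
	// Print the statements a store could execute in order at open time.
	stmts := triggerStatements("vendors", "vendor", "name")
	fmt.Println(strings.Join(stmts, ";\n") + ";")
}
```

Because the DROP always precedes the CREATE, re-running the pair is harmless when the trigger is already correct and corrective when it is not.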
cpcloud added a commit that referenced this pull request on Apr 21, 2026
…962)

## Summary

Hardens `SearchEntities` against real-world natural-language queries and against single-type result floods.

**Ranking**: three-tier window-function query replaces the flat `LIMIT 20`:

- Tier 1 takes exactly one row per matching entity type (guarantees cross-type representation).
- Tier 2 raises each type up to `ftsEntityKPerType` rows so single noisy types can't dominate.
- Tier 3 fills the remaining room up to `ftsEntityTotalCap` from whatever's left, globally ranked. Single-type searches use the full cap this way.

Package-level tuning constants (not user-configurable; the eval harness is the tuning channel):

    ftsEntityKPerType    = 5
    ftsEntityRankCeiling = 0.0 // permissive; eval will tighten
    ftsEntityTotalCap    = 20

`entity_id` tiebreaks rank in every `ORDER BY` so results are stable when BM25 produces identical ranks.

**Query tolerance**:

- `prepareFTSEntityQuery` lowercases, strips non-alphanum, drops short and stopword tokens, and OR-joins the survivors as quoted prefix phrases.
- Returns early when no content words survive so a pure-stopword question like "what is it?" doesn't hammer FTS with an empty MATCH.

Stacked on top of #961 (triggers), which is stacked on #960 (engine).

Refs #707
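The three-tier selection can be illustrated with an in-memory emulation. This is not the SQL window-function query from the PR; it is a sketch of the same idea in plain Go, assuming each hit gets a per-type row number in rank order (what `ROW_NUMBER() OVER (PARTITION BY ...)` would compute). The `Hit` type, the `selectTiered` name, and the string-typed ID are hypothetical, and the rank-ceiling filter is omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical search result row.
type Hit struct {
	Type string
	ID   string
	Rank float64 // bm25 score; in FTS5, lower (more negative) is better
}

// selectTiered keeps up to totalCap hits: first the best hit of each type,
// then hits 2..kPerType of each type, then everything else by global rank.
func selectTiered(hits []Hit, kPerType, totalCap int) []Hit {
	ranked := append([]Hit(nil), hits...) // copy so the caller's slice is untouched
	sort.SliceStable(ranked, func(i, j int) bool {
		if ranked[i].Rank != ranked[j].Rank {
			return ranked[i].Rank < ranked[j].Rank
		}
		return ranked[i].ID < ranked[j].ID // stable tiebreaker on equal ranks
	})

	// Assign each hit a tier from its per-type row number in rank order.
	tiers := make([]int, len(ranked))
	seen := map[string]int{}
	for i, h := range ranked {
		seen[h.Type]++
		switch n := seen[h.Type]; {
		case n == 1:
			tiers[i] = 1
		case n <= kPerType:
			tiers[i] = 2
		default:
			tiers[i] = 3
		}
	}

	// Emit tier by tier; within a tier the global rank order is preserved.
	out := make([]Hit, 0, len(ranked))
	for tier := 1; tier <= 3; tier++ {
		for i, h := range ranked {
			if tiers[i] == tier && len(out) < totalCap {
				out = append(out, h)
			}
		}
	}
	return out
}

func main() {
	hits := []Hit{
		{"project", "p1", -5}, {"project", "p2", -4},
		{"project", "p3", -3}, {"vendor", "v1", -1},
	}
	for _, h := range selectTiered(hits, 2, 3) {
		fmt.Println(h.Type, h.ID, h.Rank)
	}
}
```

With a flat limit of 3 the three projects would crowd out the vendor; the tiered selection returns `p1`, `v1`, `p2` instead, so the weaker-ranked type is still represented.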
## Summary

- `entities_fts` indexing seven entity types (projects, vendors, appliances, incidents, quotes, maintenance items, service logs) with porter + unicode61 tokenization.
- `RebuildFTSIndex` populates the index from the live entity tables. Safe to call repeatedly; skips soft-deleted rows (own rows and soft-deleted parents in JOINs).
- `SearchEntities` runs a MATCH query and returns BM25-ranked results with a stable `entity_id` tiebreaker, capped at 20 rows.
- `EntitySummary` fetches a tri-state result (found / stale / missing) so callers can revalidate cached search hits.
- `truncateField` clips indexed content by rune count without splitting UTF-8 sequences.

This PR is the index and query layer only. No callers are wired up yet; triggers, query hardening, and the eval subcommand land in follow-up PRs.
Refs #707
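The tri-state result of `EntitySummary` can be pictured with a small in-memory stand-in. The type names and the map-backed store are hypothetical, and the meaning of "stale" is an assumption made here for illustration (the row still exists but has been soft-deleted); the PR text does not spell out the exact semantics.

```go
package main

import "fmt"

// SummaryState is a hypothetical tri-state lookup result.
type SummaryState int

const (
	SummaryFound   SummaryState = iota // live row exists
	SummaryStale                       // assumed: row exists but is soft-deleted
	SummaryMissing                     // no row at all
)

func (s SummaryState) String() string {
	switch s {
	case SummaryFound:
		return "found"
	case SummaryStale:
		return "stale"
	default:
		return "missing"
	}
}

// row is a stand-in for an entity record in a source table.
type row struct {
	Name    string
	Deleted bool
}

// entitySummary revalidates a cached search hit against the stand-in store.
func entitySummary(store map[string]row, id string) (string, SummaryState) {
	r, ok := store[id]
	switch {
	case !ok:
		return "", SummaryMissing
	case r.Deleted:
		return "", SummaryStale
	default:
		return r.Name, SummaryFound
	}
}

func main() {
	store := map[string]row{
		"v1": {Name: "Acme Plumbing"},
		"v2": {Name: "Old Vendor", Deleted: true},
	}
	for _, id := range []string{"v1", "v2", "v3"} {
		name, state := entitySummary(store, id)
		fmt.Printf("%s: %s %q\n", id, state, name)
	}
}
```

A caller holding a cached hit would use only the found case and discard or refresh the other two, which is the revalidation step the summary describes.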