feat(data): FTS5 entity search engine by cpcloud · Pull Request #960 · micasa-dev/micasa

cpcloud · 2026-04-20T12:18:50Z

Summary

- SQLite FTS5 virtual table `entities_fts` indexing seven entity types (projects, vendors, appliances, incidents, quotes, maintenance items, service logs) with porter + unicode61 tokenization.
- `RebuildFTSIndex` populates the index from the live entity tables. Safe to call repeatedly; skips soft-deleted rows (own rows and soft-deleted parents in JOINs).
- `SearchEntities` runs a MATCH query and returns BM25-ranked results with a stable `entity_id` tiebreaker, capped at 20 rows.
- `EntitySummary` fetches a tri-state result (found / stale / missing) so callers can revalidate cached search hits.
- `truncateField` clips indexed content by rune count without splitting UTF-8 sequences.
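
For readers unfamiliar with rune-safe clipping, here is a minimal sketch of how a helper like `truncateField` could work. The signature and the `maxRunes` parameter are assumptions for illustration; the PR names the helper but does not show its body.

```go
package main

import "fmt"

// truncateField returns s clipped to at most maxRunes runes.
// Hypothetical signature: the real helper's parameters are not shown in the PR.
func truncateField(s string, maxRunes int) string {
	if maxRunes <= 0 {
		return ""
	}
	count := 0
	// Ranging over a string yields the byte offset at which each rune starts,
	// so slicing at i never lands inside a multi-byte UTF-8 sequence.
	for i := range s {
		if count == maxRunes {
			return s[:i]
		}
		count++
	}
	// Fewer than or exactly maxRunes runes: nothing to clip.
	return s
}

func main() {
	fmt.Println(truncateField("naïve café", 5))
	fmt.Println(truncateField("日本語のテキスト", 3))
	fmt.Println(truncateField("short", 10))
}
```

Slicing by byte length (`s[:n]`) would risk cutting `ï` or a CJK character in half; counting rune starts avoids that without allocating a `[]rune`.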

This PR is the index and query layer only. No callers are wired up yet — triggers, query hardening, and the eval subcommand land in follow-up PRs.

Refs #707

Adds a SQLite FTS5 virtual table `entities_fts` indexing seven entity types (projects, vendors, appliances, incidents, quotes, maintenance items, service logs) and the Go engine to populate and query it.

- `entities_fts` virtual table with porter + unicode61 tokenization for stem-folded cross-language matching.
- `RebuildFTSIndex` rebuilds the index from scratch from the live entity tables; safe to call repeatedly.
- `populateEntitiesFTS` skips soft-deleted rows (including soft-deleted parents when JOINing) so the index never surfaces deleted data.
- `SearchEntities` runs a MATCH query, returns BM25-ranked results with a stable `entity_id` tiebreaker, capped at 20 rows.
- `EntitySummary` fetches a tri-state result (found / stale / missing) so callers can revalidate cached search hits before using them.
- `truncateField` clips indexed content by rune count to keep the FTS shadow table bounded without splitting UTF-8 sequences.

Test coverage: creation, rebuild, soft-delete exclusion (own rows and parents), cross-entity search, stemming, graceful degradation on corrupted schema, stale revalidation, Unicode truncation, tiebreaker determinism.

This is the index and query layer only. Context-formatting helpers and caller wiring land in follow-up PRs.

Refs micasa-dev#707.
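
The found / stale / missing result of `EntitySummary` suggests a revalidation loop on the caller side. The sketch below is one way that could look. Every name in it (`SummaryState`, `Hit`, `lookup`, `revalidate`) is hypothetical, and it assumes "stale" means the row exists but the cached hit is out of date; the PR does not define the states further.

```go
package main

import "fmt"

// SummaryState is a hypothetical three-valued lookup outcome.
type SummaryState int

const (
	StateFound   SummaryState = iota // row exists and the cached hit is still valid
	StateStale                       // assumed: row exists but the cached hit is out of date
	StateMissing                     // no row for this type and ID
)

// Hit is a hypothetical cached search result.
type Hit struct {
	EntityType string
	EntityID   string
}

// lookup stands in for EntitySummary; the real signature is not shown in the PR.
type lookup func(h Hit) SummaryState

// revalidate keeps only hits that are still found and counts the
// stale and missing hits it dropped.
func revalidate(hits []Hit, look lookup) (live []Hit, stale, missing int) {
	for _, h := range hits {
		switch look(h) {
		case StateFound:
			live = append(live, h)
		case StateStale:
			stale++
		case StateMissing:
			missing++
		}
	}
	return live, stale, missing
}

func main() {
	// Fake lookup table standing in for the database.
	states := map[Hit]SummaryState{
		{"vendor", "v1"}: StateFound,
		{"quote", "q7"}:  StateStale,
	}
	look := func(h Hit) SummaryState {
		if s, ok := states[h]; ok {
			return s
		}
		return StateMissing
	}
	hits := []Hit{{"vendor", "v1"}, {"quote", "q7"}, {"project", "p9"}}
	live, stale, missing := revalidate(hits, look)
	fmt.Println(len(live), stale, missing)
}
```

Whether a stale hit should be dropped, refreshed, or shown with a warning is a policy choice for the caller; this sketch simply drops it.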

## Summary

Installs AFTER INSERT / UPDATE / DELETE triggers on every source table that contributes to `entities_fts` (projects, vendors, appliances, maintenance_items, incidents, service_log_entries, quotes) so the index stays current without `RebuildFTSIndex` on every app open.

- Parent tables whose text is embedded in a child's `entity_name` (project.title and vendor.name in quote, maintenance_item.name in service_log) get companion `_au_cascade` triggers that rebuild the child's FTS row when the parent is updated.
- Cascade JOINs filter on `parent.deleted_at IS NULL` so a parent soft-delete degrades the child's `entity_name` (project title disappears from the quote; vendor name disappears; SLE name blanks out) instead of leaving stale text in the index.
- The populate path carries the same filter so initial rebuilds match the trigger invariant.
- Trigger installation is idempotent (DROP IF EXISTS + CREATE), so schema drift heals on the next `Store.Open`.

FK constraints (RESTRICT on quote parents, CASCADE on SLE parents) keep the trigger semantics consistent with the rest of the domain.

Stacked on top of #960 (FTS engine). Diff will shrink to just the trigger additions once that merges.

Refs #707
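
To make the idempotent DROP IF EXISTS + CREATE pattern concrete, here is a rough sketch that generates the statements for one simple source table. It is not the PR's trigger SQL: the trigger names (`<table>_fts_ai/_au/_ad`), the `id` column, the `entity_type` column, and the `triggerDDL` helper are all assumptions. Only `entities_fts`, `entity_id`, `entity_name`, and `deleted_at` appear in the description above.

```go
package main

import "fmt"

// triggerDDL returns DROP + CREATE statements for hypothetical AFTER INSERT,
// AFTER UPDATE, and AFTER DELETE triggers on one source table.
// Arguments are trusted compile-time constants, not user input.
func triggerDDL(table, entityType, nameCol string) []string {
	// Remove the old FTS row for this entity.
	del := fmt.Sprintf(
		"DELETE FROM entities_fts WHERE entity_type = '%s' AND entity_id = OLD.id;",
		entityType)
	// Re-insert only if the row is not soft-deleted.
	ins := fmt.Sprintf(
		"INSERT INTO entities_fts (entity_type, entity_id, entity_name) "+
			"SELECT '%s', NEW.id, NEW.%s WHERE NEW.deleted_at IS NULL;",
		entityType, nameCol)

	bodies := []struct{ suffix, event, body string }{
		{"ai", "INSERT", ins},
		{"au", "UPDATE", del + " " + ins},
		{"ad", "DELETE", del},
	}

	var stmts []string
	for _, b := range bodies {
		name := fmt.Sprintf("%s_fts_%s", table, b.suffix)
		stmts = append(stmts,
			// Dropping first makes installation safe to repeat on every open.
			"DROP TRIGGER IF EXISTS "+name,
			fmt.Sprintf("CREATE TRIGGER %s AFTER %s ON %s BEGIN %s END",
				name, b.event, table, b.body),
		)
	}
	return stmts
}

func main() {
	for _, s := range triggerDDL("vendors", "vendor", "name") {
		fmt.Println(s + ";")
	}
}
```

Running every statement in order on each startup replaces whatever trigger definitions are present, which is one way schema drift could heal as described. The cascade triggers for parent tables would need a JOIN-based body and are not covered here.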

…962)

## Summary

Hardens `SearchEntities` against real-world natural-language queries and against single-type result floods.

**Ranking**: three-tier window-function query replaces the flat `LIMIT 20`:

- Tier 1 takes exactly one row per matching entity type (guarantees cross-type representation).
- Tier 2 raises each type up to `ftsEntityKPerType` rows so single noisy types can't dominate.
- Tier 3 fills the remaining room up to `ftsEntityTotalCap` from whatever's left, globally ranked. Single-type searches use the full cap this way.

Package-level tuning constants (not user-configurable; the eval harness is the tuning channel):

    ftsEntityKPerType = 5
    ftsEntityRankCeiling = 0.0 // permissive; eval will tighten
    ftsEntityTotalCap = 20

`entity_id` tiebreaks rank in every `ORDER BY` so results are stable when BM25 produces identical ranks.

**Query tolerance**:

- `prepareFTSEntityQuery` lowercases, strips non-alphanum, drops short and stopword tokens, and OR-joins the survivors as quoted prefix phrases.
- Returns early when no content words survive so a pure-stopword question like "what is it?" doesn't hammer FTS with an empty MATCH.

Stacked on top of #961 (triggers), which is stacked on #960 (engine).

Refs #707
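
The query-tolerance steps can be sketched roughly as follows. This is an illustration, not the PR's `prepareFTSEntityQuery`: the function name `prepareQuerySketch`, the stopword list, and the minimum token length of 2 are all assumptions. The `"token"*` form is FTS5's syntax for a quoted prefix query.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Hypothetical stopword list and length floor; the PR does not list its own.
var stopwords = map[string]bool{
	"a": true, "an": true, "the": true, "is": true, "it": true, "we": true,
	"what": true, "when": true, "did": true, "of": true, "for": true,
	"to": true, "in": true, "on": true,
}

const minTokenLen = 2

// prepareQuerySketch turns free text into an FTS5 MATCH expression.
// It returns ok=false when no content words survive, so the caller
// can skip the search entirely.
func prepareQuerySketch(q string) (match string, ok bool) {
	// Lowercase, then replace every rune that is not a letter or digit
	// with a space so punctuation cannot reach the FTS5 parser.
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return r
		}
		return ' '
	}, strings.ToLower(q))

	var terms []string
	for _, tok := range strings.Fields(cleaned) {
		if utf8.RuneCountInString(tok) < minTokenLen || stopwords[tok] {
			continue
		}
		// Quote the token and append * to make it a prefix match.
		terms = append(terms, `"`+tok+`"*`)
	}
	if len(terms) == 0 {
		return "", false
	}
	return strings.Join(terms, " OR "), true
}

func main() {
	fmt.Println(prepareQuerySketch("When did we replace the furnace filter?"))
	fmt.Println(prepareQuerySketch("What is it?"))
}
```

Because surviving tokens contain only letters and digits, wrapping them in double quotes is safe without further escaping.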

cpcloud added the enhancement (New feature or request) and data (Data layer, models, database) labels Apr 20, 2026

cpcloud force-pushed the fts-engine branch from e10a1e6 to 8aac14a April 20, 2026 12:30

cpcloud merged commit 2535cf6 into micasa-dev:main Apr 20, 2026
28 checks passed

cpcloud deleted the fts-engine branch April 20, 2026 14:46

cpcloud merged 1 commit into micasa-dev:main from cpcloud:fts-engine
