Q&A: Phase 18.4 — CausalMemoryIndex Design Questions #463
web3guru888 asked this question in Q&A (unanswered)
Phase 18.4 — CausalMemoryIndex: Q&A
Common questions about the CausalMemoryIndex design (spec: #459).
Q1: Why four IndexModes instead of a single generic `query()` method?

A: Each mode uses a fundamentally different data structure and algorithm:

- `CAUSE_CHAIN`: `_cause_adj` dict
- `EFFECT_FAN`: `_effect_adj` dict
- `TEMPORAL_RANGE`: `SortedList`
- `SALIENCE_TOP_K`: `_entries` values
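For illustration, the separate per-mode methods might look like this minimal sketch (class and method names here are assumptions, not the actual API; the `TEMPORAL_RANGE` structure is omitted since it needs `sortedcontainers`):

```python
from collections import defaultdict

class CausalMemoryIndexSketch:
    """Illustrative sketch: one precisely-typed method per query mode."""

    def __init__(self) -> None:
        self._cause_adj: dict[str, list[str]] = defaultdict(list)   # CAUSE_CHAIN
        self._effect_adj: dict[str, list[str]] = defaultdict(list)  # EFFECT_FAN
        self._entries: dict[str, float] = {}                        # chunk_id -> salience

    def causes_of(self, chunk_id: str) -> list[str]:
        # CAUSE_CHAIN: a plain dict lookup with a statically-checkable signature
        return list(self._cause_adj.get(chunk_id, []))

    def effects_of(self, chunk_id: str) -> list[str]:
        # EFFECT_FAN: symmetric lookup in the effect adjacency dict
        return list(self._effect_adj.get(chunk_id, []))

    def top_k_by_salience(self, k: int) -> list[str]:
        # SALIENCE_TOP_K: rank _entries values by salience
        return sorted(self._entries, key=self._entries.__getitem__, reverse=True)[:k]
```

Each method's arguments form an exact, hashable tuple, which is what makes per-method LRU caching straightforward.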
A generic `query(mode, **kwargs)` would require runtime dispatch with untyped kwargs — losing type safety and making mypy unhappy. Separate methods give each query a precise signature and enable the LRU cache to key on exact parameter tuples.

Q2: How does the salience decay formula work, and why exponential?
A: The formula is:

    salience(t) = salience₀ · exp(−λ · t)

where `t` is the entry's age in seconds and `salience₀` is its stored base salience.
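Numerically, assuming the decay has the standard exponential form `salience(t) = salience₀ · exp(−λ·t)` (an assumption consistent with the half-life arithmetic in this answer), the default rate works out as:

```python
import math

DECAY_RATE = 1e-6  # λ, per second (the documented default)

def decayed_salience(base: float, age_seconds: float) -> float:
    """Exponential decay: salience halves every ln(2) / λ seconds."""
    return base * math.exp(-DECAY_RATE * age_seconds)

half_life = math.log(2) / DECAY_RATE
print(f"half-life ≈ {half_life:,.0f} s ≈ {half_life / 86400:.1f} days")
print(decayed_salience(1.0, half_life))  # ≈ 0.5: one half-life has elapsed
```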
Exponential decay is standard in memory models (Ebbinghaus forgetting curve). Key properties:
- `λ` (`decay_rate`) controls the half-life
- With `λ = 1e-6`, half-life ≈ `ln(2) / 1e-6` ≈ 693,147 seconds ≈ 8 days

This is deliberately slow — important memories should persist for about a week before dropping to 50%.

Q3: Why SortedList instead of a B-tree or database index?
A: `sortedcontainers.SortedList` is a pure-Python sorted list backed by a list-of-lists structure that supports efficient in-order insertion and range queries via `irange()`.

A B-tree (e.g., via `blist` or SQLite) would add complexity for marginal gain at our expected scale (< 100K entries). If the index grows beyond 1M entries, we'd consider moving the temporal index to a proper B+ tree or an embedded database like LMDB.

Q4: How does cache invalidation work when `index_chunk` is called?

A: On every `index_chunk(entry)`, cache invalidation follows this algorithm:

1. Compute `affected = {entry.chunk_id} ∪ entry.cause_ids ∪ entry.effect_ids`.
2. Evict every cached result whose key references a `chunk_id` in `affected`.

Additionally, entries expire after `cache_ttl_s` (default 60s) — so even without explicit invalidation, stale results are bounded.

The cache is intentionally simple (an OrderedDict with TTL) rather than a full tag-based invalidation system. At typical cache sizes (< 100 entries due to TTL), the O(n) scan on invalidation takes < 1 ms.
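A minimal sketch of this invalidation scheme (the `QueryCache` name and the key layout, a mode name plus a tuple of chunk_ids, are assumptions for illustration):

```python
import time
from collections import OrderedDict

class QueryCache:
    """TTL-bounded query-result cache with scan-based invalidation."""

    def __init__(self, ttl_s: float = 60.0) -> None:
        self.ttl_s = ttl_s
        # assumed key layout: (mode_name, chunk_ids_tuple, ...) -> (timestamp, result)
        self._cache: OrderedDict[tuple, tuple[float, object]] = OrderedDict()

    def get(self, key: tuple):
        hit = self._cache.get(key)
        if hit is None:
            return None
        ts, result = hit
        if time.monotonic() - ts > self.ttl_s:  # TTL expiry bounds staleness
            del self._cache[key]
            return None
        return result

    def put(self, key: tuple, result: object) -> None:
        self._cache[key] = (time.monotonic(), result)

    def invalidate(self, affected: set[str]) -> None:
        # O(n) scan: evict every result whose key mentions an affected chunk_id
        stale = [k for k in self._cache if affected & set(k[1])]
        for k in stale:
            del self._cache[k]
```

On `index_chunk(entry)`, the caller would compute `affected = {entry.chunk_id, *entry.cause_ids, *entry.effect_ids}` and call `invalidate(affected)`.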
Q5: How does CausalMemoryIndex integrate with CausalGraph (8.2)?
A: CausalMemoryIndex does not import or depend on CausalGraph at runtime for queries. Instead:
At rebuild time: `rebuild()` calls an injected callback (or Protocol method) that fetches all edges from CausalGraph. These edges are projected into the local `_cause_adj` and `_effect_adj` dicts, filtered to include only chunk_ids present in `_entries`.

At index time: when `index_chunk()` is called, the `cause_ids` and `effect_ids` in the `IndexEntry` are assumed to have been resolved from CausalGraph by the caller (typically MemoryConsolidator or a pipeline stage).

This loose coupling avoids import cycles and allows testing CausalMemoryIndex with mock data, without standing up a full CausalGraph instance.
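The rebuild-time projection could be sketched as follows (the edge format and the helper name `rebuild_adjacency` are assumptions; the real injected callback's Protocol may differ):

```python
from typing import Iterable

def rebuild_adjacency(
    edges: Iterable[tuple[str, str]],  # (cause_chunk_id, effect_chunk_id) pairs from CausalGraph
    known_ids: set[str],               # chunk_ids currently present in _entries
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Project graph edges into local adjacency dicts, filtered to known chunk_ids."""
    cause_adj: dict[str, list[str]] = {}
    effect_adj: dict[str, list[str]] = {}
    for cause, effect in edges:
        if cause in known_ids and effect in known_ids:
            cause_adj.setdefault(effect, []).append(cause)   # effect -> its direct causes
            effect_adj.setdefault(cause, []).append(effect)  # cause -> its direct effects
    return cause_adj, effect_adj
```

Edges touching chunk_ids not in `_entries` are dropped, so the projected dicts stay self-consistent with the index even after evictions.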
Q6: How should the rebuild interval be tuned?
A: The default `rebuild_interval_s = 300` (5 minutes) balances freshness against CPU cost: rely on incremental `index_chunk` for freshness, with the periodic rebuild as a safety net.

Monitor `asi_causal_index_rebuild_duration_seconds` — if rebuilds take > 50% of the interval, increase the interval or optimise the upstream data pull.

Q7: What is the `embedding` field for, and when will it be used?

A: The
`embedding: tuple[float, ...] | None` field on `IndexEntry` is a forward-looking extension point for semantic similarity queries.

Current plan: none. No query mode uses it yet. A `SEMANTIC_KNN` IndexMode could be added that uses approximate nearest-neighbour search (e.g., HNSW via `hnswlib` or `faiss`) over the embedding vectors.

The field is included now so that the `IndexEntry` schema doesn't need a breaking change later. Since `embedding` defaults to `None` and is a tuple (immutable, hashable), it doesn't affect memory usage or performance for entries that don't use it.