
# Architecture: Why state-trace looks different from Graphiti

This document explains the deliberate architectural choices in state-trace and how they differ from general-purpose temporal context graphs like Graphiti. It exists because a reasonable reviewer comparing the two would otherwise assume state-trace is a toy (in-memory graph, no Neo4j). It isn't; it's a different shape of system for a different problem.

## The two problems

Graphiti's problem: unbounded temporal knowledge graph for agents. Many episodes, many entities, long-lived facts that evolve over weeks or months. Multi-tenant. Needs a real graph database because the working set is larger than RAM.

state-trace's problem: bounded working memory for a single coding/debugging session. Tens to low hundreds of nodes at a time. A fix, a failing test, the file under the cursor, the hypothesis the agent is currently exploring. Cold data (closed sessions) stays on disk in SQLite; hot data lives in a `networkx.MultiDiGraph` and is traversed directly.

These are different systems even when the word "memory" appears in both.
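To make the "hot graph" concrete, here is a minimal sketch of a session working set. The names (`Node`, `SessionGraph`) are illustrative, not state-trace's real API; the actual engine wraps a `networkx.MultiDiGraph`, for which a plain dict adjacency stands in here:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    node_type: str          # e.g. "file", "test", "error_signature"
    payload: dict = field(default_factory=dict)

class SessionGraph:
    """Bounded in-process working set: dozens to low hundreds of nodes."""
    def __init__(self):
        self.nodes: dict[str, Node] = {}
        # adjacency: src -> list of (edge_type, dst), mirroring a MultiDiGraph
        self.edges: dict[str, list[tuple[str, str]]] = {}

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.edges.setdefault(node.node_id, [])

    def add_edge(self, src: str, edge_type: str, dst: str) -> None:
        self.edges[src].append((edge_type, dst))

g = SessionGraph()
g.add_node(Node("patch-1", "patch_hunk"))
g.add_node(Node("test-auth", "test"))
g.add_edge("patch-1", "verified_by", "test-auth")
```

Everything above lives in process memory; retrieval never leaves Python.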

## Consequence: the hot graph is in-process

The retrieval path is pure Python over networkx. No ORM, no network round-trip, no query planner. For a working set that fits in RAM (target ceiling: ~256 nodes × ~8 capacity units ≈ ~2k effective memory units per session), this is categorically faster than routing the same traversal through Neo4j or Kuzu:

| Operation | state-trace (networkx) | graph DB (typical) |
| --- | --- | --- |
| `retrieve()` on ~100-node session | ~1–30 ms | ~50–500 ms |
| `retrieve_brief()` (adds compaction) | ~2–40 ms | ~100–800 ms |
| 2-hop causal traversal with edge prior | in-memory BFS/heap | Cypher + planner |
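The "2-hop causal traversal with edge prior" row can be sketched as a heap-priority best-first walk. Plain dicts stand in for the networkx adjacency, and the edge weights are purely illustrative, not state-trace's actual priors:

```python
import heapq

# Illustrative edge priors: higher weight = stronger causal signal.
EDGE_PRIOR = {"patches_file": 1.0, "fails_in": 0.9, "derived_from": 0.5}

# adjacency: src -> [(edge_type, dst)], standing in for MultiDiGraph.adj
ADJ = {
    "task-1":  [("derived_from", "patch-1")],
    "patch-1": [("patches_file", "auth.py"), ("fails_in", "test-auth")],
    "auth.py": [],
    "test-auth": [],
}

def traverse(seed: str, max_hops: int = 2) -> dict[str, float]:
    """Best-first BFS: pop the highest-scored node, expand its typed edges."""
    best = {seed: 1.0}
    heap = [(-1.0, 0, seed)]            # (negated score, hops, node)
    while heap:
        neg, hops, node = heapq.heappop(heap)
        score = -neg
        if hops == max_hops or score < best.get(node, 0.0):
            continue                    # hop limit reached, or stale entry
        for edge_type, dst in ADJ.get(node, []):
            s = score * EDGE_PRIOR.get(edge_type, 0.1)
            if s > best.get(dst, 0.0):
                best[dst] = s
                heapq.heappush(heap, (-s, hops + 1, dst))
    return best
```

On a ~100-node graph this is a handful of dict lookups and heap operations, which is where the millisecond-scale latencies in the table come from.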

Those numbers are consistent with the `AvgLatencyMs` column across the README benchmarks. The graph-DB column is the honest read on what the same traversal would cost if forced through Kuzu, and is also consistent with the head-to-head harness latency.

For agent loops that want working-memory retrieval inside every action selection, the difference compounds quickly. For long-lived knowledge bases where you're querying across sessions, weeks of history, and multiple tenants, the graph DB is the right choice — that's Graphiti's lane.

## Consequence: capacity is a hard constraint, not an afterthought

Because the hot graph is bounded, state-trace is built around `enforce_capacity()` from day one: decay, compression, and lifecycle-aware retention are part of the engine, not optional hygiene. The long-horizon pressure benchmark exists to verify this.

Graphiti does not have to make the same tradeoff — it has the substrate to keep growing. But that means "what's still live in this session?" is a question state-trace can answer cheaply and directly (see `engine.current_state()` and `engine.failed_hypotheses()`), whereas for Graphiti the same answer has to be inferred from temporal facts.
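Why those calls are cheap: over a bounded typed graph they reduce to direct filters, no inference over temporal facts required. This sketch assumes a simple `(node_type, status)` node shape and an edge triple list, neither of which is state-trace's real schema:

```python
def current_state(nodes: dict[str, dict]) -> list[str]:
    """Live working set: a single pass over at most a few hundred nodes."""
    return [nid for nid, n in nodes.items() if n.get("status") == "live"]

def failed_hypotheses(nodes: dict[str, dict],
                      edges: list[tuple[str, str, str]]) -> list[str]:
    """A decision counts as failed if any rejected_by edge leaves it."""
    rejected = {src for src, etype, _ in edges if etype == "rejected_by"}
    return [nid for nid, n in nodes.items()
            if n.get("node_type") == "decision" and nid in rejected]
```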

## Consequence: typed coding-agent ontology, not generic Entity/Edge

state-trace ships with twelve first-class node types — `task`, `observation`, `decision`, `file`, `symbol`, `patch_hunk`, `error_signature`, `test`, `command`, `episode`, `session`, `goal` — and a dozen-plus causal edge types, including `patches_file`, `fails_in`, `verified_by`, `rejected_by`, `supersedes`, `contradicts`, and `derived_from`. The retrieval scorer routes differently per intent (`locate_file`, `failure_analysis`, `history`, `general`) using those types.

Graphiti is intentionally more schema-light. It's the right tradeoff for a general-purpose system, but it means coding-agent-specific queries ("which file should I patch", "what did I try and reject") go through generic BM25/cosine/BFS instead of type-aware priors.
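What "routes differently per intent" can mean in practice: a per-intent prior over edge types, consulted by the traversal scorer. The weights below are invented for illustration; the document does not publish the real scorer's numbers:

```python
# Hypothetical per-intent edge priors. Intents and edge types come from the
# ontology above; the weights themselves are illustrative assumptions.
INTENT_PRIORS = {
    "locate_file":      {"patches_file": 1.0, "fails_in": 0.3},
    "failure_analysis": {"fails_in": 1.0, "rejected_by": 0.8,
                         "patches_file": 0.4},
    "history":          {"supersedes": 1.0, "derived_from": 0.9},
    "general":          {},   # falls back to the default weight
}

def edge_weight(intent: str, edge_type: str, default: float = 0.2) -> float:
    """Type-aware prior: the same edge scores differently per query intent."""
    return INTENT_PRIORS.get(intent, {}).get(edge_type, default)
```

A schema-light system cannot apply a table like this, because it has no stable edge types to key on; that is the tradeoff the paragraph above describes.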

## Cold storage: SQLite+FTS5, not a replacement graph DB

When `MemoryEngine(storage_path="...db")` points at a `.db`/`.sqlite`/`.sqlite3` file, state-trace upserts into a WAL-journaled SQLite database with an FTS5 seed index. This is deliberately not a second graph store: SQLite handles durability and cold text seeding; the graph continues to live in networkx for active retrieval.
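The shape of that cold store can be sketched with the standard-library `sqlite3` module. Table and column names here are illustrative, not state-trace's actual schema; the in-memory connection stands in for a real on-disk session file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # a real run would open session.db
conn.execute("PRAGMA journal_mode=WAL")   # durable concurrent reads on disk
conn.execute(
    "CREATE TABLE nodes (id TEXT PRIMARY KEY, node_type TEXT, body TEXT)")
conn.execute("CREATE VIRTUAL TABLE node_fts USING fts5(node_id, body)")

def upsert(node_id: str, node_type: str, body: str) -> None:
    # Incremental upsert: a durable row plus a refreshed lexical seed entry.
    conn.execute(
        "INSERT INTO nodes VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET body = excluded.body",
        (node_id, node_type, body))
    conn.execute("DELETE FROM node_fts WHERE node_id = ?", (node_id,))
    conn.execute("INSERT INTO node_fts VALUES (?, ?)", (node_id, body))

upsert("err-1", "error_signature", "KeyError in auth middleware")
hits = conn.execute(
    "SELECT node_id FROM node_fts WHERE node_fts MATCH 'auth'").fetchall()
```

FTS5 serves only the seed stage of retrieval; once seeds are loaded, traversal happens in the networkx graph, not in SQL.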

The tradeoff, stated plainly:

- JSON backend: simple, single-writer, fine for benchmark scripts. Reloads the whole graph on `load()`.
- SQLite+FTS5 backend: WAL journal mode, incremental upserts, process-safe reads, FTS5 for seed-stage lexical search. Recommended for long-running MCP harnesses.
- Neither is a substitute for a real graph DB at SaaS scale. That's intentional: the design brief is local-first working memory.

## What would change for multi-tenant SaaS scale

Honest accounting of what would have to change if state-trace wanted to compete with Graphiti on its home turf (multi-tenant, weeks-of-history, cross-session retrieval):

1. Graph substrate. `networkx` would go from authoritative to cache. The authoritative store becomes Neo4j/Kuzu/DuckPGQ or similar. The retrieval code is traversal-pattern-compatible with that (it's already heap-priority BFS over typed edges), but the hot-path latency assumptions break and we'd need query planning.
2. Capacity semantics. The current `enforce_capacity` is per-process. Multi-tenant needs tenant-scoped budgets, persisted across processes, with eviction audited.
3. Concurrency. `networkx` is not thread-safe; the current engine is single-writer. SaaS scale needs optimistic concurrency or a serialized writer queue.
4. Temporal reasoning. The `supersedes` and `invalid_at` primitives work well within a session but haven't been stress-tested across many parallel agents editing the same namespace. Graphiti is ahead here.
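The serialized-writer option from point 3 is the smaller lift, and it can be sketched with the standard library alone: every mutation funnels through one worker thread, so the thread-unsafe graph only ever sees a single writer. Class and method names are illustrative:

```python
import queue
import threading

class SerializedWriter:
    """Funnel all graph mutations through one worker thread."""
    def __init__(self):
        self._q: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            fn, done = self._q.get()
            if fn is None:          # shutdown sentinel
                break
            fn()                    # mutation runs on the sole writer thread
            done.set()

    def submit(self, fn) -> None:
        done = threading.Event()
        self._q.put((fn, done))
        done.wait()                 # block caller until the write is applied

    def close(self) -> None:
        self._q.put((None, None))
        self._worker.join()

graph = {}                          # stands in for the networkx graph
w = SerializedWriter()
w.submit(lambda: graph.update({"patch-1": "applied"}))
w.close()
```

Optimistic concurrency would avoid the queue's serialization bottleneck at the cost of retry logic; either way the single-writer invariant the engine currently assumes has to become explicit.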

None of these are blockers; they're just not work we've done, and they're not the work we should do next. The next move is depth on the coding-agent lane (see README.md benchmark section), not horizontal expansion into Graphiti's lane.

## What stays true at scale

The architectural wedge does not depend on the graph substrate:

- Typed coding-agent nodes and edges.
- Bounded working memory (a policy, not an implementation detail).
- `current_state` / `failed_hypotheses` as first-class APIs — "what's true now in this debugging session" is cheap for us, expensive for a general-purpose knowledge graph.
- Compact, artifact-first briefs for small-model harnesses.
- MCP-mountable, local-first deployment.

A future Neo4j-backed state-trace would still differ from Graphiti in the same ways it does today. It would just be able to hold more sessions concurrently.