This document explains the deliberate architectural choices in state-trace and how they differ from general-purpose temporal context graphs like Graphiti. It exists because a reasonable reviewer comparing the two would otherwise assume state-trace is a toy (in-memory graph, no Neo4j). It isn't; it's a different shape of system for a different problem.
Graphiti's problem: unbounded temporal knowledge graph for agents. Many episodes, many entities, long-lived facts that evolve over weeks or months. Multi-tenant. Needs a real graph database because the working set is larger than RAM.
state-trace's problem: bounded working memory for a single coding/debugging session. Tens to low hundreds of nodes at a time. A fix, a failing test, the file under the cursor, the hypothesis the agent is currently exploring. Cold data (closed sessions) stays on disk in SQLite; hot data lives in a networkx.MultiDiGraph and is traversed directly.
These are different systems even when the word "memory" appears in both.
The retrieval path is pure Python over networkx. No ORM, no network round-trip, no query planner. For a working set that fits in RAM (target ceiling: ~256 nodes × ~8 capacity units ≈ ~2k effective memory units per session), this is categorically faster than routing the same traversal through Neo4j or Kuzu:
| Operation | state-trace (networkx) | graph DB (typical) |
|---|---|---|
| retrieve() on ~100-node session | ~1–30 ms | ~50–500 ms |
| retrieve_brief() (adds compaction) | ~2–40 ms | ~100–800 ms |
| 2-hop causal traversal with edge prior | in-memory BFS/heap | Cypher + planner |
Those numbers are consistent with the AvgLatencyMs column across the README benchmarks. The graph-DB column is the honest read on "what would this cost if we forced it through Kuzu" and is also consistent with the head-to-head harness latency.
For agent loops that want working-memory retrieval inside every action selection, the difference compounds quickly. For long-lived knowledge bases where you're querying across sessions, weeks of history, and multiple tenants, the graph DB is the right choice — that's Graphiti's lane.
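The "in-memory BFS/heap" row above can be made concrete. The sketch below is a best-first 2-hop expansion with a per-edge-type prior, shown over a plain adjacency dict so it stands alone; the real engine walks a networkx.MultiDiGraph the same way. The edge types match the ones listed later in this document, but the weights are illustrative, not state-trace's actual tuning.

```python
# Best-first traversal with edge-type priors: pop the highest-scored frontier
# node, multiply its score by the prior of each outgoing edge type, keep the
# best score seen per node. Weights below are illustrative only.
import heapq

EDGE_PRIOR = {"patches_file": 1.0, "fails_in": 0.9, "verified_by": 0.7}

def traverse(adj, seed, max_hops=2, default_prior=0.3):
    # adj: node -> list of (edge_type, neighbor)
    best = {seed: 1.0}
    heap = [(-1.0, seed, 0)]            # heapq is a min-heap, so scores are negated
    while heap:
        neg, node, hops = heapq.heappop(heap)
        score = -neg
        if hops == max_hops or score < best.get(node, 0.0):
            continue                    # hop budget exhausted, or stale heap entry
        for edge_type, nbr in adj.get(node, []):
            s = score * EDGE_PRIOR.get(edge_type, default_prior)
            if s > best.get(nbr, 0.0):  # keep only the best path score per node
                best[nbr] = s
                heapq.heappush(heap, (-s, nbr, hops + 1))
    return best

adj = {
    "patch:42": [("patches_file", "file:engine.py")],
    "file:engine.py": [("verified_by", "test:test_engine")],
}
scores = traverse(adj, "patch:42")
# two hops through typed edges: test:test_engine scores 1.0 * 0.7
```

No planner, no round-trip: the whole query is one heap loop over RAM-resident dicts, which is where the latency gap in the table comes from.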
Because the hot graph is bounded, state-trace is built around enforce_capacity() from day one: decay, compression, and lifecycle-aware retention are part of the engine, not optional hygiene. The long-horizon pressure benchmark exists to verify this.
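To make the capacity policy concrete, here is a minimal sketch of what an enforce_capacity()-style pass can look like: decay each node's salience with age, then keep the strongest nodes until the budget is spent. The field names (salience, last_touch, cost) and the half-life are assumptions for the example; state-trace's actual policy also compresses rather than only evicting.

```python
# Hedged sketch of bounded working memory: exponential recency decay on
# salience, greedy keep-until-budget. Field names and half-life are
# hypothetical, not state-trace's real schema.
import math

def enforce_capacity(nodes, budget=2048, now=0.0, half_life=600.0):
    # nodes: node_id -> {"salience": float, "last_touch": float, "cost": int}
    def live_score(n):
        age = now - n["last_touch"]
        return n["salience"] * math.exp(-age * math.log(2) / half_life)
    ranked = sorted(nodes.items(), key=lambda kv: live_score(kv[1]), reverse=True)
    kept, used = {}, 0
    for node_id, n in ranked:
        if used + n["cost"] <= budget:
            kept[node_id] = n
            used += n["cost"]
    return kept  # evicted nodes would be compressed or spilled to SQLite, not lost

nodes = {
    "obs:old": {"salience": 1.0, "last_touch": -3600.0, "cost": 8},  # an hour stale
    "test:hot": {"salience": 0.9, "last_touch": -10.0, "cost": 8},   # just touched
}
kept = enforce_capacity(nodes, budget=8)  # only the recently touched node survives
```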
Graphiti does not have to make the same tradeoff — it has the substrate to keep growing. The flip side is that "what's still live in this session?" is a question state-trace can answer cheaply and directly (see engine.current_state() and engine.failed_hypotheses()), whereas Graphiti has to infer the same answer from its temporal facts.
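That "what's still live?" query can be sketched directly: a node is live unless something supersedes it or a test has rejected it. This mimics the shape of engine.current_state() and engine.failed_hypotheses(); the real signatures, and the exact direction of the supersedes and rejected_by edges, are assumptions here.

```python
# Liveness as set subtraction over typed edges. Assumed edge directions:
# (new, "supersedes", old) and (hypothesis, "rejected_by", test).
def current_state(nodes, edges):
    # nodes: id -> {"type": ...}; edges: list of (src, edge_type, dst)
    superseded = {dst for src, t, dst in edges if t == "supersedes"}
    rejected = {src for src, t, dst in edges if t == "rejected_by"}
    return {nid: n for nid, n in nodes.items()
            if nid not in superseded and nid not in rejected}

def failed_hypotheses(nodes, edges):
    rejected = {src for src, t, dst in edges if t == "rejected_by"}
    return {nid: n for nid, n in nodes.items()
            if nid in rejected and n["type"] == "decision"}

nodes = {
    "hyp:1": {"type": "decision"},
    "hyp:2": {"type": "decision"},
    "patch:old": {"type": "patch_hunk"},
    "patch:new": {"type": "patch_hunk"},
}
edges = [
    ("hyp:1", "rejected_by", "test:unit"),
    ("patch:new", "supersedes", "patch:old"),
]
live = current_state(nodes, edges)        # hyp:2 and patch:new remain live
failed = failed_hypotheses(nodes, edges)  # hyp:1
```

For a bounded typed graph this is two set comprehensions; for a general-purpose temporal knowledge graph the same answer requires reasoning over validity intervals.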
state-trace ships with ten-plus first-class node types — task, observation, decision, file, symbol, patch_hunk, error_signature, test, command, episode, session, goal — and a dozen-plus causal edge types including patches_file, fails_in, verified_by, rejected_by, supersedes, contradicts, derived_from. The retrieval scorer routes differently per intent (locate_file, failure_analysis, history, general) using those types.
Graphiti is intentionally more schema-light. It's the right tradeoff for a general-purpose system, but it means coding-agent-specific queries ("which file should I patch", "what did I try and reject") go through generic BM25/cosine/BFS instead of type-aware priors.
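What "type-aware priors" buys over generic scoring is easy to show: the same candidate set ranks differently per intent because node types carry different weights. The intent names come from the text above; the weight table and base scores below are made up for the example.

```python
# Illustrative intent-conditioned reranking. The per-intent weight tables are
# hypothetical, not state-trace's real scorer.
INTENT_PRIOR = {
    "locate_file":      {"file": 1.0, "symbol": 0.8, "patch_hunk": 0.6},
    "failure_analysis": {"error_signature": 1.0, "test": 0.9, "patch_hunk": 0.7},
    "history":          {"decision": 1.0, "episode": 0.8, "observation": 0.6},
}

def rank(candidates, intent, default=0.2):
    # candidates: list of (node_id, node_type, base_score from the seed stage)
    prior = INTENT_PRIOR.get(intent, {})
    return sorted(candidates,
                  key=lambda c: c[2] * prior.get(c[1], default),
                  reverse=True)

candidates = [("file:engine.py", "file", 0.5),
              ("err:KeyError", "error_signature", 0.5),
              ("test:test_retrieve", "test", 0.5)]
top_fail = rank(candidates, "failure_analysis")[0][0]  # error signature wins
top_loc = rank(candidates, "locate_file")[0][0]        # the file wins
```

A schema-light system sees three equally scored candidates; a typed one knows that under failure_analysis the error signature matters more than the file.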
When MemoryEngine(storage_path="...db") points at a .db/.sqlite/.sqlite3 file, state-trace upserts into a WAL-journaled SQLite database with an FTS5 seed index. This is deliberately not a second graph store: SQLite handles durability and cold text seeding; the graph continues to live in networkx for active retrieval.
The tradeoff, stated plainly:
- JSON backend: simple, single-writer, fine for benchmark scripts. Reloads the whole graph on load().
- SQLite+FTS5 backend: WAL journal mode, incremental upserts, process-safe reads, FTS5 for seed-stage lexical search. Recommended for long-running MCP harnesses.
- Neither is a substitute for a real graph DB at SaaS scale. That's intentional: the design brief is local-first working memory.
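The SQLite side of that split is small enough to sketch end to end: WAL journal mode for durable concurrent reads, a plain table for cold node storage, and an FTS5 virtual table as the lexical seed index. The schema is illustrative, not state-trace's actual one.

```python
# Minimal cold-store sketch: WAL + upserts + FTS5 seed search. Schema and
# column names are assumptions for the example.
import sqlite3

db = sqlite3.connect(":memory:")       # pass a *.db path for a real session store
db.execute("PRAGMA journal_mode=WAL")  # no-op for :memory:, durable on disk
db.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT, body TEXT)")
db.execute("CREATE VIRTUAL TABLE node_fts USING fts5(id UNINDEXED, body)")

def upsert(node_id, node_type, body):
    # Incremental upsert: no full-graph rewrite, unlike the JSON backend.
    db.execute("INSERT INTO nodes VALUES (?,?,?) "
               "ON CONFLICT(id) DO UPDATE SET type=excluded.type, body=excluded.body",
               (node_id, node_type, body))
    db.execute("DELETE FROM node_fts WHERE id=?", (node_id,))
    db.execute("INSERT INTO node_fts VALUES (?,?)", (node_id, body))

upsert("err:1", "error_signature", "KeyError in retrieve() during compaction")
seeds = [r[0] for r in db.execute(
    "SELECT id FROM node_fts WHERE node_fts MATCH ? ORDER BY rank", ("KeyError",))]
```

The FTS5 hit list only seeds retrieval; ranking and traversal still happen in the networkx hot graph.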
Honest accounting of what would have to change if state-trace wanted to compete with Graphiti on its home turf (multi-tenant, weeks-of-history, cross-session retrieval):
- Graph substrate. networkx would go from authoritative to cache. The authoritative store becomes Neo4j/Kuzu/DuckPGQ or similar. The retrieval code is traversal-pattern-compatible with that (it's already heap-priority BFS over typed edges), but the hot-path latency assumptions break and we'd need query planning.
- Capacity semantics. The current enforce_capacity() is per-process. Multi-tenant needs tenant-scoped budgets, persisted across processes, with eviction audited.
- Concurrency. networkx is not thread-safe; the current engine is single-writer. SaaS scale needs optimistic concurrency or a serialized writer queue.
- Temporal reasoning. The supersedes and invalid_at primitives work well within a session but haven't been stress-tested across many parallel agents editing the same namespace. Graphiti is ahead here.
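For the concurrency bullet specifically, the cheapest upgrade path is the serialized writer queue: networkx stays effectively single-writer because every mutation is funneled through one queue-draining thread. This is a sketch of that option, not the current engine (which is simply single-process).

```python
# Serialized writer queue: callers submit mutation closures; exactly one
# thread ever touches the graph, so networkx's lack of thread safety is moot.
import queue
import threading

class SerializedWriter:
    def __init__(self, graph):
        self.graph = graph
        self.q = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, fn):
        """Enqueue a mutation fn(graph); returns an Event set once it has applied."""
        done = threading.Event()
        self.q.put((fn, done))
        return done

    def _drain(self):
        while True:
            fn, done = self.q.get()
            fn(self.graph)  # only this thread mutates the graph
            done.set()

graph = {}  # stand-in for a networkx.MultiDiGraph
w = SerializedWriter(graph)
w.submit(lambda g: g.__setitem__("task:1", {"type": "task"})).wait(timeout=2)
```

Optimistic concurrency would scale writes better across tenants, but the queue preserves today's single-writer invariants with the least new machinery.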
None of these are blockers; they're just not work we've done, and they're not the work we should do next. The next move is depth on the coding-agent lane (see README.md benchmark section), not horizontal expansion into Graphiti's lane.
The architectural wedge does not depend on the graph substrate:
- Typed coding-agent nodes and edges.
- Bounded working memory (a policy, not an implementation detail).
- current_state/failed_hypotheses as first-class APIs — "what's true now in this debugging session" is cheap for us, expensive for a general-purpose knowledge graph.
- Compact, artifact-first briefs for small-model harnesses.
- MCP-mountable, local-first deployment.
A future Neo4j-backed state-trace would still differ from Graphiti in the same ways it does today. It would just be able to hold more sessions concurrently.