feat: add Cursor agent transcript JSONL normalizer #232
marerem wants to merge 2 commits into MemPalace:main from
Add _try_cursor_jsonl() parser for Cursor IDE agent transcript files
(~/.cursor/projects/<proj>/agent-transcripts/<uuid>.jsonl).
Each JSONL line follows {"role": "user"|"assistant", "message":
{"content": [{"type": "text", "text": "..."}]}}. The parser is
discriminated from Claude Code JSONL (top-level "type" key) and
Codex JSONL ("session_meta"/"event_msg" wrappers) via a
has_cursor_structure guard that requires list-typed content blocks.
Includes 8 test cases covering multi-turn conversations, empty
content handling, malformed line tolerance, and false-positive
rejection for other JSONL formats.
Closes MemPalace#59 (Cursor portion)
Made-with: Cursor
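The detection rules in this commit message can be sketched roughly as follows. This is a simplified, standalone illustration and not the PR's actual `_try_cursor_jsonl()`: the function name `parse_cursor_jsonl`, the `(role, text)` return shape, and the two-message minimum are assumptions made for the example.

```python
import json
from typing import List, Optional, Tuple


def parse_cursor_jsonl(text: str) -> Optional[List[Tuple[str, str]]]:
    """Return (role, text) pairs, or None if the input does not look like Cursor JSONL.

    Hypothetical sketch; the real parser reuses project helpers not shown here.
    """
    messages: List[Tuple[str, str]] = []
    has_cursor_structure = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate malformed lines
        if not isinstance(entry, dict):
            continue
        role = entry.get("role", "")
        if role not in ("user", "assistant"):
            continue  # lines keyed on "type" (other formats) have no usable role
        message = entry.get("message")
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue  # only list-typed content counts as the Cursor shape
        parts = [
            block["text"].strip()
            for block in content
            if isinstance(block, dict)
            and block.get("type") == "text"
            and isinstance(block.get("text"), str)
        ]
        joined = "\n".join(p for p in parts if p)
        if not joined:
            continue  # empty content never sets the structure flag
        has_cursor_structure = True
        messages.append((role, joined))
    # Assumption: a lone message is not treated as a conversation.
    if not has_cursor_structure or len(messages) < 2:
        return None
    return messages
```

Returning `None` rather than an empty list lets a dispatch chain fall through to the next candidate parser when the input is some other JSONL dialect.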
PR Review: feat: add Cursor agent transcript JSONL normalizer
Executive Summary
Affected Areas:
Business Impact: Enables mining Cursor IDE conversations into the palace, expanding the supported conversation sources. Addresses part of #59.
Flow Changes: New parser slot in
Ratings
PR Health
High Priority Issues
None.
Medium Priority Issues
None.
Low Priority Issues
🐛 #1:
| Format | Key used | Cursor parser sees | Result |
|---|---|---|---|
| Claude Code JSONL | type: "human" | role = "" → skip all | ✅ No match |
| Codex JSONL | event_msg / session_meta | role = "" → skip all | ✅ No match |
| Cursor JSONL | role: "user" + list content | Matches correctly | ✅ Match |
| Claude Code → Cursor? | Has type, not role | All entries skipped | ✅ Safe |
| Cursor → Claude Code? | Has role, not type | All entries skipped | ✅ Safe |
No cross-contamination risk between any parser pair.
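The discrimination summarized in the table can be illustrated with a small per-line classifier. This is a hedged sketch, not code from the PR: `guess_jsonl_format` is a hypothetical name, and it assumes that the Codex markers appear as values of a top-level `type` key and that Claude Code lines carry `type` values of `"human"` or `"assistant"`.

```python
import json


def guess_jsonl_format(line: str) -> str:
    """Classify one JSONL line as 'cursor', 'claude_code', 'codex', or 'unknown'."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(entry, dict):
        return "unknown"
    entry_type = entry.get("type")
    # Assumption: Codex wrappers are identified by these top-level type values.
    if entry_type in ("session_meta", "event_msg"):
        return "codex"
    if entry.get("role") in ("user", "assistant"):
        message = entry.get("message")
        # Cursor shape: top-level role plus list-typed message content.
        if isinstance(message, dict) and isinstance(message.get("content"), list):
            return "cursor"
        return "unknown"  # generic role-based JSONL is not claimed
    # Assumption: Claude Code lines are keyed on type, with no top-level role.
    if entry_type in ("human", "assistant"):
        return "claude_code"
    return "unknown"
```

Because a Cursor line has `role` but no `type`, and a Claude Code line has `type` but no `role`, neither branch can claim the other's input, which is the property the table records.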
Strengths
- Clean discrimination: The `role` vs `type` distinction is simple and bulletproof
- `has_cursor_structure` flag: Smart heuristic that requires list-typed content to prevent matching generic `role`-based JSONL
- Helper reuse: Leverages existing `_extract_content()` and `_messages_to_transcript()`, with zero duplication of output logic
- Excellent test coverage: 9 tests covering happy path, multi-turn, empty content, single message rejection, format discrimination, multiple content blocks, malformed lines, and string-vs-list content
- Consistent style: Follows the exact same pattern as `_try_claude_code_jsonl` and `_try_codex_jsonl`
Created by Octocode MCP https://octocode.ai 🔍🐙
- Set has_cursor_structure only after non-empty text from list-typed content
- Add test for role entries without message; unit test that Claude Code JSONL does not match _try_cursor_jsonl; relax integration assertion on format
Made-with: Cursor
web3guru888
left a comment
Clean implementation. A few observations from working with this format in production:
Detection logic is solid. The dual guard ( in + ) correctly discriminates against Claude Code JSONL (top-level ) and Codex JSONL ( wrapper). The has_cursor_structure flag requiring at least one list-typed content block is a good extra check before committing.
The SQLite investigation note is useful context. The explanation of why is incomplete (no paired turns in , only prompts without responses) saves future contributors from going down that path again. Consider keeping a brief note in the docstring or a comment.
Missing edge case: tool/image content blocks. Cursor agent transcripts can include non-text content blocks (, ). The current will silently skip them (correct behavior), but it's worth a test confirming that a turn containing only non-text blocks doesn't produce a phantom message. Something like:
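A minimal sketch of such a test (not the reviewer's original snippet) might look like the following. It is self-contained, so `text_from_blocks` and `collect_turns` are hypothetical stand-ins for the project's helpers, and the block types `tool_use` and `image` are assumed names that may differ in real Cursor transcripts.

```python
import json
from typing import List, Tuple


def text_from_blocks(content: object) -> str:
    """Join the text of 'text' blocks; every other block type contributes nothing."""
    if not isinstance(content, list):
        return ""
    parts: List[str] = []
    for block in content:
        if isinstance(block, dict) and block.get("type") == "text":
            text = block.get("text")
            if isinstance(text, str) and text.strip():
                parts.append(text.strip())
    return "\n".join(parts)


def collect_turns(jsonl: str) -> List[Tuple[str, str]]:
    """Collect (role, text) pairs, dropping turns that yield no text."""
    turns: List[Tuple[str, str]] = []
    for line in jsonl.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        message = entry.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        text = text_from_blocks(content)
        if text:
            turns.append((str(entry.get("role", "")), text))
    return turns


def test_non_text_only_turn_produces_no_message():
    lines = [
        {"role": "user", "message": {"content": [{"type": "text", "text": "run the tests"}]}},
        # Hypothetical non-text block types; real names may differ.
        {"role": "assistant", "message": {"content": [
            {"type": "tool_use", "name": "shell", "input": {"cmd": "pytest"}},
            {"type": "image", "source": "..."},
        ]}},
        {"role": "assistant", "message": {"content": [{"type": "text", "text": "all green"}]}},
    ]
    jsonl = "\n".join(json.dumps(line) for line in lines)
    # The middle turn has only non-text blocks, so it must not appear at all.
    assert collect_turns(jsonl) == [("user", "run the tests"), ("assistant", "all green")]
```

The assertion checks the full list of turns rather than only its length, so a phantom empty message would fail the test even if another turn were dropped at the same time.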
Test coverage is thorough — malformed lines, single-message files, wrong schema, multi-block content. The integration check is a nice touch.
One nit: the import in the test file suggests it's part of the public API. If that's intentional (for testing), it's fine — just worth noting that it's now a surface others might rely on.
Overall this is a clean, focused addition that fills a real gap for teams using Cursor. The heuristic is conservative enough to avoid false positives.
(Addendum to my review above: the shell ate the backtick formatting, sorry for the noise. The cleaned-up version:)
Detection logic is solid. The dual guard (
Missing edge case worth testing: tool/image content blocks. Cursor transcripts can include non-text blocks (
The SQLite investigation note is useful context. The explanation of why
Test coverage otherwise looks thorough: malformed lines, single-message files, wrong schema, multi-block content all covered. The
Conflicts with main. Cursor JSONL normalization is being addressed in #287. Thanks.
Draft plugin specification for source adapters, mirroring RFC 001's role for storage backends. Formalizes the contract six community ingester PRs (#274, #23, #169, #232, #567, #98, #702) plus #981's metadata-only mode have been reinventing ad-hoc, so adapter authors can build to a stable surface.

Key decisions:
- Single ingest() method; lazy adapters yield SourceItemMetadata ahead of drawers, eager adapters interleave
- Declared-transformation model (§1.4) replaces informal verbatim promise with a verifiable one; byte_preserving adapters declare the empty set, declared_lossy adapters enumerate. Existing miner.py and the convo_miner+normalize pipeline map cleanly
- Palace is the incremental cursor via is_current(item, metadata); no sidecar persistence
- Routing is adapter-owned; detect_room/detect_hall move into the filesystem adapter
- Flat metadata per ChromaDB (RFC 001 §1.4): entity hints as json_string field, KG triples route to SQLite knowledge graph
- Closets stay core-built as a post-step; adapters may emit flat closet_hints. Closes existing gap where convo drawers get no closets
- No per-drawer field renames: source_file, filed_at, source_mtime, added_by, normalize_version, entities, ingest_mode all preserved. Spec adds adapter_name, adapter_version, privacy_class

§9 enumerates the cleanup PR prerequisites (mempalace/sources/ module, PalaceContext facade, KnowledgeGraph.add_triple gaining backwards-compatible source_drawer_id + adapter_name params).

Tracking issue: #989
…Code, MemPalace#274/MemPalace#232 Cursor, MemPalace#169 Pi, MemPalace#702 Cursor+factory.ai)

Updates the multi-agent-support bullet to cite the actual upstream work instead of just gesturing at it. RFC 002 itself is PR MemPalace#990 (tracking issue MemPalace#989). Existing third-party prototypes already proposed against the spec:

* OpenCode SQLite: PR MemPalace#23
* Cursor SQLite: issue MemPalace#274
* Cursor JSONL (earlier variant): PR MemPalace#232
* Pi agent JSONL: PR MemPalace#169
* Combined Cursor + factory.ai: PR MemPalace#702

Each becomes a mempalace-source-<agent> package once RFC 002 lands. Names the path explicitly: fork unblocks the pattern by helping land RFC 002; per-agent adapter PRs land from their respective authors. Aider, Gemini CLI, Codex CLI, and Warp are roadmap targets without existing adapter PRs and are listed as such (no fabricated PR refs).

https://claude.ai/code/session_01GvwducFnFtN8KYmfbWKMR6
Summary
Adds `_try_cursor_jsonl()` parser to `normalize.py` for importing Cursor IDE agent transcript sessions, addressing the Cursor portion of #59.
- ~/.cursor/projects/<project>/agent-transcripts/<uuid>/<uuid>.jsonl
- {"role": "user"|"assistant", "message": {"content": [{"type": "text", "text": "..."}]}}
- `role` key (not `type`) and list-typed `content` blocks, preventing false positives against Claude Code JSONL and Codex JSONL
Why JSONL only (no SQLite)?
Cursor's `state.vscdb` SQLite database (~/Library/Application Support/Cursor/User/workspaceStorage/<hash>/state.vscdb) was investigated. It stores `composer.composerData` (conversation metadata only: IDs, names, timestamps) and `aiService.prompts` (user prompts without assistant responses). Since there are no paired conversation turns in the SQLite format, the agent transcript JSONL is the only format with complete conversations suitable for MemPalace ingestion.
Changes
- mempalace/normalize.py: `_try_cursor_jsonl()` parser + wire into `_try_normalize_json()` dispatch chain + update module docstring
- tests/test_normalize.py
Test plan
- pytest tests/ -v)
- `has_cursor_structure` guard is sufficient to prevent false positives
Closes #59 (Cursor portion)