feat: add Pi agent JSONL session normalizer #169
adv3nt3 wants to merge 1 commit into MemPalace:develop
@adv3nt3 thanks, I'll try it a bit later or tomorrow. A note though: I think that thinking and compaction should be able to be included, maybe behind a flag or env var. It just makes sense to me for some reason.. 😅🤷♂️ But I agree it shouldn't be the default. I started a Rust impl several hours ago, if you are interested 😉 I'll add this ingest normalizer thing there too. https://github.com/tunnckoCore/mempalace-rust
@tunnckoCore I thought about this; I'd keep thinking/compaction out of the default for now. Thinking blocks are mostly the model planning its next step, and the actual conclusions already show up in the assistant's text. Including them would add a lot of noise to drawers without much recall value. Compaction summaries have a similar problem: they're lossy summaries Pi makes to manage its own context window, and MemPalace already has AAAK for compression, so you'd end up storing a summary of a summary. That said, happy to add an opt-in flag later if we run into cases where the thinking context is genuinely useful. For now, keeping it clean feels right.
Add _try_pi_jsonl parser for Pi agent session files stored at
~/.config/pi/agent/sessions/{encoded-cwd}/{timestamp}_{uuid}.jsonl.
Uses type "message" entries with role "user"/"assistant". Skips
toolResult messages, model_change, thinking_level_change, and other
operational events. Requires session header (type "session" with
"version" key) to avoid false positives.
Format documented at github.com/badlogic/pi-mono session.md and
verified via Context7. Sample data provided by tunnckoCore in MemPalace#59.
Refs: MemPalace#59
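The filtering rules in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code added to mempalace/normalize.py: the names extract_text and sketch_pi_messages are hypothetical, and returning a list of (role, text) pairs, silently skipping unparseable lines, and accepting the session header on any line are all assumptions made for the example.

```python
import json


def extract_text(content):
    """Join the text of {"type": "text"} blocks; other block types are ignored."""
    if isinstance(content, str):
        return content.strip()
    parts = []
    for block in content or []:
        # Thinking blocks (and anything else that is not "text") fall through here.
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "\n".join(parts).strip()


def sketch_pi_messages(raw):
    """Return [(role, text), ...] for a Pi session, or None if no header is found."""
    entries = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: unparseable lines are skipped, not fatal
        if isinstance(entry, dict):
            entries.append(entry)

    # Fingerprint: a type "session" entry that also carries a "version" key.
    if not any(e.get("type") == "session" and "version" in e for e in entries):
        return None

    messages = []
    for entry in entries:
        if entry.get("type") != "message":
            continue  # model_change, thinking_level_change, and other events
        message = entry.get("message")
        if not isinstance(message, dict):
            continue
        role = message.get("role")
        if role not in ("user", "assistant"):
            continue  # toolResult and any other role
        text = extract_text(message.get("content"))
        if text:  # empty or thinking-only messages are dropped
            messages.append((role, text))
    return messages
```

Fed the sample session from the PR description, a function like this would keep only the user and assistant text entries and return None for input that lacks the session header.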
web3guru888 left a comment
✨ Review of #169 — feat: add Pi agent JSONL session normalizer
Scope: +52/−0 · 1 file(s)
mempalace/normalize.py (modified: +52/−0)
Suggestions
- 💡 No tests included — consider adding coverage for the new code paths
🟢 Approved — clean, well-structured PR. Good work @adv3nt3!
🏛️ Reviewed by MemPalace-AGI · Autonomous research system with perfect memory · Showcase: Truth Palace of Atlantis
Draft plugin specification for source adapters, mirroring RFC 001's role for storage backends. Formalizes the contract six community ingester PRs (#274, #23, #169, #232, #567, #98, #702) plus #981's metadata-only mode have been reinventing ad-hoc, so adapter authors can build to a stable surface.
Key decisions:
- Single ingest() method; lazy adapters yield SourceItemMetadata ahead of drawers, eager adapters interleave
- Declared-transformation model (§1.4) replaces informal verbatim promise with a verifiable one; byte_preserving adapters declare the empty set, declared_lossy adapters enumerate. Existing miner.py and the convo_miner+normalize pipeline map cleanly
- Palace is the incremental cursor via is_current(item, metadata); no sidecar persistence
- Routing is adapter-owned; detect_room/detect_hall move into the filesystem adapter
- Flat metadata per ChromaDB (RFC 001 §1.4): entity hints as json_string field, KG triples route to SQLite knowledge graph
- Closets stay core-built as a post-step; adapters may emit flat closet_hints. Closes existing gap where convo drawers get no closets
- No per-drawer field renames: source_file, filed_at, source_mtime, added_by, normalize_version, entities, ingest_mode all preserved. Spec adds adapter_name, adapter_version, privacy_class
§9 enumerates the cleanup PR prerequisites (mempalace/sources/ module, PalaceContext facade, KnowledgeGraph.add_triple gaining backwards-compatible source_drawer_id + adapter_name params).
Tracking issue: #989
…Code, MemPalace#274/MemPalace#232 Cursor, MemPalace#169 Pi, MemPalace#702 Cursor+factory.ai)
Updates the multi-agent-support bullet to cite the actual upstream work instead of just gesturing at it. RFC 002 itself is PR MemPalace#990 (tracking issue MemPalace#989).
Existing third-party prototypes already proposed against the spec:
* OpenCode SQLite: PR MemPalace#23
* Cursor SQLite: issue MemPalace#274
* Cursor JSONL (earlier variant): PR MemPalace#232
* Pi agent JSONL: PR MemPalace#169
* Combined Cursor + factory.ai: PR MemPalace#702
Each becomes a mempalace-source-<agent> package once RFC 002 lands. Names the path explicitly: fork unblocks the pattern by helping land RFC 002; per-agent adapter PRs land from their respective authors.
Aider, Gemini CLI, Codex CLI, and Warp are roadmap targets without existing adapter PRs and are listed as such (no fabricated PR refs).
https://claude.ai/code/session_01GvwducFnFtN8KYmfbWKMR6
Summary
Add the _try_pi_jsonl parser for Pi agent session files stored at ~/.config/pi/agent/sessions/{encoded-cwd}/{timestamp}_{uuid}.jsonl. This is the 8th normalize format for MemPalace, alongside Claude AI JSON, ChatGPT JSON, Claude Code JSONL, Codex CLI JSONL (#61), Gemini CLI JSON (#155), Slack JSON, and plain text.
Pi session format
Pi stores sessions as JSONL files with a tree-structured message history. Sessions are project-scoped: folder names encode the working directory path (e.g. --home-arcka-openclaude--).
Path: ~/.config/pi/agent/sessions/{encoded-cwd}/{timestamp}_{uuid}.jsonl
Structure (one JSON object per line):
{"type":"session","version":3,"id":"c9db2d16-...","timestamp":"2026-04-02T23:48:11.257Z","cwd":"/home/arcka/openclaude"}
{"type":"model_change","id":"62d8b4f0","parentId":null,"timestamp":"...","provider":"github-copilot","modelId":"claude-opus-4.6"}
{"type":"thinking_level_change","id":"c7f4db51","parentId":"62d8b4f0","timestamp":"...","thinkingLevel":"high"}
{"type":"message","id":"m1","parentId":"c7f4db51","timestamp":"...","message":{"role":"user","content":[{"type":"text","text":"Explain the architecture"}]}}
{"type":"message","id":"m2","parentId":"m1","timestamp":"...","message":{"role":"assistant","content":[{"type":"text","text":"The project uses..."}],"provider":"github-copilot","model":"claude-opus-4.6","stopReason":"stop"}}
Event types
type values:
- session
- message (role user): [{type, text}] blocks
- message (role assistant): [{type, text}] blocks, may include thinking blocks
- message (role toolResult): toolCallId, toolName, content
- model_change
- thinking_level_change
- compaction
- branch_summary
- label
- custom / custom_message
TypeScript types (from Pi source)
Design decisions
Only user and assistant messages extracted
The parser extracts type: "message" entries where message.role is "user" or "assistant". Tool results, model changes, thinking level changes, compaction events, and branch summaries are skipped: they're operational metadata, not conversation content.
thinking blocks in assistant content are automatically skipped
Assistant content can include {"type": "thinking", ...} blocks alongside {"type": "text", ...} blocks. The shared _extract_content helper only picks up type == "text", so thinking is naturally filtered out.
Aborted/empty messages are skipped
If an assistant message has empty content ([]) or only thinking blocks, _extract_content returns an empty string and the message is not added to the transcript.
Fingerprints on session header with version key
The parser requires a type: "session" line with a version field to positively identify Pi session files. This distinguishes it from:
- type: "session_meta" (no version key)
- type: "human" / type: "assistant" at top level
- type: "session" and version
Uses shared _extract_content helper
Pi's user content blocks use the standard {"type": "text", "text": "..."} format (same as Claude/OpenAI), so the shared helper works directly, unlike Gemini, which needed custom extraction.
What's NOT handled (and why)
- parentId chains. The parser reads messages linearly (file order) without reconstructing the branch tree. This matches how all other parsers work: linear extraction.
- ImageContent in messages. The parser skips these (no text to extract).
Verification sources
Changes
1 file changed (mempalace/normalize.py), 52 insertions:
- _try_pi_jsonl() parser function
- _try_normalize_json() dispatcher after Codex JSONL
Test plan
- ruff check mempalace/normalize.py passes clean
- ruff format --check already formatted
- python3 -m py_compile mempalace/normalize.py compiles OK
- > marker transcripts
- None for Codex JSONL, plain text, empty input
- _extract_content correctly handles Pi's [{"type":"text","text":"..."}] content blocks
- thinking blocks in assistant content are automatically filtered out
Refs: #59
cc @tunnckoCore — this implements the Pi parser based on your session data. If you can test against your full sessions, that would help validate.
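The negative cases in the test plan (other JSONL formats, plain text, empty input) could be exercised against the header fingerprint in isolation. The sketch below is an assumption-laden illustration: has_pi_session_header is a hypothetical helper rather than a function from this PR, the non-Pi sample lines are made up to resemble the shapes named under "Design decisions", and scanning every line for the header (instead of only the first) is a choice made for the example.

```python
import json


def has_pi_session_header(raw: str) -> bool:
    """True if any line is a JSON object with type "session" and a "version" key."""
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON, so it cannot be the header
        if (
            isinstance(entry, dict)
            and entry.get("type") == "session"
            and "version" in entry
        ):
            return True
    return False


# Illustrative inputs; only the first resembles a Pi session header.
samples = {
    "pi": '{"type":"session","version":3,"cwd":"/home/arcka/openclaude"}',
    "session_meta": '{"type":"session_meta","payload":{}}',
    "top_level_role": '{"type":"human","message":{"content":"hi"}}',
    "plain_text": "> hello\nhi there",
    "empty": "",
}

results = {name: has_pi_session_header(text) for name, text in samples.items()}
for name, matched in results.items():
    print(f"{name}: {matched}")
```

A parser that returns None whenever this kind of check fails would leave the other inputs to whichever format handlers run next in the dispatcher.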