feat: add Pi agent JSONL session normalizer by adv3nt3 · Pull Request #169 · MemPalace/mempalace

adv3nt3 · 2026-04-07T23:13:27Z

Summary

Add _try_pi_jsonl parser for Pi agent session files stored at ~/.config/pi/agent/sessions/{encoded-cwd}/{timestamp}_{uuid}.jsonl. This is the eighth format the MemPalace normalizer supports, alongside Claude AI JSON, ChatGPT JSON, Claude Code JSONL, Codex CLI JSONL (#61), Gemini CLI JSON (#155), Slack JSON, and plain text.

Pi session format

Pi stores sessions as JSONL files with a tree-structured message history. Sessions are project-scoped — folder names encode the working directory path (e.g. --home-arcka-openclaude--).

Path: ~/.config/pi/agent/sessions/{encoded-cwd}/{timestamp}_{uuid}.jsonl

Structure (one JSON object per line):

{"type":"session","version":3,"id":"c9db2d16-...","timestamp":"2026-04-02T23:48:11.257Z","cwd":"/home/arcka/openclaude"}
{"type":"model_change","id":"62d8b4f0","parentId":null,"timestamp":"...","provider":"github-copilot","modelId":"claude-opus-4.6"}
{"type":"thinking_level_change","id":"c7f4db51","parentId":"62d8b4f0","timestamp":"...","thinkingLevel":"high"}
{"type":"message","id":"m1","parentId":"c7f4db51","timestamp":"...","message":{"role":"user","content":[{"type":"text","text":"Explain the architecture"}]}}
{"type":"message","id":"m2","parentId":"m1","timestamp":"...","message":{"role":"assistant","content":[{"type":"text","text":"The project uses..."}],"provider":"github-copilot","model":"claude-opus-4.6","stopReason":"stop"}}

Event types

| `type` value | Contains | Extracted? |
| --- | --- | --- |
| `session` | Session header: version, id, cwd | Fingerprint only |
| `message` (role `user`) | User prompts: content as string or `[{type, text}]` blocks | Yes |
| `message` (role `assistant`) | Assistant replies: content as `[{type, text}]` blocks, may include `thinking` blocks | Yes (text only, thinking skipped) |
| `message` (role `toolResult`) | Tool outputs: `toolCallId`, `toolName`, content | Skipped |
| `model_change` | Provider/model switches | Skipped |
| `thinking_level_change` | Reasoning level adjustments | Skipped |
| `compaction` | Context summarization events | Skipped |
| `branch_summary` | Branch points in tree history | Skipped |
| `label` | Bookmark labels on messages | Skipped |
| `custom` / `custom_message` | Extension data | Skipped |
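
The routing in the table can be sketched as a small function. This is a minimal illustration, not code from the PR: `classify_event` and its three return labels are hypothetical names chosen to mirror the "Extracted?" column.

```python
def classify_event(event: dict) -> str:
    """Map a parsed Pi event to the table's "Extracted?" column.

    Hypothetical helper: the labels "fingerprint", "extract" and "skip"
    are illustrative and do not appear in mempalace/normalize.py.
    """
    kind = event.get("type")
    if kind == "session":
        return "fingerprint"  # header is only used to identify the file
    if kind == "message":
        message = event.get("message")
        role = message.get("role") if isinstance(message, dict) else None
        # Only user and assistant turns are conversation content.
        return "extract" if role in ("user", "assistant") else "skip"
    # model_change, thinking_level_change, compaction, branch_summary,
    # label, custom, custom_message and anything unknown are skipped.
    return "skip"
```

Unknown event types fall through to "skip", so a newer Pi version that adds event types would not break extraction under this sketch.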

TypeScript types (from Pi source)

interface UserMessage {
  role: "user";
  content: string | (TextContent | ImageContent)[];
  timestamp: number;
}

interface AssistantMessage {
  role: "assistant";
  content: (TextContent | ThinkingContent | ToolCall)[];
  provider: string;
  model: string;
  usage: Usage;
  stopReason: "stop" | "length" | "toolUse" | "error" | "aborted";
}

Design decisions

Only `user` and `assistant` messages extracted

The parser extracts type: "message" entries where message.role is "user" or "assistant". Tool results, model changes, thinking level changes, compaction events, and branch summaries are skipped — they're operational metadata, not conversation content.

`thinking` blocks in assistant content are automatically skipped

Assistant content can include {"type": "thinking", ...} blocks alongside {"type": "text", ...} blocks. The shared _extract_content helper only picks up type == "text", so thinking is naturally filtered out.
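
To show why the filtering needs no Pi-specific code, here is a rough stand-in for the shared helper. `extract_text` is a hypothetical name and a simplified approximation; the real `_extract_content` may differ in details such as how parts are joined. The `thinking` key inside the thinking block is also an assumption, since the description only shows `{"type": "thinking", ...}`.

```python
def extract_text(content) -> str:
    """Hypothetical stand-in for the shared _extract_content helper."""
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        parts = [
            block["text"]
            for block in content
            # Only "text" blocks contribute; thinking and tool-call
            # blocks have a different type and are ignored.
            if isinstance(block, dict)
            and block.get("type") == "text"
            and isinstance(block.get("text"), str)
        ]
        return " ".join(parts).strip()
    return ""


blocks = [
    {"type": "thinking", "thinking": "Plan the answer first."},  # assumed field name
    {"type": "text", "text": "The project uses..."},
]
print(extract_text(blocks))  # The project uses...
```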

Aborted/empty messages are skipped

If an assistant message has empty content ([]) or only thinking blocks, _extract_content returns an empty string, so the message is not added to the transcript.
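
A sketch of the resulting loop, under the assumption that the parser walks the file line by line and keeps (role, text) pairs. `collect_messages` and `_text_of` are hypothetical names, and `_text_of` is a simplified stand-in for `_extract_content`.

```python
import json


def _text_of(content) -> str:
    # Simplified stand-in for _extract_content: strings pass through,
    # lists contribute only their "text" blocks.
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        return " ".join(
            b["text"]
            for b in content
            if isinstance(b, dict)
            and b.get("type") == "text"
            and isinstance(b.get("text"), str)
        ).strip()
    return ""


def collect_messages(raw: str) -> list[tuple[str, str]]:
    """Return (role, text) pairs in file order, dropping empty messages."""
    messages = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a malformed line instead of failing the file
        if not isinstance(event, dict) or event.get("type") != "message":
            continue
        message = event.get("message")
        if not isinstance(message, dict):
            continue
        role = message.get("role")
        if role not in ("user", "assistant"):
            continue
        text = _text_of(message.get("content"))
        if text:  # aborted or thinking-only replies yield "" and are dropped
            messages.append((role, text))
    return messages
```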

Fingerprints on `session` header with `version` key

The parser requires a type: "session" line with a version field to positively identify Pi session files. This distinguishes Pi sessions from:

Codex JSONL — uses type: "session_meta" (no version key)
Claude Code JSONL — uses type: "human" / type: "assistant" at top level
Other JSONL — unlikely to have both type: "session" and version
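
The fingerprint check could look roughly like this. `looks_like_pi_session` is a hypothetical name, and accepting the header on any line (rather than only the first) is an assumption made for the sketch.

```python
import json


def looks_like_pi_session(raw: str) -> bool:
    """True if some line is a {"type": "session", "version": ...} header."""
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain text and partial lines never match
        if (
            isinstance(event, dict)
            and event.get("type") == "session"
            and "version" in event
        ):
            return True
    return False
```

Requiring both keys means a Codex file (type "session_meta") or a stray {"type": "session"} object without a version does not match, so the dispatcher can fall through to the next parser.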

Uses shared `_extract_content` helper

Pi's user content blocks use the standard {"type": "text", "text": "..."} format (same as Claude/OpenAI), so the shared helper works directly — unlike Gemini which needed custom extraction.

What's NOT handled (and why)

Tree structure / branching: Pi sessions support branching via parentId chains. The parser reads messages linearly (file order) without reconstructing the branch tree. This matches how all other parsers work — linear extraction.
Compaction summaries: Pi can compact history mid-session. The parser skips compaction events — they're internal context management, not user conversation.
Tool call details: Only the text portion of assistant messages is extracted. Tool names, arguments, and results are skipped.
Image content: Pi supports ImageContent in messages. The parser skips these (no text to extract).
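
For reference, this is what following a parentId chain would involve. The PR does not do this; `branch_to_root` is a hypothetical function, and the `m2b` entry is invented to show a sibling branch next to the ids used in the sample above.

```python
def branch_to_root(entries: list[dict], leaf_id: str) -> list[str]:
    """Return entry ids from the root down to leaf_id by following parentId.

    Not part of the PR: the parser reads entries in file order instead.
    """
    parent_of = {e["id"]: e.get("parentId") for e in entries if "id" in e}
    path, seen = [], set()
    current = leaf_id
    # Stop at the root (parentId null), at an unknown id, or on a cycle.
    while current is not None and current in parent_of and current not in seen:
        seen.add(current)
        path.append(current)
        current = parent_of[current]
    return path[::-1]


entries = [
    {"type": "model_change", "id": "62d8b4f0", "parentId": None},
    {"type": "thinking_level_change", "id": "c7f4db51", "parentId": "62d8b4f0"},
    {"type": "message", "id": "m1", "parentId": "c7f4db51"},
    {"type": "message", "id": "m2", "parentId": "m1"},
    {"type": "message", "id": "m2b", "parentId": "m1"},  # hypothetical sibling branch
]
print(branch_to_root(entries, "m2"))  # ['62d8b4f0', 'c7f4db51', 'm1', 'm2']
```

With linear extraction both `m2` and `m2b` land in the transcript in file order; a tree-aware reader would have to choose one leaf.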

Verification sources

Pi session.md — official JSONL format documentation with TypeScript types and parsing examples (via Context7)
Sample gist from @tunnckoCore — real Pi session data
Issue #59 comment — session path, folder naming, file structure

Changes

1 file changed (mempalace/normalize.py), 52 insertions:

New _try_pi_jsonl() parser function
Registered in _try_normalize_json() dispatcher after Codex JSONL
Module docstring updated to list Pi agent JSONL as supported format

Test plan

ruff check mempalace/normalize.py passes clean
ruff format --check already formatted
python3 -m py_compile mempalace/normalize.py compiles OK
Tested against sample Pi session data — produces correct > marker transcripts
False positive check — returns None for Codex JSONL, plain text, empty input
_extract_content correctly handles Pi's [{"type":"text","text":"..."}] content blocks
thinking blocks in assistant content are automatically filtered out
Pyright reports 0 new diagnostics
Format verified against official Pi session docs via Context7

Refs: #59

cc @tunnckoCore — this implements the Pi parser based on your session data. If you can test against your full sessions, that would help validate.

tunnckoCore · 2026-04-08T01:46:38Z

@adv3nt3 thanks, I'll try it a bit later or tomorrow.

a note tho: i think that thinking and compaction should be able to be included? maybe behind a flag or env var. It just makes sense to me for some reason.. 😅🤷‍♂️ but agree it shouldn't be the default.

I've started a Rust impl several hours ago, if you are interested 😉 I'll add this ingest normalizer thing there too. https://github.com/tunnckoCore/mempalace-rust

adv3nt3 · 2026-04-08T08:13:56Z

@tunnckoCore I thought about this, I'd keep thinking/compaction out of the default for now.

Thinking blocks are mostly the model planning its next step, and the actual conclusions already show up in the assistant's text. Including them would add a lot of noise to drawers without much recall value. Compaction summaries have a similar problem: they're lossy summaries Pi makes to manage its own context window. MemPalace already has AAAK for compression, so you'd end up storing a summary of a summary.

That said, happy to add an opt-in flag later if we run into cases where the thinking context is genuinely useful. For now keeping it clean feels right.

Add _try_pi_jsonl parser for Pi agent session files stored at ~/.config/pi/agent/sessions/{encoded-cwd}/{timestamp}_{uuid}.jsonl.

Uses type "message" entries with role "user"/"assistant". Skips toolResult messages, model_change, thinking_level_change, and other operational events. Requires session header (type "session" with "version" key) to avoid false positives.

Format documented at github.com/badlogic/pi-mono session.md and verified via Context7. Sample data provided by tunnckoCore in MemPalace#59.

Refs: MemPalace#59

web3guru888

✨ Review of #169 — feat: add Pi agent JSONL session normalizer

Scope: +52/−0 · 1 file(s)

mempalace/normalize.py (modified: +52/−0)

Suggestions

💡 No tests included — consider adding coverage for the new code paths

🟢 Approved — clean, well-structured PR. Good work @adv3nt3!

🏛️ Reviewed by MemPalace-AGI · Autonomous research system with perfect memory · Showcase: Truth Palace of Atlantis

Draft plugin specification for source adapters, mirroring RFC 001's role for storage backends. Formalizes the contract six community ingester PRs (#274, #23, #169, #232, #567, #98, #702) plus #981's metadata-only mode have been reinventing ad-hoc, so adapter authors can build to a stable surface.

Key decisions:

- Single ingest() method; lazy adapters yield SourceItemMetadata ahead of drawers, eager adapters interleave
- Declared-transformation model (§1.4) replaces informal verbatim promise with a verifiable one; byte_preserving adapters declare the empty set, declared_lossy adapters enumerate. Existing miner.py and the convo_miner+normalize pipeline map cleanly
- Palace is the incremental cursor via is_current(item, metadata); no sidecar persistence
- Routing is adapter-owned; detect_room/detect_hall move into the filesystem adapter
- Flat metadata per ChromaDB (RFC 001 §1.4): entity hints as json_string field, KG triples route to SQLite knowledge graph
- Closets stay core-built as a post-step; adapters may emit flat closet_hints. Closes existing gap where convo drawers get no closets
- No per-drawer field renames: source_file, filed_at, source_mtime, added_by, normalize_version, entities, ingest_mode all preserved. Spec adds adapter_name, adapter_version, privacy_class

§9 enumerates the cleanup PR prerequisites (mempalace/sources/ module, PalaceContext facade, KnowledgeGraph.add_triple gaining backwards-compatible source_drawer_id + adapter_name params).

Tracking issue: #989

…Code, MemPalace#274/MemPalace#232 Cursor, MemPalace#169 Pi, MemPalace#702 Cursor+factory.ai)

Updates the multi-agent-support bullet to cite the actual upstream work instead of just gesturing at it. RFC 002 itself is PR MemPalace#990 (tracking issue MemPalace#989).

Existing third-party prototypes already proposed against the spec:

* OpenCode SQLite: PR MemPalace#23
* Cursor SQLite: issue MemPalace#274
* Cursor JSONL (earlier variant): PR MemPalace#232
* Pi agent JSONL: PR MemPalace#169
* Combined Cursor + factory.ai: PR MemPalace#702

Each becomes a mempalace-source-<agent> package once RFC 002 lands. Names the path explicitly: fork unblocks the pattern by helping land RFC 002; per-agent adapter PRs land from their respective authors. Aider, Gemini CLI, Codex CLI, and Warp are roadmap targets without existing adapter PRs and are listed as such (no fabricated PR refs).

https://claude.ai/code/session_01GvwducFnFtN8KYmfbWKMR6

adv3nt3 mentioned this pull request Apr 7, 2026

feat: add import support for more AI tool session formats (Cursor, Copilot, Codex, Windsurf, Aider, etc.) #59

Open

adv3nt3 force-pushed the feat/pi-cli-normalizer branch from 1f6bb3c to 3592183 April 7, 2026 23:29

adv3nt3 force-pushed the feat/pi-cli-normalizer branch from 3592183 to 5f46ff7 April 9, 2026 17:53

web3guru888 approved these changes Apr 11, 2026

bensig changed the base branch from main to develop April 11, 2026 22:23

bensig requested review from bensig and milla-jovovich as code owners April 11, 2026 22:23

igorls added the area/mining (File and conversation mining) and enhancement (New feature or request) labels Apr 14, 2026

bensig mentioned this pull request Apr 18, 2026

RFC: Source adapter plugin specification #989

Open

bensig mentioned this pull request Apr 18, 2026

docs: RFC 002 — Source adapter plugin specification #990

Merged

4 tasks

adv3nt3 wants to merge 1 commit into MemPalace:develop from adv3nt3:feat/pi-cli-normalizer

4 participants
