fix: parse Claude.ai privacy export with messages key and sender field (#677) by mvalentsev · Pull Request #685 · MemPalace/mempalace

mvalentsev · 2026-04-12T07:15:35Z

Closes #677.

Claude.ai's Settings > Privacy > Export Data produces a conversations.json that _try_claude_ai_json can fail to parse for two reasons:

Key variant -- the privacy-export guard only checks for "chat_messages" in each conversation object. If an export uses "messages" instead (as described in #676), the guard misses it. The array of conversation objects then falls through to the flat-messages parser, which expects {role, content} dicts and silently skips the {uuid, name, messages} objects. The function returns None and the raw JSON gets filed as a single plain-text drawer.
Author field variant -- some privacy exports may use "sender": "human"/"assistant" instead of "role" (as described in #605). In that case the outer guard matches, but the inner loop only checks item.get("role"), so every message is skipped and the transcript comes out empty.

Both failure modes result in the behavior reported in #677: a multi-MB file mined as one drawer classified "emotional."
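
The two failure modes can be reproduced in isolation. The snippet below is a minimal sketch, not the code from normalize.py: the sample conversation object and the variable names are hypothetical, and only the key names ("chat_messages", "messages", "role", "sender") come from the description above.

```python
# Hypothetical conversation object using the "messages" + "sender" variant.
convo = {
    "uuid": "c-1",
    "name": "Example",
    "messages": [
        {"sender": "human", "text": "Hi"},
        {"sender": "assistant", "text": "Hello"},
    ],
}

# Failure mode 1: a guard that only looks for "chat_messages" does not match.
old_guard_matches = "chat_messages" in convo

# Failure mode 2: an inner loop that only reads "role" finds no author at all.
old_roles = [item.get("role") for item in convo["messages"]]

# Accepting both spellings recovers the conversation and its authors.
new_guard_matches = "chat_messages" in convo or "messages" in convo
new_roles = [item.get("role") or item.get("sender", "") for item in convo["messages"]]

print(old_guard_matches, old_roles)
print(new_guard_matches, new_roles)
```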

Changes (2 files, +129/-21):

mempalace/normalize.py:

Outer guard now accepts both "chat_messages" and "messages" keys
Inner extraction: convo.get("chat_messages") or convo.get("messages", [])
Author field: item.get("role") or item.get("sender", "") handles both variants
Text fallback: tries item.get("text") when content blocks are empty
Per-conversation transcripts instead of concatenating everything into one blob
Shared _collect_claude_messages() helper to deduplicate extraction logic

tests/test_normalize.py:

6 new tests: messages key, sender field, text fallback, per-conversation separation, empty-conversation skipping

Scope note: this PR only touches the normalizer. It does not change convo_miner.py -- the 10 MB MAX_FILE_SIZE limit is unrelated to the parsing failure in #677 (the reported file is 8 MB) but is addressed separately in #605 and #676.

Related PRs: #605 (carlito1979) adds sender support, text fallback, and raises MAX_FILE_SIZE. #676 (z3tz3r0) adds messages key detection, per-conversation separation, and also raises MAX_FILE_SIZE. Both touch convo_miner.py which this PR does not. This PR combines the normalize.py parsing fixes from both approaches and adds the missing cross-coverage (neither PR alone handles both the key variant and the author field variant).

The privacy-export branch in _try_claude_ai_json only checked for the "chat_messages" key, missing exports that use "messages" instead. It also only read the "role" field while real privacy exports use "sender". Both gaps caused the file to fall through to plain-text, producing a single giant drawer.

Changes:
- Accept "messages" alongside "chat_messages" in the conversation-object guard and inner extraction.
- Accept "sender" alongside "role" as the author field.
- Fall back to a top-level "text" key when content blocks are empty.
- Produce one transcript per conversation instead of concatenating all conversations into a single blob.
- Extract shared logic into _collect_claude_messages helper.
- Add 6 regression tests covering each variant.

PR #761 bumped pyproject.toml to 3.2.0 but missed three other version strings, causing test_version_consistency to fail on develop CI (macos, linux 3.11, windows).
- mempalace/version.py: 3.1.0 → 3.2.0 (unblocks test_version_consistency)
- README.md: version badge shield 3.1.0 → 3.2.0
- integrations/openclaw/SKILL.md: 3.1.0 → 3.2.0
- CHANGELOG.md: rename [Unreleased] → [3.2.0], dated 2026-04-13; add entries for #685, #690, #707, #716, #734, #755, #757, #761

Verified locally: 689/689 tests pass, ruff clean.

Main had 9 commits that never back-merged into develop after the v3.2.0 release cycle. Resolving conflicts as follows:
- mempalace/version.py: keep develop (3.3.0 release target)
- README.md: keep develop (Milla's #835 audit is authoritative; main had stale 19 tools / 170 tokens / "30x lossless" / v3.0.0 label)
- hooks/mempal_{save,precompact}_hook.sh: keep develop (#786 reversed the #666 "decision=block" behavior intentionally to stop hooks from making agents write in chat)
- pyproject.toml: auto-merged; keeps develop's 3.3.0 and picks up main's chromadb upper-bound removal (#690)
- CONTRIBUTING.md, mempalace/hooks_cli.py: auto-merged cleanly; picks up main's improvements (fork-first clone, more detailed block reason strings that name MemPalace and specific tools)
- integrations/openclaw/SKILL.md: bumped 3.2.0 → 3.3.0 (version tracks the package per #761 convention)
- CHANGELOG.md: manual merge. Kept develop's preamble + Unreleased v3.3.0 section + footer links; folded main's richer v3.2.0 entries (Packaging section for #690/#761; Bug Fixes #685/#677/#716/#707/#755/#757; Documentation #734/#733) into the v3.2.0 section; deduped the split Documentation sections that auto-merge produced

mvalentsev requested review from bensig, igorls and milla-jovovich as code owners April 12, 2026 07:15

mvalentsev force-pushed the fix/claude-ai-export-parsing branch from d90eb26 to 851a3cb Compare April 12, 2026 07:22

mvalentsev and others added 2 commits April 12, 2026 12:37

style: apply ruff format to normalize.py

d5b15ff

fix: guard against null text field in Claude.ai export parsing

c4b2ef2

item.get("text", "").strip() crashes when "text" is explicitly null in the JSON (legal and observed in some exports). Use (item.get("text") or "").strip() and add a regression test.
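
The crash follows from how dict.get treats its default: the default is used only when the key is missing, not when the key is present with a None value. A small self-contained illustration, using a hypothetical message item:

```python
# Hypothetical message item where "text" is present but explicitly null.
item = {"sender": "assistant", "text": None}

# The default "" is ignored because the key exists, so this yields None.
unsafe_value = item.get("text", "")

try:
    unsafe_value.strip()
    crashed = False
except AttributeError:
    # None has no .strip() method.
    crashed = True

# `or ""` also replaces a present-but-None value before .strip() is called.
safe_value = (item.get("text") or "").strip()

print(crashed, repr(safe_value))
```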

igorls merged commit a2432a3 into MemPalace:develop Apr 13, 2026
6 checks passed

igorls mentioned this pull request Apr 13, 2026

release: v3.2.0 #762

Merged

4 tasks

rusel95 mentioned this pull request Apr 13, 2026

fix: Claude.ai chat export normalizer misses sender/text fields #243

Closed

15 tasks

This was referenced Apr 14, 2026

release: v3.3.0 #839

Merged

sync: main → develop (post v3.3.0 release) #842

Closed

igorls merged 3 commits into MemPalace:develop from mvalentsev:fix/claude-ai-export-parsing
