Skip to content

fix: parse Claude.ai privacy export with messages key and sender field (#677)#685

Merged
igorls merged 3 commits intoMemPalace:developfrom
mvalentsev:fix/claude-ai-export-parsing
Apr 13, 2026
Merged

fix: parse Claude.ai privacy export with messages key and sender field (#677)#685
igorls merged 3 commits intoMemPalace:developfrom
mvalentsev:fix/claude-ai-export-parsing

Conversation

@mvalentsev
Copy link
Copy Markdown
Contributor

@mvalentsev mvalentsev commented Apr 12, 2026

Closes #677.

Claude.ai's Settings > Privacy > Export Data produces a conversations.json that _try_claude_ai_json can fail to parse for two reasons:

  1. Key variant -- the privacy-export guard only checks for "chat_messages" in each conversation object. If an export uses "messages" instead (as described in fix: handle large claude.ai exports and multi-conversation "messages" key #676), the guard misses it. The array of conversation objects falls through to the flat-messages parser, which expects {role, content} dicts and silently skips the {uuid, name, messages} objects. The function returns None and the raw JSON gets filed as a single plain-text drawer.

  2. Author field variant -- some privacy exports may use "sender": "human"/"assistant" instead of "role" (as described in Support claude.ai privacy export format with sender field #605). When the outer guard matches but the inner loop only checks item.get("role"), every message is skipped, producing an empty transcript.

Both failure modes result in the behavior reported in #677: a multi-MB file mined as one drawer classified "emotional."

Changes (2 files, +129/-21):

mempalace/normalize.py:

  • Outer guard now accepts both "chat_messages" and "messages" keys
  • Inner extraction: convo.get("chat_messages") or convo.get("messages", [])
  • Author field: item.get("role") or item.get("sender", "") handles both variants
  • Text fallback: tries item.get("text") when content blocks are empty
  • Per-conversation transcripts instead of concatenating everything into one blob
  • Shared _collect_claude_messages() helper to deduplicate extraction logic

tests/test_normalize.py:

  • 6 new tests: messages key, sender field, text fallback, per-conversation separation, empty-conversation skipping

Scope note: this PR only touches the normalizer. It does not change convo_miner.py -- the 10 MB MAX_FILE_SIZE limit is unrelated to the parsing failure in #677 (the reported file is 8 MB) but is addressed separately in #605 and #676.

Related PRs: #605 (carlito1979) adds sender support, text fallback, and raises MAX_FILE_SIZE. #676 (z3tz3r0) adds messages key detection, per-conversation separation, and also raises MAX_FILE_SIZE. Both touch convo_miner.py which this PR does not. This PR combines the normalize.py parsing fixes from both approaches and adds the missing cross-coverage (neither PR alone handles both the key variant and the author field variant).

MemPalace#677)

The privacy-export branch in _try_claude_ai_json only checked for the
"chat_messages" key, missing exports that use "messages" instead.  It
also only read the "role" field while real privacy exports use "sender".
Both gaps caused the file to fall through to plain-text, producing a
single giant drawer.

Changes:
- Accept "messages" alongside "chat_messages" in the conversation-object
  guard and inner extraction.
- Accept "sender" alongside "role" as the author field.
- Fall back to a top-level "text" key when content blocks are empty.
- Produce one transcript per conversation instead of concatenating all
  conversations into a single blob.
- Extract shared logic into _collect_claude_messages helper.
- Add 6 regression tests covering each variant.
@mvalentsev mvalentsev force-pushed the fix/claude-ai-export-parsing branch from d90eb26 to 851a3cb Compare April 12, 2026 07:22
mvalentsev and others added 2 commits April 12, 2026 12:37
item.get("text", "").strip() crashes when "text" is explicitly null
in the JSON (legal and observed in some exports). Use
(item.get("text") or "").strip() and add a regression test.
@igorls igorls merged commit a2432a3 into MemPalace:develop Apr 13, 2026
6 checks passed
igorls added a commit that referenced this pull request Apr 13, 2026
PR #761 bumped pyproject.toml to 3.2.0 but missed three other version strings,
causing test_version_consistency to fail on develop CI (macos, linux 3.11, windows).

- mempalace/version.py: 3.1.0 → 3.2.0 (unblocks test_version_consistency)
- README.md: version badge shield 3.1.0 → 3.2.0
- integrations/openclaw/SKILL.md: 3.1.0 → 3.2.0
- CHANGELOG.md: rename [Unreleased] → [3.2.0] — 2026-04-13, add entries
  for #685, #690, #707, #716, #734, #755, #757, #761

Verified locally: 689/689 tests pass, ruff clean.
@igorls igorls mentioned this pull request Apr 13, 2026
4 tasks
sha2fiddy pushed a commit to sha2fiddy/mempalace that referenced this pull request Apr 13, 2026
PR MemPalace#761 bumped pyproject.toml to 3.2.0 but missed three other version strings,
causing test_version_consistency to fail on develop CI (macos, linux 3.11, windows).

- mempalace/version.py: 3.1.0 → 3.2.0 (unblocks test_version_consistency)
- README.md: version badge shield 3.1.0 → 3.2.0
- integrations/openclaw/SKILL.md: 3.1.0 → 3.2.0
- CHANGELOG.md: rename [Unreleased] → [3.2.0] — 2026-04-13, add entries
  for MemPalace#685, MemPalace#690, MemPalace#707, MemPalace#716, MemPalace#734, MemPalace#755, MemPalace#757, MemPalace#761

Verified locally: 689/689 tests pass, ruff clean.
sha2fiddy pushed a commit to sha2fiddy/mempalace that referenced this pull request Apr 14, 2026
PR MemPalace#761 bumped pyproject.toml to 3.2.0 but missed three other version strings,
causing test_version_consistency to fail on develop CI (macos, linux 3.11, windows).

- mempalace/version.py: 3.1.0 → 3.2.0 (unblocks test_version_consistency)
- README.md: version badge shield 3.1.0 → 3.2.0
- integrations/openclaw/SKILL.md: 3.1.0 → 3.2.0
- CHANGELOG.md: rename [Unreleased] → [3.2.0] — 2026-04-13, add entries
  for MemPalace#685, MemPalace#690, MemPalace#707, MemPalace#716, MemPalace#734, MemPalace#755, MemPalace#757, MemPalace#761

Verified locally: 689/689 tests pass, ruff clean.
igorls added a commit that referenced this pull request Apr 14, 2026
Main had 9 commits that never back-merged into develop after the v3.2.0
release cycle. Resolving conflicts as follows:

- mempalace/version.py: keep develop (3.3.0 release target)
- README.md: keep develop (Milla's #835 audit is authoritative — main
  had stale 19 tools / 170 tokens / "30x lossless" / v3.0.0 label)
- hooks/mempal_{save,precompact}_hook.sh: keep develop (#786 reversed
  the #666 "decision=block" behavior intentionally to stop hooks from
  making agents write in chat)
- pyproject.toml: auto-merged — keeps develop's 3.3.0 and picks up
  main's chromadb upper-bound removal (#690)
- CONTRIBUTING.md, mempalace/hooks_cli.py: auto-merged cleanly —
  picks up main's improvements (fork-first clone, more detailed
  block reason strings that name MemPalace and specific tools)
- integrations/openclaw/SKILL.md: bumped 3.2.0 → 3.3.0 (version
  tracks the package per #761 convention)
- CHANGELOG.md: manual merge — kept develop's preamble + Unreleased
  v3.3.0 section + footer links; folded main's richer v3.2.0 entries
  (Packaging section for #690/#761; Bug Fixes #685/#677/#716/#707/
  #755/#757; Documentation #734/#733) into the v3.2.0 section;
  deduped the split Documentation sections that auto-merge produced
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

conversations.json from Claude.ai data export not parsed — mined as single drawer

2 participants