Fix Neo4j nested attributes serialization bug#1109
Open
Ataxia123 wants to merge 5 commits intogetzep:mainfrom
Member: All contributors have signed the CLA ✍️ ✅
Author: I have read the CLA Document and I hereby sign the CLA
danielchalef added a commit that referenced this pull request Dec 17, 2025
e49d54c to
689cad5
Compare
aasmall added a commit to penumbral-labs/graphiti that referenced this pull request Mar 22, 2026
Port the Kuzu pattern to Neo4j: JSON-serialize the attributes dict instead of spreading as individual properties. Prevents Neo4j from rejecting nested Map values from LLM extraction. Read path supports both new JSON format and old spread format for backward compatibility with existing data. Based on upstream PR getzep#1109.
Neo4j was crashing when entity/edge attributes contained nested structures (Maps of Lists, Lists of Maps) because attributes were being spread as individual properties instead of serialized to JSON strings.

Changes:
- Serialize attributes to JSON for Neo4j (like Kuzu already does)
- Update read path to handle both JSON strings and legacy dict format
- Add integration tests for nested attribute structures
- Maintain backward compatibility with existing code

Fixes issue where LLM extraction with complex structured attributes would cause:
Neo.ClientError.Statement.TypeError - Property values can only be of primitive types or arrays thereof.

Modified Files:
- graphiti_core/utils/bulk_utils.py: Serialize attributes for Neo4j
- graphiti_core/nodes.py: Handle JSON string attributes in read path
- graphiti_core/edges.py: Handle JSON string attributes in read path
- graphiti_core/models/nodes/node_db_queries.py: Use n.attributes for Neo4j
- graphiti_core/models/edges/edge_db_queries.py: Use e.attributes for Neo4j

New Files:
- tests/test_neo4j_nested_attributes_int.py: Integration tests
- docs/neo4j-attributes-fix.md: Comprehensive documentation
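For reviewers, the write-path change described above can be sketched roughly as follows. This is a minimal illustration, not the code in `bulk_utils.py`: the helper names `is_neo4j_safe` and `prepare_entity_for_neo4j` and the dict-shaped entity are assumptions, and the type check is a simplification of Neo4j's actual property rules.

```python
import json
from typing import Any

PRIMITIVES = (str, int, float, bool)


def is_neo4j_safe(value: Any) -> bool:
    """Simplified check: a primitive, or a flat list of primitives.

    Hypothetical helper; Neo4j's real rules are stricter
    (for example, arrays must be homogeneous).
    """
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):
        return all(isinstance(item, PRIMITIVES) for item in value)
    return False


def prepare_entity_for_neo4j(entity: dict[str, Any]) -> dict[str, Any]:
    """Return a copy whose 'attributes' dict is stored as one JSON string.

    Hypothetical helper: instead of spreading each attribute as its own
    property, the whole dict becomes a single string property.
    """
    prepared = {key: value for key, value in entity.items() if key != 'attributes'}
    prepared['attributes'] = json.dumps(entity.get('attributes') or {})
    return prepared


entity = {
    'uuid': 'u1',
    'name': 'Alice',
    'attributes': {'skills': [{'name': 'python'}], 'meta': {'scores': [1, 2]}},
}
prepared = prepare_entity_for_neo4j(entity)
```

The nested `meta` map would be rejected if spread as a property, while the serialized string is an ordinary primitive and round-trips through `json.loads`.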
…e behavior

Issues fixed:
1. Only serialize attributes for Neo4j, not FalkorDB/Neptune
2. Maintain backward compatibility with existing Neo4j data

Changes:
- Write path: Use elif to specifically target Neo4j only
- Query path: Use COALESCE and return both n.attributes and properties(n)
- Read path: Try JSON string first, fall back to spread properties
- FalkorDB/Neptune: Restore original spread behavior

This ensures:
- New Neo4j nodes: attributes as JSON string (supports nesting)
- Old Neo4j nodes: attributes spread as properties (backward compatible)
- FalkorDB/Neptune: unchanged behavior (no breaking changes)
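The read-path fallback described above might look something like the sketch below. It assumes a record shaped as a dict with an `attributes` value and a `properties` map (mirroring a query that returns both `n.attributes` and `properties(n)`); the function name `load_attributes` and the `CORE_FIELDS` set are hypothetical and are not taken from `nodes.py` or `edges.py`.

```python
import json
from typing import Any

# Hypothetical set of built-in fields that are not user attributes.
CORE_FIELDS = {
    'uuid', 'name', 'group_id', 'created_at', 'summary',
    'name_embedding', 'labels', 'attributes',
}


def load_attributes(record: dict[str, Any]) -> dict[str, Any]:
    """Try the JSON string first, then fall back to legacy formats."""
    raw = record.get('attributes')

    # New format: attributes stored as a single JSON string.
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return parsed

    # Legacy dict format: attributes already arrive as a mapping.
    if isinstance(raw, dict):
        return dict(raw)

    # Old spread format: every non-core property is an attribute.
    properties = record.get('properties') or {}
    return {key: value for key, value in properties.items() if key not in CORE_FIELDS}


new_node = {'attributes': '{"meta": {"scores": [1, 2]}}', 'properties': {}}
old_node = {'attributes': None, 'properties': {'uuid': 'u1', 'name': 'Alice', 'role': 'admin'}}
new_attrs = load_attributes(new_node)
old_attrs = load_attributes(old_node)
```

Trying the JSON string first means nodes written after this change never touch the spread path, while nodes written before it still load without a migration.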
7f54d11 to
1bebcaa
Compare
01a00b1 to
d01fd35
Compare
Two related improvements to community operations:

1. Replace label_propagation with the asynchronous form from Raghavan et al. (2007). The synchronous batch implementation has no convergence guard and oscillates indefinitely on graphs with high-degree hub nodes: we observed it looping forever on a real 48-node knowledge graph with a central hub connected to 14+ peers. The async form visits nodes in a fresh random order each pass and updates the community map in place, so neighbors immediately see the new label. Deterministic tie-breaking by community id plus a strict-improvement rule prevent churn on symmetric graphs. An oscillation safeguard via state-hash window catches any edge case.

2. Add a sample_size parameter to build_communities that bounds LLM cost on large graphs. Without sampling, community summary cost scales as O(total_nodes) because every entity's summary feeds the binary-merge tree. With sampling, each community's summary is built from only the top-K most representative members (highest in-community weighted degree, then longest summary, tie-broken by name). Cost becomes O(num_communities * sample_size), a 20-40x reduction on 100k-node graphs, and typically improves quality because hub nodes carry the community's signal.

get_community_clusters now optionally returns the projection it already builds during clustering, so sampling can score members without a second pass over the graph.

Adds tests/utils/maintenance/test_community_operations.py with 15 unit tests covering both fixes:
- Regression cases for the oscillation bug (hub graphs, real-world pathological projection, stars, rings, barbells)
- Sampling correctness (degree preference, summary-length fallback, determinism, in-community scoping)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
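To make the two ideas above concrete, here is a rough sketch of asynchronous label propagation and top-K member sampling. It is an illustration under assumptions, not the code in this commit: the graph is a plain `dict[str, dict[str, float]]` of symmetric weighted neighbors, the function names and the `max_passes` cap are hypothetical, and the oscillation guard keeps every seen state rather than a bounded window.

```python
import random
from collections import defaultdict

Graph = dict[str, dict[str, float]]


def async_label_propagation(graph: Graph, seed: int = 0, max_passes: int = 100) -> list[list[str]]:
    """Asynchronous label propagation with in-place label updates."""
    rng = random.Random(seed)
    labels = {node: index for index, node in enumerate(sorted(graph))}
    nodes = list(labels)
    seen_states: set[tuple] = set()

    for _ in range(max_passes):
        rng.shuffle(nodes)  # fresh random visiting order each pass
        changed = False
        for node in nodes:
            weights: dict[int, float] = defaultdict(float)
            for neighbor, weight in graph[node].items():
                weights[labels[neighbor]] += weight
            if not weights:
                continue
            # Heaviest community wins; ties go to the lowest community id.
            best = min(weights, key=lambda community: (-weights[community], community))
            # Strict improvement: switch only if the candidate is strictly heavier.
            if best != labels[node] and weights[best] > weights.get(labels[node], 0.0):
                labels[node] = best  # neighbors see this label immediately
                changed = True
        if not changed:
            break
        # Oscillation safeguard: stop if a full label state repeats.
        state = tuple(sorted(labels.items()))
        if state in seen_states:
            break
        seen_states.add(state)

    clusters: dict[int, list[str]] = defaultdict(list)
    for node, label in labels.items():
        clusters[label].append(node)
    return [sorted(members) for members in clusters.values()]


def sample_members(members: list[str], graph: Graph, summaries: dict[str, str], sample_size: int) -> list[str]:
    """Pick the top-K members by in-community weighted degree,
    then summary length, then name."""
    member_set = set(members)

    def sort_key(node: str) -> tuple[float, int, str]:
        degree = sum(w for neighbor, w in graph[node].items() if neighbor in member_set)
        return (-degree, -len(summaries.get(node, '')), node)

    return sorted(members, key=sort_key)[:sample_size]


# A hub connected to 14 leaves, similar in shape to the case described above.
leaves = [f'leaf{i:02d}' for i in range(14)]
star: Graph = {'hub': {leaf: 1.0 for leaf in leaves}}
star.update({leaf: {'hub': 1.0} for leaf in leaves})
star_clusters = async_label_propagation(star)
representative = sample_members(star_clusters[0], star, {}, sample_size=1)
```

Because labels are updated in place, a leaf visited after the hub adopts the hub's current label right away, which is what removes the batch-update oscillation on star-shaped graphs in this sketch.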
d01fd35 to
033fcf3
Compare