Fix Neo4j nested attributes serialization bug #1109

Open
Ataxia123 wants to merge 5 commits into getzep:main from NERDDAO:fix/neo4j-nested-attributes-serialization

Ataxia123 commented Dec 17, 2025

Neo4j was crashing when entity/edge attributes contained nested structures (Maps of Lists, Lists of Maps) because attributes were being spread as individual properties instead of being serialized to a JSON string.

Changes:

  • Serialize attributes to JSON for Neo4j (like Kuzu already does)
  • Update read path to handle both JSON strings and legacy dict format
  • Add integration tests for nested attribute structures
  • Maintain backward compatibility with existing code

Fixes an issue where LLM extraction with complex structured attributes would cause: Neo.ClientError.Statement.TypeError - Property values can only be of primitive types or arrays thereof.
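
As a rough illustration of the write-path change described above, the sketch below serializes the attributes dict to a single JSON string before the row is handed to Neo4j. The function name `prepare_entity_for_neo4j` and the row layout are assumptions made for this example; they are not taken from `bulk_utils.py`.

```python
import json
from typing import Any


def prepare_entity_for_neo4j(entity: dict[str, Any]) -> dict[str, Any]:
    """Return a row whose `attributes` value is one JSON string.

    Hypothetical helper: a nested dict or list of dicts cannot be stored
    as a Neo4j property, but a string can, so the whole attributes dict
    is serialized instead of being spread into individual properties.
    """
    # Copy every core field except the attributes dict itself.
    row = {key: value for key, value in entity.items() if key != "attributes"}
    # Store the (possibly nested) attributes as a single string property.
    row["attributes"] = json.dumps(entity.get("attributes") or {})
    return row


entity = {
    "uuid": "n1",
    "name": "Alice",
    "attributes": {"roles": [{"title": "CTO"}], "meta": {"tags": ["a", "b"]}},
}
row = prepare_entity_for_neo4j(entity)
```

The nested value survives a round trip through `json.loads(row["attributes"])`, which is what the read path relies on.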

Modified Files:

  • graphiti_core/utils/bulk_utils.py: Serialize attributes for Neo4j
  • graphiti_core/nodes.py: Handle JSON string attributes in read path
  • graphiti_core/edges.py: Handle JSON string attributes in read path
  • graphiti_core/models/nodes/node_db_queries.py: Use n.attributes for Neo4j
  • graphiti_core/models/edges/edge_db_queries.py: Use e.attributes for Neo4j

New Files:

  • tests/test_neo4j_nested_attributes_int.py: Integration tests
  • docs/neo4j-attributes-fix.md: Comprehensive documentation

Summary

Brief description of the changes in this PR.

Type of Change

  • [x] Bug fix
  • New feature
  • Performance improvement
  • Documentation/Tests

Objective

For new features and performance improvements: Clearly describe the objective and rationale for this change.

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • All existing tests pass

Breaking Changes

  • This PR contains breaking changes

If this is a breaking change, describe:

  • What functionality is affected
  • Migration path for existing users

Checklist

  • Code follows project style guidelines (make lint passes)
  • Self-review completed
  • Documentation updated where necessary
  • No secrets or sensitive information committed

Related Issues

Closes #[issue number]

Member

danielchalef commented Dec 17, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@Ataxia123
Author

I have read the CLA Document and I hereby sign the CLA

danielchalef added a commit that referenced this pull request Dec 17, 2025
Ataxia123 force-pushed the fix/neo4j-nested-attributes-serialization branch from e49d54c to 689cad5 on January 14, 2026 16:10
aasmall added a commit to penumbral-labs/graphiti that referenced this pull request Mar 22, 2026
Port the Kuzu pattern to Neo4j: JSON-serialize the attributes dict instead
of spreading as individual properties. Prevents Neo4j from rejecting nested
Map values from LLM extraction.

Read path supports both new JSON format and old spread format for backward
compatibility with existing data.

Based on upstream PR getzep#1109.
…e behavior

Issues fixed:
1. Only serialize attributes for Neo4j, not FalkorDB/Neptune
2. Maintain backward compatibility with existing Neo4j data

Changes:
- Write path: Use elif to specifically target Neo4j only
- Query path: Use COALESCE and return both n.attributes and properties(n)
- Read path: Try JSON string first, fall back to spread properties
- FalkorDB/Neptune: Restore original spread behavior

This ensures:
- New Neo4j nodes: attributes as JSON string (supports nesting)
- Old Neo4j nodes: attributes spread as properties (backward compatible)
- FalkorDB/Neptune: unchanged behavior (no breaking changes)
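
The read-path fallback listed above could look roughly like the following sketch. It assumes the query returns both the stored `attributes` value and the node's full property map; the function name `load_attributes` and the `CORE_FIELDS` set are hypothetical and only stand in for whatever the real read path filters out.

```python
import json
from typing import Any

# Hypothetical set of built-in fields that are not user attributes.
CORE_FIELDS = {"uuid", "name", "group_id", "created_at", "summary", "labels", "attributes"}


def load_attributes(raw_attributes: Any, properties: dict[str, Any]) -> dict[str, Any]:
    """Recover the attributes dict from new or legacy storage."""
    # New format: attributes stored as a single JSON string.
    if isinstance(raw_attributes, str) and raw_attributes:
        try:
            parsed = json.loads(raw_attributes)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return parsed
    # Legacy dict format: the value is already a mapping.
    if isinstance(raw_attributes, dict):
        return dict(raw_attributes)
    # Old nodes: attributes were spread as individual properties,
    # so take everything that is not a core field.
    return {key: value for key, value in properties.items() if key not in CORE_FIELDS}
```

Trying the JSON string first means new nodes never pay for the property scan, while old nodes keep working without a migration.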
Ataxia123 force-pushed the fix/neo4j-nested-attributes-serialization branch from 7f54d11 to 1bebcaa on April 2, 2026 22:23
Ataxia123 force-pushed the fix/neo4j-nested-attributes-serialization branch 2 times, most recently from 01a00b1 to d01fd35 on April 8, 2026 14:51
Two related improvements to community operations:

1. Replace label_propagation with the asynchronous form from Raghavan
   et al. (2007). The synchronous batch implementation has no
   convergence guard and oscillates indefinitely on graphs with
   high-degree hub nodes — we observed it looping forever on a real
   48-node knowledge graph with a central hub connected to 14+ peers.

   The async form visits nodes in a fresh random order each pass and
   updates the community map in place, so neighbors immediately see
   the new label. Deterministic tie-breaking by community id plus a
   strict-improvement rule prevent churn on symmetric graphs. An
   oscillation safeguard via state-hash window catches any edge case.

2. Add a sample_size parameter to build_communities that bounds LLM
   cost on large graphs. Without sampling, community summary cost
   scales as O(total_nodes) because every entity's summary feeds the
   binary-merge tree. With sampling, each community's summary is
   built from only the top-K most representative members (highest
   in-community weighted degree, then longest summary, tie-broken by
   name). Cost becomes O(num_communities * sample_size) — a 20-40x
   reduction on 100k-node graphs, and typically improves quality
   because hub nodes carry the community's signal.

   get_community_clusters now optionally returns the projection it
   already builds during clustering, so sampling can score members
   without a second pass over the graph.
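
For readers unfamiliar with the asynchronous form mentioned in item 1, here is a minimal sketch under stated assumptions. The projection is modeled as a symmetric dict of `node -> {neighbor: weight}`; the function name, parameters, and defaults are illustrative and are not the signature used in the commit.

```python
import random
from collections import defaultdict, deque


def async_label_propagation(projection, seed=0, max_passes=100, window=8):
    """Asynchronous label propagation over a weighted, symmetric projection."""
    rng = random.Random(seed)
    # Every node starts in its own community, identified by an integer id.
    community = {node: i for i, node in enumerate(sorted(projection))}
    nodes = list(community)
    recent_states = deque(maxlen=window)  # oscillation safeguard

    for _ in range(max_passes):
        rng.shuffle(nodes)  # fresh random visiting order each pass
        changed = False
        for node in nodes:
            # Total edge weight toward each neighboring community.
            weights = defaultdict(float)
            for neighbor, weight in projection[node].items():
                weights[community[neighbor]] += weight
            if not weights:
                continue  # isolated node keeps its own label
            # Heaviest community wins; ties go to the smallest community id.
            best = min(weights, key=lambda c: (-weights[c], c))
            current = community[node]
            # Strict-improvement rule: move only if the target is strictly heavier.
            if best != current and weights[best] > weights.get(current, 0.0):
                community[node] = best  # in-place update, visible to later nodes
                changed = True
        if not changed:
            break
        state = hash(tuple(sorted(community.items())))
        if state in recent_states:
            break  # a recent state repeated, stop instead of looping
        recent_states.append(state)

    clusters = defaultdict(list)
    for node, label in community.items():
        clusters[label].append(node)
    return [sorted(members) for members in clusters.values()]
```

With symmetric weights, each accepted move strictly increases the total weight of edges inside communities, so in this sketch the state-hash window and `max_passes` act only as backstops.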

Adds tests/utils/maintenance/test_community_operations.py with 15
unit tests covering both fixes:
- Regression cases for the oscillation bug (hub graphs, real-world
  pathological projection, stars, rings, barbells)
- Sampling correctness (degree preference, summary-length fallback,
  determinism, in-community scoping)
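
The sampling rule behind these tests can be sketched as a single sort. The function name `sample_members` and the argument shapes (`projection` as `node -> {neighbor: weight}`, `summaries` as `node -> text`) are assumptions for illustration, not the code added to build_communities.

```python
def sample_members(members, projection, summaries, sample_size):
    """Pick the top-K most representative members of one community."""
    member_set = set(members)

    def in_community_degree(node):
        # Count only edge weight toward nodes in the same community.
        return sum(
            weight
            for neighbor, weight in projection.get(node, {}).items()
            if neighbor in member_set
        )

    ranked = sorted(
        members,
        key=lambda node: (
            -in_community_degree(node),     # highest in-community weighted degree
            -len(summaries.get(node, "")),  # then longest summary
            node,                           # then name, for determinism
        ),
    )
    return ranked[:sample_size]


projection = {
    "hub": {"a": 1.0, "b": 1.0, "c": 1.0, "outsider": 10.0},
    "a": {"hub": 1.0, "outsider": 5.0},
    "b": {"hub": 1.0},
    "c": {"hub": 1.0},
}
summaries = {"hub": "", "a": "short", "b": "a much longer summary", "c": "short"}
top = sample_members(["hub", "a", "b", "c"], projection, summaries, 3)
# top == ["hub", "b", "a"]
```

As hypothetical arithmetic for the cost claim: 100,000 nodes grouped into 500 communities with a sample size of 5 would feed 2,500 summaries into the merge step instead of 100,000, a 40x reduction.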

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ataxia123 force-pushed the fix/neo4j-nested-attributes-serialization branch from d01fd35 to 033fcf3 on April 8, 2026 15:10

2 participants