Q&A: DistributedTemporalSync - vector-clock gossip, network partitions, and conflict resolution #458
web3guru888 asked this question in Q&A (unanswered)
Related: Q&A: DistributedTemporalSync - implementation questions (Phase 18.3 · Issue #456 · Show & Tell #457)
Q1: Why vector clocks instead of a simpler timestamp (e.g., wall-clock or Lamport scalar)?
A: Wall-clock timestamps require synchronised clocks across nodes (NTP drift can be milliseconds or worse), making causal ordering unreliable. A Lamport scalar gives a total order but cannot detect concurrent updates: you can't tell whether two events are truly causally unrelated or just happen to be ordered by the scalar. Vector clocks preserve the full causal structure: `VectorClock.dominates()` tells us precisely whether one agent's view is a causal superset of another's, which drives our conflict-resolution logic correctly.
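For intuition, here is a minimal, illustrative sketch of what `dominates()` means. It assumes the clock entries live in a `clocks: dict[str, int]` mapping (consistent with the `clock.clocks.get(...)` access mentioned in Q4) and is not the repo's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VectorClock:
    # One counter per agent id; missing agents default to 0.
    clocks: dict[str, int] = field(default_factory=dict)

    def dominates(self, other: "VectorClock") -> bool:
        """True if self is a causal superset of other: every entry in self
        is at least as large as the corresponding entry in other."""
        return all(
            self.clocks.get(agent_id, 0) >= value
            for agent_id, value in other.clocks.items()
        )


a = VectorClock({"agent-a": 3, "agent-b": 1})
b = VectorClock({"agent-a": 2, "agent-b": 1})
c = VectorClock({"agent-a": 1, "agent-b": 2})

print(a.dominates(b))                  # True: a has seen everything b has
print(a.dominates(c), c.dominates(a))  # False False: concurrent, neither dominates
```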
Q2: What happens during a network partition where agent A can't reach agent B for 60 seconds?

A: The `broadcast_sync()` method catches all exceptions and returns `{peer_id: -1}` for unreachable peers: degraded mode, not a crash. `asi_temporal_sync_peers_reachable` drops, triggering a Grafana alert. Meanwhile, A continues accumulating edges locally. When the partition heals, `edges_since(peer_clock)` returns the full 60-second backlog. To bound message size, a future `max_delta_edges: int` config cap (already on `SyncConfig`) truncates the push to the most recent N edges; older edges get caught up on subsequent rounds.
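The per-peer error isolation is the important part. Here is a rough, self-contained sketch of that behaviour, not the actual `broadcast_sync()` implementation; `push_to_peer` is an illustrative stand-in for the real per-peer push.

```python
import asyncio
from typing import Awaitable, Callable


async def broadcast_sync(
    peer_ids: list[str],
    push_to_peer: Callable[[str], Awaitable[int]],
) -> dict[str, int]:
    """One unreachable peer maps to -1; the rest of the round still completes."""
    results: dict[str, int] = {}
    for peer_id in peer_ids:
        try:
            results[peer_id] = await push_to_peer(peer_id)
        except Exception:
            results[peer_id] = -1  # degraded mode, not a crash
    return results


async def main() -> None:
    async def push(peer_id: str) -> int:
        if peer_id == "agent-b":
            raise ConnectionError("partitioned")  # simulate the 60 s partition
        return 4  # number of edges the peer accepted

    print(await broadcast_sync(["agent-a", "agent-b"], push))
    # {'agent-a': 4, 'agent-b': -1}


asyncio.run(main())
```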
Q3: Can `_resolve_conflicts` handle the case where neither clock dominates (true concurrency)?

A: The current `LAST_WRITER_WINS` branch returns an empty list when neither clock dominates, effectively discarding the remote edges. This is a known limitation documented in the implementation notes on issue #456. The correct fix is a `_concurrent_merge()` fallback that unions the edges (like `MERGE_ALL`) but only for the concurrent subset. This is deferred to Phase 18.5 (TemporalCoherenceArbiter), which will handle exactly these incoherence cases.
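A hypothetical sketch of what that `_concurrent_merge()` fallback could look like; it is not in the codebase yet, and the `Edge` shape here is illustrative, reusing the `source_agent_id`/`agent_clock_value` fields from Q4 plus an assumed `edge_id`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    edge_id: str
    source_agent_id: str
    agent_clock_value: int


def _concurrent_merge(local_edges: list[Edge], remote_edges: list[Edge]) -> list[Edge]:
    """Union local and remote edges when neither vector clock dominates,
    deduplicating by edge identity instead of dropping the remote side."""
    seen = {edge.edge_id for edge in local_edges}
    merged = list(local_edges)
    for edge in remote_edges:
        if edge.edge_id not in seen:
            merged.append(edge)
            seen.add(edge.edge_id)
    return merged
```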
Q4: How does `TemporalGraph.edges_since(clock)` know which edges to return?

A: Each edge stored in the `TemporalGraph` must carry a `source_agent_id: str` and an `agent_clock_value: int` field: the clock value of the authoring agent at edge creation time. `edges_since(clock)` filters to edges where `edge.agent_clock_value > clock.clocks.get(edge.source_agent_id, 0)`. For performance, the graph maintains a `_clock_index: dict[str, SortedList[int, EdgeId]]` per agent, making lookup O(log n) instead of O(n).
Q5: Why is the asyncio.Lock held for the entire sync round (DIALING → VERIFYING)? Couldn't that block incoming pushes?

A: Yes. The coarse lock prevents concurrent outbound sync and inbound `receive_push()` from racing on the same `TemporalGraph`. For most deployments with `sync_interval_s=5.0` and small deltas, the lock hold time is well under 100 ms. For high-throughput scenarios (more than 10 peers, large deltas), the recommendation in issue #456 is per-peer locks: each peer gets its own `asyncio.Lock`, allowing parallel sync with different peers. `receive_push()` would acquire the graph write lock (a separate lock on `TemporalGraph`) independently.
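A sketch of the per-peer locking idea from issue #456, an assumption about shape rather than current code: syncs against different peers can overlap, while two rounds against the same peer still serialise.

```python
import asyncio
from collections import defaultdict


class PerPeerLocks:
    """One asyncio.Lock per peer id, created lazily on first use."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    def for_peer(self, peer_id: str) -> asyncio.Lock:
        return self._locks[peer_id]


async def sync_with_peer(locks: PerPeerLocks, peer_id: str) -> None:
    async with locks.for_peer(peer_id):
        # DIALING ... VERIFYING for this one peer; other peers are unaffected.
        await asyncio.sleep(0.01)


async def main() -> None:
    locks = PerPeerLocks()
    # These two syncs hold different locks, so they run concurrently.
    await asyncio.gather(
        sync_with_peer(locks, "agent-b"),
        sync_with_peer(locks, "agent-c"),
    )


asyncio.run(main())
```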
Q6: How does DistributedTemporalSync interact with MemoryConsolidator (18.2)?

A: After `apply_edges()` completes, the newly merged edges are available to `TemporalGraph.get_unconsolidated_traces()`, the same interface MemoryConsolidator polls on its background sweep. No direct coupling is needed: the consolidator simply finds more traces on its next sweep. The `verify_consistency()` step (when `verify_post_merge=True`) ensures the graph is acyclic and causally consistent before consolidation can read it, preventing the consolidator from extracting patterns from a partially-applied delta.
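For the call ordering only, here is a stub that uses the method names from the answer above; it is not the real `TemporalGraph`, and the consolidator sweep is reduced to a single poll.

```python
class StubTemporalGraph:
    """Records just enough state to show the sync -> verify -> sweep ordering."""

    def __init__(self) -> None:
        self.traces: list[str] = []

    def apply_edges(self, edges: list[str]) -> None:
        # Merged remote edges become local, unconsolidated traces.
        self.traces.extend(edges)

    def verify_consistency(self) -> bool:
        # The real implementation checks acyclicity and causal consistency here.
        return True

    def get_unconsolidated_traces(self) -> list[str]:
        return list(self.traces)


graph = StubTemporalGraph()
graph.apply_edges(["edge-1", "edge-2"])   # sync applies the remote delta
assert graph.verify_consistency()         # verify_post_merge=True gate
print(graph.get_unconsolidated_traces())  # the consolidator's next sweep sees both
```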
Q7: How do I write tests for the gossip protocol without running real network sockets?

A: Inject an `InMemoryPeerTransport`; a sketch of one possible shape follows.
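This is an illustrative guess at the transport's shape, assuming it exposes async send/receive hooks keyed by peer id; `register()` and `send()` are made-up names, not the project's confirmed API.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[dict]]


class InMemoryPeerTransport:
    """Routes pushes between in-process peers via a shared dict -- no sockets."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def register(self, peer_id: str, handler: Handler) -> None:
        self._handlers[peer_id] = handler

    async def send(self, peer_id: str, payload: dict) -> dict:
        return await self._handlers[peer_id](payload)


async def main() -> None:
    transport = InMemoryPeerTransport()

    async def agent_b_receive_push(payload: dict) -> dict:
        # Stand-in for receive_push() on the second sync instance.
        return {"accepted": len(payload["edges"])}

    transport.register("agent-b", agent_b_receive_push)
    reply = await transport.send("agent-b", {"edges": ["edge-1", "edge-2"]})
    print(reply)  # {'accepted': 2}


asyncio.run(main())
```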
This lets `test_sync_with_peer_exchanges_edges` run entirely in-memory, with two `AsyncDistributedTemporalSync` instances sharing an `InMemoryPeerTransport` dict, so no asyncio server is needed.

Grafana - Sync Topology Dashboard YAML: