feat: new bootstrap process by SWvheerden · Pull Request #7121 · tari-project/tari

SWvheerden · 2025-05-28T13:13:39Z

Description

Implement a better bootstrap process so that seed nodes are not overloaded.

Motivation and Context

Implement the following bootstrap process:

1. Get the list of seed peers.
2. Connect to every seed peer, one at a time, request its list of peers, and then disconnect immediately.
3. Once the peer DHT is populated with enough non-seed peers, connect to the network proper, while making sure the seed peers do NOT form part of the DHT.

In short, a node should never stay connected to a seed node for long, because otherwise the seed node's connection pool is exhausted.
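
The three steps above can be sketched as follows. This is a minimal illustration rather than the code in this PR: `PeerId`, `SeedClient`, `fetch_peers_and_disconnect`, and `StaticClient` are hypothetical names introduced only for the example, and the real implementation is asynchronous and works on the peer manager's types.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a peer identity (assumption for this sketch).
type PeerId = String;

/// Hypothetical abstraction over "connect, ask for peers, disconnect".
trait SeedClient {
    /// Connects to `seed`, fetches its peer list and drops the connection.
    /// Returns `None` if the seed could not be reached.
    fn fetch_peers_and_disconnect(&mut self, seed: &PeerId) -> Option<Vec<PeerId>>;
}

/// Contacts every seed one at a time and returns the discovered non-seed peers.
fn bootstrap_from_seeds<C: SeedClient>(client: &mut C, seeds: &[PeerId]) -> HashSet<PeerId> {
    let seed_set: HashSet<PeerId> = seeds.iter().cloned().collect();
    let mut discovered = HashSet::new();
    for seed in seeds {
        // A seed that cannot be reached is skipped; the remaining seeds are still tried.
        let Some(peers) = client.fetch_peers_and_disconnect(seed) else {
            continue;
        };
        for peer in peers {
            // Seed peers must never become part of the working peer set.
            if !seed_set.contains(&peer) {
                discovered.insert(peer);
            }
        }
    }
    discovered
}

/// In-memory client used only to exercise the sketch.
struct StaticClient {
    responses: Vec<(PeerId, Vec<PeerId>)>,
    contacted: Vec<PeerId>,
}

impl SeedClient for StaticClient {
    fn fetch_peers_and_disconnect(&mut self, seed: &PeerId) -> Option<Vec<PeerId>> {
        self.contacted.push(seed.clone());
        self.responses
            .iter()
            .find(|(id, _)| id == seed)
            .map(|(_, peers)| peers.clone())
    }
}
```

Because each seed is contacted sequentially and the connection is dropped before the next one is tried, at most one seed connection is held at any time in this sketch.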

How Has This Been Tested?

"It works on my machine"

What process can a PR reviewer use to test or verify this change?

Clear the whole ~/.tari/mainnet folder, and start the node up from scratch. Look for the following debug lines in network.log:

[comms::dht::network_discovery::seed_strap] [Thread:55429085] DEBUG Attempting to discover peers via seed nodes. // comms/dht/src/network_discovery/seed_strap.rs:61

And then:

[comms::dht::network_discovery::seed_strap] [Thread:55429095] INFO Added 992 peers via seed nodes. Transitioning to Ready state. // comms/dht/src/network_discovery/seed_strap.rs:71

Breaking Changes

None

Summary by CodeRabbit

New Features
- Introduced a new seed bootstrap phase for peer discovery, improving network initialization and robustness.
- Enhanced UI to display detailed bootstrap progress, including current and total bootstrap rounds.
- Added configuration options for controlling bootstrap and seed peer synchronization behavior.
- Added new events signaling bootstrap method determination and primary bootstrap completion.
Improvements
- More accurate and explicit tracking of bootstrap completion and peer synchronization status.
- Improved event handling and state transitions during node startup and synchronization, integrating DHT events.
- Enhanced logging and diagnostics for bootstrap and peer discovery processes.
- Refined sync state transitions based on peer metadata and bootstrap completion.
- Updated peer flags to explicitly mark seed peers.
- Replaced peer manager event streams with DHT event subscriptions for state machine event handling.
- Expanded public API to expose additional network discovery types and modules.
- Improved test stability and coverage by updating imports and event sources.
- Added a bootstrap timeout mechanism to prevent UI deadlocks during prolonged bootstrap.
- Enhanced network discovery state machine with explicit bootstrap method tracking and event publishing.
- Improved peer discovery logic with early exit conditions and detailed peer categorization.
- Refactored discovery ready state to better handle discovery rounds and transitions.
Bug Fixes
- Resolved issues where missed bootstrap events could cause UI state inconsistencies or deadlocks.
- Suppressed unnecessary warnings in mempool validation and Monero extra field deserialization.

…into bootstrapper-dev

coderabbitai · 2025-05-28T13:13:46Z

Walkthrough

This update introduces a new explicit seed peer bootstrap phase ("SeedStrap") into the DHT network discovery process, with detailed event signaling, configuration, and state tracking. The base node state machine is refactored to subscribe to and respond to DHT bootstrap events, updating UI state accordingly. Multiple configuration and event types are extended, and peer flag handling is improved.

Changes

| File(s) | Change Summary |
|---|---|
| `base_layer/core/src/base_node/state_machine_service/initializer.rs`, `state_machine.rs`, `states/events_and_states.rs`, `states/listening.rs`, `states/starting_state.rs` | Refactored state machine to use DHT events for bootstrap tracking, replaced PeerManager with DHT event stream, enhanced bootstrap phase UI state, and added bootstrap completion logic. |
| `base_layer/core/src/base_node/sync/config.rs` | Added `num_initial_sync_rounds_seed_bootstrap` config field with accessor and default. |
| `base_layer/core/tests/helpers/nodes.rs` | Added `dht` field to `NodeInterfaces` and initialized it from service handles. |
| `base_layer/core/tests/helpers/sync.rs`, `tests/tests/base_node_rpc.rs`, `tests/tests/mempool.rs`, `tests/tests/node_service.rs`, `tests/tests/node_state_machine.rs` | Updated imports and constructor arguments to use DHT event subscription instead of PeerManager events; adjusted `ListeningInfo` import paths. |
| `base_layer/p2p/src/peer_seeds.rs` | Updated peer creation to set `PeerFlags::SEED` for seed peers. |
| `comms/dht/src/event.rs` | Added `PrimaryBootstrapComplete` and `BootstrapMethodDetermined(BootstrapMethod)` to `DhtEvent`. |
| `comms/dht/src/lib.rs` | Publicly re-exported `BootstrapMethod`, `DiscoveryPhase`, and `NetworkDiscoveryConfig`. |
| `comms/dht/src/network_discovery/config.rs` | Added seed bootstrap and sync-related config fields with defaults and serde support. |
| `comms/dht/src/network_discovery/discovering.rs`, `on_connect.rs` | Integrated `DiscoveryPhase` into discovery logic and event publishing. |
| `comms/dht/src/network_discovery/error.rs` | Implemented custom `PartialEq` for `NetworkDiscoveryError` (variant-only). |
| `comms/dht/src/network_discovery/initializing.rs` | Enhanced initialization: queries peer manager, categorizes peers, conditionally skips bootstrap if enough peers exist. |
| `comms/dht/src/network_discovery/mod.rs` | Made `seed_strap` and `state_machine` modules public. |
| `comms/dht/src/network_discovery/ready.rs` | Refactored peer selection logic, clarified discovery state transitions, added helper for peer selection. |
| `comms/dht/src/network_discovery/seed_strap.rs` | New: Implements `SeedStrap` for seed peer bootstrap, querying seeds, adding peers, and publishing detailed round info. |
| `comms/dht/src/network_discovery/state_machine.rs` | Major: Added `SeedStrap` phase, `BootstrapMethod`, `DiscoveryPhase`, event signaling, bootstrap timeout, and completion tracking. Refactored state machine logic for robust bootstrap handling. |
| `base_layer/core/src/base_node/state_machine_service/states/mod.rs` | Made `events_and_states` module public. |
| `base_layer/core/src/mempool/mempool_storage.rs` | Removed warning logs for specific validation errors (`UnknownInputs`, `MaturityError`). |
| `base_layer/core/src/proof_of_work/monero_rx/helpers.rs` | Lowered log level from warn to trace for Monero extra field deserialization errors. |
| `base_layer/core/tests/tests/block_validation.rs` | Added `#[serial]` attribute and ensured consistent network environment for `test_monero_blocks`. |
| `comms/dht/src/rpc/mock.rs` | Removed explicit `new()` constructor, relying on derived `Default` for `DhtRpcServiceMock`. |
| `comms/dht/src/rpc/mod.rs` | Removed re-export of `DhtRpcServiceMock` in test configuration. |

Sequence Diagram(s)

sequenceDiagram
    participant Node as BaseNodeStateMachine
    participant DHT as Dht Service
    participant UI as UI State
    participant PeerMgr as PeerManager
    participant Seeds as Seed Peers

    Node->>DHT: Subscribe to DHT events
    DHT-->>Node: BootstrapMethodDetermined / PrimaryBootstrapComplete

    alt Bootstrap via SeedStrap
        Node->>DHT: Initiate SeedStrap bootstrap
        DHT->>Seeds: Query for peers
        Seeds-->>DHT: Respond with peer lists
        DHT-->>Node: NetworkDiscoveryPeersAdded (with round info)
        Node->>UI: Update bootstrap phase (progress/rounds)
        DHT-->>Node: PrimaryBootstrapComplete
        Node->>UI: Clear bootstrap phase, mark as complete
    else Bootstrap via ExistingPeers
        DHT-->>Node: BootstrapMethodDetermined(ExistingPeers)
        Node->>UI: Mark bootstrap complete, skip SeedStrap
    end

    Node->>DHT: Listen for further DHT events
    DHT-->>Node: Other DHT events (ignored or handled as needed)
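
The flow in the diagram can be read as a small event handler on the base node side. The sketch below is an assumption-laden illustration: the enum is a trimmed-down, hypothetical mirror of `DhtEvent` (the real type has more variants), the field shape of `NetworkDiscoveryPeersAdded` is invented for the example, and `UiState` and `apply_event` are hypothetical names.

```rust
/// Bootstrap method as reported by the DHT (variant names taken from the diagram).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BootstrapMethod {
    SeedStrap,
    ExistingPeers,
}

/// Hypothetical, trimmed-down event type used only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
enum DhtEvent {
    BootstrapMethodDetermined(BootstrapMethod),
    NetworkDiscoveryPeersAdded { round_number: u32, total_rounds: u32 },
    PrimaryBootstrapComplete,
}

/// Hypothetical UI-facing state: (current round, total rounds) while bootstrapping.
#[derive(Debug, Default, PartialEq, Eq)]
struct UiState {
    bootstrap_phase: Option<(u32, u32)>,
    bootstrap_complete: bool,
}

/// Applies one DHT event to the UI state, following the two branches of the diagram.
fn apply_event(ui: &mut UiState, event: &DhtEvent) {
    match event {
        // Either branch that ends bootstrap clears the phase and marks completion.
        DhtEvent::BootstrapMethodDetermined(BootstrapMethod::ExistingPeers)
        | DhtEvent::PrimaryBootstrapComplete => {
            ui.bootstrap_phase = None;
            ui.bootstrap_complete = true;
        }
        // SeedStrap was chosen: progress arrives via later round events.
        DhtEvent::BootstrapMethodDetermined(BootstrapMethod::SeedStrap) => {}
        DhtEvent::NetworkDiscoveryPeersAdded { round_number, total_rounds } => {
            // Late round events must not resurrect the phase after completion.
            if !ui.bootstrap_complete {
                ui.bootstrap_phase = Some((*round_number, *total_rounds));
            }
        }
    }
}
```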

Suggested reviewers

hansieodendaal

Poem

🥕
A seedling’s journey, round by round,
Bootstrap whispers through DHT abound.
Peers discovered, flags anew,
The state machine hops to something true.
With every event, a carrot cheer—
Bootstrap complete, the path is clear!
🐇


github-actions · 2025-05-28T13:18:10Z

Test Results (Integration tests)

0 tests 0 ✅ 0s ⏱️
0 suites 0 💤
2 files 0 ❌
2 errors

For more details on these parsing errors, see this check.

Results for commit df24c02.

♻️ This comment has been updated with latest results.

github-actions · 2025-05-28T13:21:56Z

Test Results (CI)

3 files 112 suites 42m 43s ⏱️
1 328 tests 1 325 ✅ 0 💤 3 ❌
3 311 runs 3 308 ✅ 0 💤 3 ❌

For more details on these failures, see this check.

Results for commit 7b5c0d5.

♻️ This comment has been updated with latest results.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (7)

base_layer/core/src/base_node/sync/config.rs (1)

59-64: Consider enforcing configuration consistency with DhtConfig.

The comment indicates this value should ideally match DhtConfig.network_discovery.max_seed_peer_sync_count, but there's no mechanism to ensure consistency. This could lead to configuration mismatches where the base node and DHT have different expectations about bootstrap rounds.

Consider adding validation during configuration loading or documenting this relationship more prominently in the configuration files to help users maintain consistency.
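
A load-time check along these lines might look like the sketch below. The two structs are reduced to the single field under discussion, the `usize` field types are an assumption, and `check_bootstrap_round_consistency` is a hypothetical helper name, not an existing function in the codebase.

```rust
/// Reduced stand-in for the base node sync config (field type is an assumption).
struct BlockchainSyncConfig {
    num_initial_sync_rounds_seed_bootstrap: usize,
}

/// Reduced stand-in for the DHT network discovery config (field type is an assumption).
struct NetworkDiscoveryConfig {
    max_seed_peer_sync_count: usize,
}

/// Hypothetical validation step: reports a mismatch between the two settings.
fn check_bootstrap_round_consistency(
    sync: &BlockchainSyncConfig,
    discovery: &NetworkDiscoveryConfig,
) -> Result<(), String> {
    if sync.num_initial_sync_rounds_seed_bootstrap == discovery.max_seed_peer_sync_count {
        Ok(())
    } else {
        Err(format!(
            "num_initial_sync_rounds_seed_bootstrap ({}) does not match max_seed_peer_sync_count ({})",
            sync.num_initial_sync_rounds_seed_bootstrap, discovery.max_seed_peer_sync_count
        ))
    }
}
```

Whether a mismatch should abort startup or only log a warning is a design choice this sketch leaves to the caller.
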
comms/dht/src/network_discovery/initializing.rs (1)
64-100: Excellent peer categorization implementation!

The code provides comprehensive peer analysis with clear categorization logic. The error handling for database failures is appropriate - logging the error while continuing with bootstrap is the right approach.

Consider extracting the peer suitability check into a helper method for better readability:
+    fn is_suitable_peer(peer: &Peer) -> bool {
+        !peer.is_seed() && 
+        !peer.is_banned() && 
+        peer.deleted_at.is_none() && 
+        !peer.is_offline() && 
+        !peer.all_addresses_failed() && 
+        peer.features == PeerFeatures::COMMUNICATION_NODE
+    }

     for peer in &all_peers {
         total_peers += 1;

         if peer.is_seed() {
             seed_peers += 1;
         } else if peer.is_banned() {
             banned_peers += 1;
         } else if peer.deleted_at.is_some() {
             deleted_peers += 1;
         } else if peer.is_offline() {
             offline_peers += 1;
         } else if peer.all_addresses_failed() {
             failed_address_peers += 1;
         } else if peer.features != PeerFeatures::COMMUNICATION_NODE {
             non_communication_node_peers += 1;
         } else {
             suitable_peers += 1;
         }
     }
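
An alternative to a boolean helper is a single classification function that both the counters and the suitability check share, so the two cannot drift apart. The sketch below uses a hypothetical `PeerInfo` struct with plain boolean fields in place of the real `Peer` type; `PeerCategory`, `categorize`, `is_suitable`, and `count_by_category` are names invented for the example.

```rust
use std::collections::HashMap;

/// One category per branch of the if/else chain above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum PeerCategory {
    Seed,
    Banned,
    Deleted,
    Offline,
    AllAddressesFailed,
    NonCommunicationNode,
    Suitable,
}

/// Hypothetical flattened view of a peer (the real `Peer` exposes methods instead).
#[derive(Debug, Clone)]
struct PeerInfo {
    is_seed: bool,
    is_banned: bool,
    is_deleted: bool,
    is_offline: bool,
    all_addresses_failed: bool,
    is_communication_node: bool,
}

impl PeerInfo {
    /// A peer with no disqualifying property, handy as a base for struct update syntax.
    fn healthy() -> Self {
        PeerInfo {
            is_seed: false,
            is_banned: false,
            is_deleted: false,
            is_offline: false,
            all_addresses_failed: false,
            is_communication_node: true,
        }
    }
}

/// Returns the first matching category, in the same order as the original chain.
fn categorize(peer: &PeerInfo) -> PeerCategory {
    if peer.is_seed {
        PeerCategory::Seed
    } else if peer.is_banned {
        PeerCategory::Banned
    } else if peer.is_deleted {
        PeerCategory::Deleted
    } else if peer.is_offline {
        PeerCategory::Offline
    } else if peer.all_addresses_failed {
        PeerCategory::AllAddressesFailed
    } else if !peer.is_communication_node {
        PeerCategory::NonCommunicationNode
    } else {
        PeerCategory::Suitable
    }
}

fn is_suitable(peer: &PeerInfo) -> bool {
    categorize(peer) == PeerCategory::Suitable
}

/// Tallies peers per category; categories with no peers are absent from the map.
fn count_by_category(peers: &[PeerInfo]) -> HashMap<PeerCategory, usize> {
    let mut counts = HashMap::new();
    for peer in peers {
        *counts.entry(categorize(peer)).or_insert(0) += 1;
    }
    counts
}
```
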
comms/dht/src/network_discovery/ready.rs (1)
235-241: Consider adding debug logging for the edge case.

The comment mentions a scenario where last_round_info_option is None but current_num_rounds > 0, which "should not happen if SeedStrap always sets last_round_info". While the code handles this gracefully, consider adding a debug log to detect if this edge case ever occurs in practice.
 }
 // Fallthrough: continue discovery if:
 // - last_round_info_option is None (but current_num_rounds > 0 - should not happen if SeedStrap always sets
 //   last_round_info, this path is more for re-entry from Idle/OnConnect where last_round might be old/cleared)
 //   OR
 // - last_round_info_option showed new peers or failed (and the new SeedStrap success condition above wasn't
 //   met), AND
 // - idle_after_num_rounds not yet reached.
+
+// Debug log for edge case detection
+if last_round_info_option.is_none() && current_num_rounds > 0 {
+    debug!(
+        target: LOG_TARGET,
+        "Unexpected state: last_round_info is None but current_num_rounds = {}. This might indicate a state inconsistency.",
+        current_num_rounds
+    );
+}
+
 let excluded_peers = self.context.all_attempted_peers.read().await.clone();
comms/dht/src/network_discovery/seed_strap.rs (3)
478-496: Early exit optimization looks good, but round number update may be misleading.

The early exit condition when sufficient peers are found is a good optimization. However, setting round_info.round_number = round_info.total_rounds on line 494 might be misleading since not all seed peers were actually contacted.

Consider adding a flag to indicate early exit instead of manipulating the round number.
 // If we early exit because we found enough peers, make the round_number reflect completion
 // of the seed strap phase.
-round_info.round_number = round_info.total_rounds;
+// Keep the actual round number and add a comment in the log
+info!(
+    target: LOG_TARGET,
+    "SeedStrap: Early exit after {}/{} seed contacts due to finding sufficient peers",
+    round_info.round_number.unwrap_or(idx + 1),
+    round_info.total_rounds.unwrap_or(num_seeds_to_try)
+);
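
The "flag instead of rewriting the round number" idea can be illustrated in isolation. In this sketch `RoundInfo` is a hypothetical, reduced version of the round info struct, `early_exit` is the proposed new field, and `run_seed_rounds` simulates the loop with a precomputed count of new peers per seed.

```rust
/// Hypothetical, reduced round info; `early_exit` is the proposed flag.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct RoundInfo {
    round_number: Option<usize>,
    total_rounds: Option<usize>,
    num_new_peers: usize,
    early_exit: bool,
}

/// Simulates contacting seeds in order; `peers_per_seed[i]` is what seed `i` yields.
fn run_seed_rounds(peers_per_seed: &[usize], enough_peers: usize) -> RoundInfo {
    let mut info = RoundInfo {
        total_rounds: Some(peers_per_seed.len()),
        ..RoundInfo::default()
    };
    for (idx, new_peers) in peers_per_seed.iter().enumerate() {
        // The round number always reflects the seeds actually contacted.
        info.round_number = Some(idx + 1);
        info.num_new_peers += *new_peers;
        let seeds_remaining = idx + 1 < peers_per_seed.len();
        if info.num_new_peers >= enough_peers && seeds_remaining {
            // Record the early exit explicitly instead of faking a full run.
            info.early_exit = true;
            break;
        }
    }
    info
}
```

Consumers that only care about completion can then treat `early_exit || round_number == total_rounds` as "phase finished" without losing the true contact count.
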
457-464: Counter management after banning could be more robust.

While the current implementation checks for > 0 before decrementing, managing counters after the fact could lead to inconsistencies. Consider tracking whether this seed was counted as successful before incrementing initially.
 // If we banned the seed, we don't count it as a successful sync
+// Only decrement if we had previously counted this as successful
+let was_counted_as_successful = round_info.num_succeeded > successful_seed_contacts.saturating_sub(1);
 if round_info.num_succeeded > 0 {
     round_info.num_succeeded -= 1;
 }
 if successful_seed_contacts > 0 {
     successful_seed_contacts -= 1;
 }
+if !was_counted_as_successful {
+    warn!(target: LOG_TARGET, "Counter mismatch detected during seed ban handling");
+}
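
One way to avoid decrementing after the fact is to decide each seed's final outcome first and only then touch the counters, so every seed is counted exactly once. The sketch below is hypothetical: `SeedOutcome`, `SeedStats`, and `tally` are illustrative names, and counting a banned seed as failed is an assumption about the desired semantics.

```rust
/// Final outcome of contacting one seed, decided before any counter changes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SeedOutcome {
    Synced,
    Banned,
    Unreachable,
}

#[derive(Debug, Default, PartialEq, Eq)]
struct SeedStats {
    num_succeeded: usize,
    num_failed: usize,
}

/// Counts each seed exactly once, so no decrement or saturation guard is needed.
fn tally(outcomes: &[SeedOutcome]) -> SeedStats {
    let mut stats = SeedStats::default();
    for outcome in outcomes {
        match outcome {
            SeedOutcome::Synced => stats.num_succeeded += 1,
            // Assumption: a seed banned during sync is reported as a failure.
            SeedOutcome::Banned | SeedOutcome::Unreachable => stats.num_failed += 1,
        }
    }
    stats
}
```
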
44-45: Consider reducing the stream item timeout.

The current 10-second timeout for individual stream items might be too conservative and could slow down the bootstrap process when dealing with unresponsive seeds.
 // Define a timeout for individual stream items
-const STREAM_ITEM_TIMEOUT: Duration = Duration::from_secs(10);
+const STREAM_ITEM_TIMEOUT: Duration = Duration::from_secs(5);
Also applies to: 691-691
comms/dht/src/network_discovery/state_machine.rs (1)
432-448: Consider logging when bootstrap timeout monitoring begins.

The timeout mechanism is well-implemented, but it would be helpful to log when the bootstrap timeout monitoring starts for better observability.

Add a log statement before the timeout monitoring begins:
 let next_event = if !bootstrap_completed {
+    debug!(
+        target: LOG_TARGET,
+        "Bootstrap timeout monitoring active (timeout: {:?})",
+        bootstrap_timeout_duration
+    );
     // Create a separate context to avoid borrow issues
     let context_clone = self.context.clone();
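
The timeout idea itself can be shown without the async machinery. The following is a synchronous sketch under stated assumptions: `BootstrapWatchdog` and its methods are hypothetical names, and the caller is assumed to pass in the current time, which keeps the logic easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical watchdog that forces bootstrap completion after a deadline.
struct BootstrapWatchdog {
    started_at: Instant,
    timeout: Duration,
    completed: bool,
}

impl BootstrapWatchdog {
    fn new(started_at: Instant, timeout: Duration) -> Self {
        BootstrapWatchdog { started_at, timeout, completed: false }
    }

    /// Called when bootstrap finishes normally.
    fn mark_complete(&mut self) {
        self.completed = true;
    }

    /// Returns `true` exactly once: when the deadline has passed and bootstrap
    /// had not completed, in which case completion is forced.
    fn check(&mut self, now: Instant) -> bool {
        if self.completed {
            return false;
        }
        if now.duration_since(self.started_at) >= self.timeout {
            self.completed = true;
            return true;
        }
        false
    }
}
```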

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1382008 and 7cd8768.

📒 Files selected for processing (18)

base_layer/core/src/base_node/state_machine_service/initializer.rs (2 hunks)
base_layer/core/src/base_node/state_machine_service/state_machine.rs (4 hunks)
base_layer/core/src/base_node/state_machine_service/states/events_and_states.rs (2 hunks)
base_layer/core/src/base_node/state_machine_service/states/listening.rs (5 hunks)
base_layer/core/src/base_node/state_machine_service/states/starting_state.rs (2 hunks)
base_layer/core/src/base_node/sync/config.rs (2 hunks)
base_layer/p2p/src/peer_seeds.rs (2 hunks)
comms/dht/src/event.rs (2 hunks)
comms/dht/src/lib.rs (1 hunks)
comms/dht/src/network_discovery/config.rs (2 hunks)
comms/dht/src/network_discovery/discovering.rs (3 hunks)
comms/dht/src/network_discovery/error.rs (1 hunks)
comms/dht/src/network_discovery/initializing.rs (3 hunks)
comms/dht/src/network_discovery/mod.rs (1 hunks)
comms/dht/src/network_discovery/on_connect.rs (2 hunks)
comms/dht/src/network_discovery/ready.rs (2 hunks)
comms/dht/src/network_discovery/seed_strap.rs (1 hunks)
comms/dht/src/network_discovery/state_machine.rs (16 hunks)

🧰 Additional context used

🧠 Learnings (1)

comms/dht/src/network_discovery/initializing.rs (2)

Learnt from: SWvheerden
PR: tari-project/tari#6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.


🧬 Code Graph Analysis (1)

comms/dht/src/network_discovery/error.rs (6)

comms/dht/src/network_discovery/state_machine.rs (1)

eq (144-158)

comms/core/src/net_address/multiaddr_with_stats.rs (2)

eq (370-372)

eq (443-464)

comms/core/src/net_address/mutliaddresses_with_stats.rs (1)

eq (271-273)

comms/core/src/peer_manager/peer.rs (1)

eq (378-380)

comms/dht/src/envelope.rs (3)

eq (164-172)

eq (319-321)

eq (325-327)

comms/core/src/connection_manager/peer_connection.rs (1)

eq (363-365)

⏰ Context from checks skipped due to timeout of 90000ms (3)

GitHub Check: test (mainnet, stagenet)
GitHub Check: test (testnet, esmeralda)
GitHub Check: ci

🔇 Additional comments (33)

base_layer/p2p/src/peer_seeds.rs (2)

36-36: LGTM: Required import for seed peer flagging.

The addition of PeerFlags import supports the seed peer flagging implemented below.

169-169: LGTM: Correctly flags seed peers for bootstrap process.

This change properly marks peers converted from SeedPeer with the SEED flag, enabling the new bootstrap system to distinguish seed nodes from regular peers. This aligns perfectly with the PR objectives to implement a dedicated seed bootstrap phase.
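
To make the flagging concrete, here is a self-contained sketch of a bit-flag type with a `SEED` bit and a filter that keeps seeds out of the working peer set. The hand-rolled `PeerFlags` struct, the bit value, and the `Peer` fields are assumptions for illustration; the real `PeerFlags` type and its representation may differ.

```rust
/// Hypothetical bit-flag wrapper; the bit assignment is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct PeerFlags(u8);

impl PeerFlags {
    const NONE: PeerFlags = PeerFlags(0);
    const SEED: PeerFlags = PeerFlags(0b0000_0001);

    /// True if every bit set in `other` is also set in `self`.
    fn contains(self, other: PeerFlags) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Hypothetical, reduced peer record.
#[derive(Debug, Clone)]
struct Peer {
    address: String,
    flags: PeerFlags,
}

/// Mirrors the change described above: peers created from seed entries carry SEED.
fn peer_from_seed(address: &str) -> Peer {
    Peer { address: address.to_string(), flags: PeerFlags::SEED }
}

/// Returns only the peers that may take part in regular discovery.
fn non_seed_peers(peers: &[Peer]) -> Vec<&Peer> {
    peers.iter().filter(|p| !p.flags.contains(PeerFlags::SEED)).collect()
}
```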

comms/dht/src/network_discovery/on_connect.rs (2)

33-33: LGTM: Required import for discovery phase tracking.

The DiscoveryPhase import is necessary for the enhanced event reporting implemented below.

180-182: LGTM: Proper integration with discovery phase tracking.

The addition of these fields correctly integrates on-connect peer synchronization with the new discovery phase system:

- DiscoveryPhase::General appropriately categorizes this as regular discovery (not seed bootstrap)
- Setting round_number and total_rounds to None is correct since on-connect mode doesn't track specific rounds
- The inline comments clearly explain the reasoning

This ensures consistent event reporting across different discovery modes.

comms/dht/src/lib.rs (1)

95-98: LGTM: Proper API exposure for bootstrap system integration.

The expanded re-exports correctly expose the new BootstrapMethod and DiscoveryPhase types from the state machine, enabling external components (such as the base node state machine service) to integrate with the enhanced bootstrap system. This maintains a clean public API while providing access to essential bootstrap-related types.

comms/dht/src/network_discovery/discovering.rs (3)

35-41: LGTM! Import addition supports discovery phase tracking.

The addition of DiscoveryPhase import is necessary for the new discovery phase tracking functionality implemented in the initialize method.

86-89: Good addition of discovery phase context.

The initialization of discovery phase and round information properly categorizes this as General discovery (as opposed to seed bootstrap) and provides necessary context for tracking discovery progress. Setting total_rounds to None correctly indicates that general discovery doesn't have a fixed number of rounds.

280-280: Improved comment clarity.

The comment clarification from "Other errors" to "Other errors - no banning needed" better explains the intent of this match arm.
base_layer/core/src/base_node/state_machine_service/initializer.rs (2)

24-25: Import refactoring aligns with DHT event migration.

The removal of PeerManager import and addition of Dht import correctly reflects the architectural change from peer manager events to DHT events for bootstrap tracking.

110-116: Verify BaseNodeStateMachine constructor signature compatibility.

✅ Verification successful

The change from peer manager to DHT event stream is architecturally sound and aligns with the new bootstrap tracking mechanism. However, ensure that the BaseNodeStateMachine constructor signature has been updated to accept DhtEventReceiver instead of the previous peer manager parameter.

🏁 Script executed:
#!/bin/bash
# Description: Verify BaseNodeStateMachine constructor signature matches the new call
# Expected: Constructor should accept DhtEventReceiver parameter

ast-grep --pattern $'impl BaseNodeStateMachine$_$ {
  $$$
  pub fn new($$$) -> $_ {
    $$$
  }
  $$$
}'
Length of output: 105

🏁 Script executed:
#!/bin/bash
# Description: Locate BaseNodeStateMachine implementation and inspect its new() signature
rg --color=never -n "impl BaseNodeStateMachine" -A20
Length of output: 54

🏁 Script executed:
#!/bin/bash
# Description: Search for BaseNodeStateMachine::new to locate its constructor signature
rg --color=never -n "BaseNodeStateMachine::new" -C5
Length of output: 6843

🏁 Script executed:
#!/bin/bash
# Find BaseNodeStateMachine struct definition and inspect its new() signature
rg --color=never -n "struct BaseNodeStateMachine" -C5 base_layer/core/src/base_node
rg --color=never -n "pub fn new" -C5 base_layer/core/src/base_node
Length of output: 32337

Constructor signature is correct.

Verified that BaseNodeStateMachine::new in state_machine.rs now takes metadata_event_stream: broadcast::Receiver<Arc<ChainMetadataEvent>> and dht_event_stream: DhtEventReceiver, in that order, and that the initializer passes chain_metadata_service.get_event_stream() and dht.subscribe_dht_events() accordingly.

comms/dht/src/event.rs (2)

27-27: Import addition supports new bootstrap event.

The BootstrapMethod import is necessary for the new BootstrapMethodDetermined event variant.

42-46: Excellent addition of bootstrap tracking events.

The new events provide granular visibility into the bootstrap process:

- PrimaryBootstrapComplete signals when the initial bootstrap phase (e.g., via seed nodes) is finished
- BootstrapMethodDetermined(BootstrapMethod) communicates which bootstrap method is being used

These events enable the base node state machine to accurately track and respond to bootstrap progress, improving the user experience with more precise status updates.

comms/dht/src/network_discovery/mod.rs (1)

41-43: Appropriate visibility change for new bootstrap functionality.

Making seed_strap and state_machine modules public is necessary to enable external access to the new bootstrap types and functionality, such as:

- BootstrapMethod enum used in DHT events
- SeedStrap implementation for seed peer bootstrap
- Enhanced state machine types used by the base node service

This change properly exposes the new bootstrap infrastructure to consumers.

base_layer/core/src/base_node/sync/config.rs (1)

81-86: LGTM!

The accessor method follows Rust conventions and provides clean access to the configuration value.

comms/dht/src/network_discovery/initializing.rs (2)

51-51: Good error handling improvement!

The enhanced error logging provides clear visibility into connectivity issues that prevent discovery from proceeding.

102-128: Excellent bootstrap decision implementation!

The logic correctly implements the PR objective by skipping seed bootstrap when sufficient suitable peers exist. The detailed logging provides excellent visibility for debugging and monitoring the bootstrap decision process.

comms/dht/src/network_discovery/config.rs (1)

56-103: Well-designed configuration for seed bootstrap control!

The new configuration fields provide comprehensive control over the seed bootstrap process with clear documentation and sensible defaults. The use of #[serde(default)] ensures backward compatibility.

Note that max_seed_peer_sync_count (default: 5) matches the default value of num_initial_sync_rounds_seed_bootstrap in BlockchainSyncConfig, maintaining the consistency mentioned in the code comments.

base_layer/core/src/base_node/state_machine_service/states/events_and_states.rs (2)

181-186: Well-structured bootstrap phase tracking

The BootstrapPhaseInfo struct provides a clean way to track bootstrap progress with current and total rounds.

261-276: Good UX decision for bootstrap status display

Prioritizing the bootstrap phase display when active provides better visibility into the node's startup progress. The fallback to existing display logic maintains backward compatibility.

base_layer/core/src/base_node/state_machine_service/states/starting_state.rs (1)

53-95: Good approach to catch up on missed events

The initial loop to process any existing DHT events ensures the node doesn't miss bootstrap completion events that occurred before the state machine started listening.
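
The catch-up pattern amounts to draining whatever is already queued, without blocking, before entering the main loop. The sketch below shows it with `std::sync::mpsc` purely to stay dependency-free; the actual code subscribes to a DHT event stream, and `BootstrapEvent` and `drain_pending` are hypothetical names.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical, reduced event type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BootstrapEvent {
    MethodDetermined,
    PrimaryBootstrapComplete,
    Other,
}

/// Drains all events that are already queued and reports whether bootstrap
/// completion was among them. Never blocks.
fn drain_pending(rx: &Receiver<BootstrapEvent>) -> bool {
    let mut complete = false;
    loop {
        match rx.try_recv() {
            Ok(BootstrapEvent::PrimaryBootstrapComplete) => complete = true,
            // Other events are consumed here; a real handler might act on them.
            Ok(_) => {}
            // Nothing queued, or the sender is gone: stop catching up either way.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    complete
}
```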

base_layer/core/src/base_node/state_machine_service/state_machine.rs (1)

115-115: Clean refactoring from peer_manager to DHT events

The replacement of peer_manager with dht_event_stream is well-executed, maintaining the same event-driven architecture while improving the separation of concerns.

Also applies to: 129-129

comms/dht/src/network_discovery/ready.rs (2)

40-84: LGTM! Well-structured peer selection logic.

The new helper function effectively centralizes peer selection strategy, choosing between closest peers (when new peers were discovered) and random peers (otherwise). This is a good optimization for discovery efficiency.

196-210: Good optimization for successful SeedStrap completion.

The logic correctly identifies when the initial SeedStrap phase was very successful and transitions to less aggressive discovery mode. The multi-condition check ensures both quantity thresholds are met before making this optimization.
base_layer/core/src/base_node/state_machine_service/states/listening.rs (3)

168-210: Good defensive handling of missed DHT bootstrap events.

The logic to catch up on missed events before setting up the UI state is a good approach to handle timing issues. The error handling for all try_recv cases is comprehensive.

411-422: Good defensive check for bootstrap completion.

The logic to mark bootstrap complete when the final SeedStrap round reports success is a good fallback mechanism in case the PrimaryBootstrapComplete event is missed. This helps ensure robust bootstrap tracking.

254-265: Potential race condition in bootstrap phase management.

❓ Verification inconclusive

The comments indicate awareness of a potential race condition where bootstrap_phase might become None prematurely. While defensive checks are in place, the complexity of managing this state across multiple event handlers could lead to inconsistencies.

Consider consolidating the bootstrap state management logic to reduce the risk of race conditions.

Also applies to: 316-340

The following script was run to check for other places where bootstrap_phase is modified:

🏁 Script executed:
#!/bin/bash
# Description: Find all locations where bootstrap_phase is modified to ensure consistent state management

# Search for bootstrap_phase assignments
rg -A 3 "bootstrap_phase\s*=" 

# Search for set_primary_bootstrap_complete calls
rg -A 2 "set_primary_bootstrap_complete"
Length of output: 7409
Potential inconsistency in bootstrap_phase handling across multiple modules

I found that bootstrap_phase is being set or cleared in several places, increasing the risk of a race condition:

• base_layer/core/src/base_node/state_machine_service/state_machine.rs
– Clears info.bootstrap_phase = None and immediately re-sets state.
• states/starting_state.rs
– Calls shared.set_primary_bootstrap_complete(true) in multiple match arms.
• states/listening.rs
– Initializes, clears, and updates bootstrap_phase in several code paths and also calls shared.set_primary_bootstrap_complete(true).

Each location manages part of the “bootstrap complete” handshake, but none enforce atomicity between clearing the phase and flipping is_primary_bootstrap_complete. I recommend consolidating all bootstrap-phase transitions into a single helper (e.g. Shared::complete_bootstrap()) that both clears the phase and sets the complete flag together, to eliminate timing gaps.

Please review these three areas and ensure the state transitions cannot interleave incorrectly.
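
A consolidated helper of the kind suggested above could keep both pieces of state behind one lock, so they always change together. This is a sketch under assumptions: `BootstrapState`, the `Shared` layout, and the method names are hypothetical, and a synchronous `std::sync::Mutex` stands in for whatever synchronization the real state machine uses.

```rust
use std::sync::{Arc, Mutex};

/// Both pieces of bootstrap state live together so they cannot diverge.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct BootstrapState {
    /// (current round, total rounds) while bootstrap is in progress.
    phase: Option<(u32, u32)>,
    primary_complete: bool,
}

/// Hypothetical shared handle, cloned into each state handler.
#[derive(Clone, Default)]
struct Shared {
    bootstrap: Arc<Mutex<BootstrapState>>,
}

impl Shared {
    /// Records progress, but never after completion has been signalled.
    fn update_progress(&self, current: u32, total: u32) {
        let mut state = self.bootstrap.lock().unwrap();
        if !state.primary_complete {
            state.phase = Some((current, total));
        }
    }

    /// Clears the phase and sets the completion flag under a single lock.
    fn complete_bootstrap(&self) {
        let mut state = self.bootstrap.lock().unwrap();
        state.phase = None;
        state.primary_complete = true;
    }

    fn snapshot(&self) -> BootstrapState {
        self.bootstrap.lock().unwrap().clone()
    }
}
```
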
comms/dht/src/network_discovery/seed_strap.rs (1)

54-125: Well-structured seed bootstrap implementation.

The next_event method and the overall error handling approach are excellent. The code properly handles both success and failure cases, always returning a DiscoveryComplete event with appropriate statistics. The extensive logging will be valuable for debugging bootstrap issues.

comms/dht/src/network_discovery/state_machine.rs (6)

59-74: LGTM! Well-structured enum for tracking bootstrap methods.

The BootstrapMethod enum clearly represents the different bootstrap scenarios with appropriate Display implementation.

143-159: Consider the implications of partial equality comparison.

The current PartialEq implementation only compares variant types for complex variants, ignoring their data. This means BeginDiscovery(params1) == BeginDiscovery(params2) regardless of the parameters, which could lead to unexpected behavior if equality checks are used elsewhere in the codebase.

Consider either:

Deriving PartialEq, if all nested types support it, or

Documenting this behavior clearly, if it is intentional.
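The surprise described above can be reproduced with a small, self-contained sketch. The `Event` enum here is a hypothetical simplification, not the real StateEvent, and it ignores payloads for every variant by comparing with `std::mem::discriminant`. The helper `same_payload` is likewise an assumption, showing what a payload-aware check could look like where one is needed.

```rust
use std::mem::discriminant;

/// Hypothetical, simplified event type; the real StateEvent has more variants.
#[derive(Debug)]
pub enum Event {
    Ready,
    BeginDiscovery { num_peers: usize },
}

impl PartialEq for Event {
    /// Compares variants only: payload data is deliberately ignored.
    fn eq(&self, other: &Self) -> bool {
        discriminant(self) == discriminant(other)
    }
}

/// Payload-aware comparison for call sites that need exact equality.
pub fn same_payload(a: &Event, b: &Event) -> bool {
    match (a, b) {
        (Event::Ready, Event::Ready) => true,
        (Event::BeginDiscovery { num_peers: x }, Event::BeginDiscovery { num_peers: y }) => x == y,
        _ => false,
    }
}
```

Under this implementation `BeginDiscovery { num_peers: 1 } == BeginDiscovery { num_peers: 99 }` evaluates to true, which is convenient for transition matching but easy to misread in tests that intend to check the payload.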

198-229: Excellent bootstrap lifecycle management!

The async methods for tracking bootstrap method, start time, and completion provide clear lifecycle management with appropriate logging at each stage.

231-259: Great improvement to event publishing with detailed logging!

The enhanced logging with receiver counts and specific event names will significantly help with debugging event flow issues.

324-406: Well-designed state transitions with robust error handling!

The state machine properly handles:

Bootstrap skipping when sufficient peers exist

Error recovery during SeedStrap to prevent UI deadlock

Clear success/failure paths with appropriate logging

The defensive programming approach of completing bootstrap even on errors is excellent for preventing stuck states.

500-524: Well-structured discovery phase tracking!

The DiscoveryPhase enum and enhanced DhtNetworkDiscoveryRoundInfo provide clear phase identification and progress tracking, which will be valuable for monitoring the bootstrap process.

comms/dht/src/network_discovery/error.rs

base_layer/core/src/base_node/state_machine_service/states/starting_state.rs

base_layer/core/src/base_node/state_machine_service/state_machine.rs

comms/dht/src/network_discovery/seed_strap.rs

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

comms/dht/src/network_discovery/seed_strap.rs (1)

663-770: ⚠️ Potential issue

Address the unresolved security concern about excessive peer data.

The past review comment about protecting against excessive peer data from malicious seeds has not been addressed. The collect_peer_stream function still lacks a limit on the total number of peers collected from a single seed node, which could lead to memory exhaustion from malicious seeds.
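The kind of cap being asked for can be sketched independently of the RPC types. The function below is an illustration under stated assumptions, not the PR's collect_peer_stream: it takes a plain iterator of batches instead of an async stream, and `collect_bounded` is a hypothetical name. The constant value is the one proposed in the review.

```rust
/// Cap suggested in the review; the right value is a judgment call.
pub const MAX_PEERS_PER_SEED: usize = 1000;

/// Drains batches until the source ends, an error occurs, or `max` items are held.
/// Returns the collected items and whether the cap stopped collection.
pub fn collect_bounded<T, E, I>(batches: I, max: usize) -> Result<(Vec<T>, bool), E>
where
    I: IntoIterator<Item = Result<Vec<T>, E>>,
{
    let mut collected = Vec::new();
    for batch in batches {
        // Propagate the first error, as an RPC stream failure would be.
        let batch = batch?;
        // Invariant: collected.len() < max here, unless max is zero.
        let remaining = max - collected.len();
        if batch.len() >= remaining {
            // Take only what fits, so one oversized batch cannot exceed the cap.
            collected.extend(batch.into_iter().take(remaining));
            return Ok((collected, true));
        }
        collected.extend(batch);
    }
    Ok((collected, false))
}
```

Truncating inside a batch, rather than only checking the total between batches, keeps the bound strict even if a single response carries far more entries than expected.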

Please implement the previously suggested protection:

 async fn collect_peer_stream<S>(
     &self,
     seed_node_id_str: &str,
     peer_stream: &mut S,
 ) -> Result<Vec<crate::proto::rpc::PeerInfo>, NetworkDiscoveryError>
 where
     S: StreamExt<Item = Result<crate::proto::rpc::GetPeersResponse, tari_comms::protocol::rpc::RpcStatus>> + Unpin,
 {
     let mut peers_from_seed = Vec::new();
     let mut stream_items_processed_total = 0;
     let mut stream_items_with_peers = 0;
+    // Limit total peers from a single seed to prevent memory exhaustion
+    const MAX_PEERS_PER_SEED: usize = 1000;

     debug!(
         target: LOG_TARGET,
         "SeedStrap: Beginning to collect peer stream items from seed '{}'", seed_node_id_str
     );

     loop {
+        // Check if we've collected enough peers from this seed
+        if peers_from_seed.len() >= MAX_PEERS_PER_SEED {
+            warn!(
+                target: LOG_TARGET,
+                "SeedStrap: Reached maximum peers limit ({}) for seed '{}'. Stopping collection.",
+                MAX_PEERS_PER_SEED,
+                seed_node_id_str
+            );
+            break;
+        }
+
         debug!(
             target: LOG_TARGET,
             "SeedStrap: Attempting to get next peer from stream for seed '{}'. Processed {} items so far ({} with peers).",

🧹 Nitpick comments (1)

comms/dht/src/network_discovery/state_machine.rs (1)
143-159: Consider documenting the PartialEq behavior

The PartialEq implementation only compares variant types for complex data structures, which is reasonable for state machine logic but might be unexpected.

Consider adding a comment to clarify the comparison behavior:
+/// PartialEq implementation that compares only variant types for complex data,
+/// useful for state machine transition logic where exact data equality isn't needed
 impl PartialEq for StateEvent {

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cd8768 and a8d826d.

📒 Files selected for processing (13)

base_layer/core/src/base_node/state_machine_service/states/listening.rs (5 hunks)
base_layer/core/src/base_node/state_machine_service/states/mod.rs (1 hunks)
base_layer/core/tests/helpers/nodes.rs (3 hunks)
base_layer/core/tests/helpers/sync.rs (1 hunks)
base_layer/core/tests/tests/base_node_rpc.rs (1 hunks)
base_layer/core/tests/tests/mempool.rs (1 hunks)
base_layer/core/tests/tests/node_service.rs (1 hunks)
base_layer/core/tests/tests/node_state_machine.rs (3 hunks)
comms/dht/src/network_discovery/error.rs (1 hunks)
comms/dht/src/network_discovery/ready.rs (2 hunks)
comms/dht/src/network_discovery/seed_strap.rs (1 hunks)
comms/dht/src/network_discovery/state_machine.rs (15 hunks)
comms/dht/src/network_discovery/test.rs (3 hunks)

✅ Files skipped from review due to trivial changes (3)

base_layer/core/tests/tests/node_service.rs
base_layer/core/tests/tests/base_node_rpc.rs
base_layer/core/tests/tests/mempool.rs

🚧 Files skipped from review as they are similar to previous changes (1)

comms/dht/src/network_discovery/error.rs

🧰 Additional context used

🧬 Code Graph Analysis (1)

comms/dht/src/network_discovery/state_machine.rs (6)

comms/dht/src/network_discovery/ready.rs (3)

new (87-92)

next_event (94-104)

config (272-274)

comms/dht/src/network_discovery/seed_strap.rs (3)

new (55-57)

next_event (59-125)

config (773-775)

comms/dht/src/network_discovery/initializing.rs (2)

new (38-40)

next_event (42-128)

comms/dht/src/network_discovery/discovering.rs (4)

new (64-71)

next_event (114-151)

config (311-313)

params (316-325)

comms/dht/src/network_discovery/on_connect.rs (3)

new (53-58)

next_event (60-122)

config (200-202)

comms/dht/src/network_discovery/config.rs (1)

default (90-105)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: test (mainnet, stagenet)
GitHub Check: test (testnet, esmeralda)
GitHub Check: test (nextnet, nextnet)
GitHub Check: cargo check with stable
GitHub Check: Cucumber tests / Base Layer
GitHub Check: Cucumber tests / FFI
GitHub Check: ci

🔇 Additional comments (33)

base_layer/core/tests/tests/node_state_machine.rs (3)

109-109: LGTM: Consistent migration to DHT event streams.

The change from peer manager event stream to DHT event subscription aligns perfectly with the PR's bootstrap process refactoring. This ensures tests properly validate the new DHT-based event handling model.

252-252: LGTM: Consistent test pattern maintained.

The migration to dht.subscribe_dht_events() maintains consistency with the other test functions and supports the new bootstrap event handling architecture.

302-302: LGTM: Test infrastructure properly updated.

The change ensures that the test_event_channel test uses the same DHT event stream pattern as the other tests, maintaining consistency across the test suite.

base_layer/core/src/base_node/state_machine_service/states/mod.rs (1)

59-59: LGTM: Appropriate visibility change for module restructuring.

Making the events_and_states module public is necessary to support the bootstrap process refactoring and enables external access to the enhanced ListeningInfo struct and related types that track bootstrap progress.

base_layer/core/tests/helpers/nodes.rs (2)

89-89: LGTM: DHT handle integration supports test infrastructure.

Adding the dht field to NodeInterfaces is essential for providing test access to the DHT event stream, which is now required for the new bootstrap process and BaseNodeStateMachine initialization.

408-422: LGTM: Proper DHT handle extraction and assignment.

The extraction of the DHT handle from service handles and assignment to the NodeInterfaces struct follows the established pattern for other service handles and enables test access to DHT functionality.

base_layer/core/tests/helpers/sync.rs (1)

188-188: LGTM: Consistent migration to DHT event streams in test helpers.

The change from peer manager to DHT event subscription maintains consistency with the bootstrap process refactoring seen across other test files and properly supports the new BaseNodeStateMachine event handling model.

comms/dht/src/network_discovery/test.rs (4)

173-173: LGTM: Proper import for new async synchronization primitives.

Good addition of the RwLock import to support the new bootstrap tracking fields.

176-182: LGTM: Well-organized imports for bootstrap functionality.

The imports are properly organized and support the new bootstrap method tracking introduced in the state machine.

209-210: LGTM: Appropriate test context updates for new bootstrap fields.

The addition of bootstrap_method and bootstrap_started_at fields properly mirrors the production context structure, enabling accurate testing of bootstrap-related functionality.

259-259: LGTM: Clean pattern for test field initialization.

Using ..Default::default() is a good practice for test setup, making the code more maintainable when new fields are added.

comms/dht/src/network_discovery/ready.rs (3)

40-84: LGTM: Excellent refactoring with improved separation of concerns.

The new select_peers_for_discovery_round helper function encapsulates peer selection logic cleanly and makes the code more testable and maintainable. The logic correctly differentiates between selecting closest peers when the last round found new peers versus random peers otherwise.

97-114: LGTM: Clean parameter addition and improved error handling.

The addition of the current_num_rounds parameter makes the method's dependencies explicit and improves testability. The error handling structure is appropriate.

116-269: LGTM: Well-structured scenario-based discovery logic.

The refactoring into three clear scenarios greatly improves code readability and maintainability:

Not enough peers overall - must discover

First active discovery round - forced discovery

Subsequent rounds - process last round results

The special handling for successful SeedStrap rounds (lines 197-211) is a thoughtful addition that optimizes the discovery process by transitioning to OnConnectMode early when sufficient peers are found during seed bootstrap.

comms/dht/src/network_discovery/seed_strap.rs (3)

59-125: LGTM: Well-structured main entry point with comprehensive error handling.

The next_event method provides a clean interface and handles both success and error cases appropriately. The detailed logging and round info tracking will be valuable for debugging and monitoring.

127-573: LGTM: Comprehensive seed discovery implementation with robust error handling.

The discover_peers_via_seeds method is well-implemented with:

Proper seed peer selection and randomization

Comprehensive error handling and logging

Malicious seed detection and banning

Early exit conditions based on configuration

Detailed progress tracking and statistics

The peer validation logic (lines 357-420) correctly handles different types of validation errors and appropriately bans seeds that provide invalid data.

575-661: LGTM: Proper RPC connection handling with appropriate timeouts.

The fetch_peers_from_connection method correctly establishes RPC connections and makes appropriate requests with reasonable peer count limits.

base_layer/core/src/base_node/state_machine_service/states/listening.rs (3)

33-33: LGTM: Appropriate imports for DHT bootstrap integration.

The new imports properly support the DHT event handling and discovery phase tracking functionality.

168-231: LGTM: Robust handling of missed DHT events during startup.

The logic to check for missed DHT bootstrap events addresses potential timing issues where events might be published before the listening state starts. The handling of BootstrapMethodDetermined and PrimaryBootstrapComplete events ensures consistent state regardless of event timing.

233-452: LGTM: Well-implemented concurrent event handling with proper bootstrap integration.

The refactored event loop using tokio::select! properly handles both chain metadata and DHT events concurrently. Key strengths:

Appropriate bootstrap completion checks before sync decisions

Correct UI state updates based on bootstrap phase progress

Proper handling of event stream errors and lagging

Clean separation between bootstrap and normal sync logic

The bootstrap phase tracking (lines 400-424) correctly updates the UI state based on SeedStrap progress and marks completion when appropriate.

comms/dht/src/network_discovery/state_machine.rs (13)

59-74: LGTM: Well-designed bootstrap method enum

The BootstrapMethod enum clearly represents the three bootstrap approaches with descriptive names and a proper Display implementation for logging purposes.

79-79: LGTM: Consistent SeedStrap state integration

The SeedStrap state is properly integrated into the State enum with correct display formatting and helper method is_seed_strap() for state checking.

Also applies to: 92-92, 107-109

115-115: LGTM: Clear event naming for bootstrap decision

The InitialPeersSufficient event clearly indicates when bootstrap can be skipped due to existing peers, with descriptive display text.

Also applies to: 131-131

177-179: LGTM: Appropriate context fields for bootstrap tracking

The new fields bootstrap_method and bootstrap_started_at provide necessary state tracking for the bootstrap process with proper Arc<RwLock<>> wrapping for concurrent access.

197-229: LGTM: Well-structured bootstrap management methods

The bootstrap management methods properly handle state updates and event publishing:

set_bootstrap_method updates state and publishes notification

mark_bootstrap_started records timing

complete_bootstrap calculates duration and publishes completion

The async/await usage and logging are appropriate.
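As a rough illustration of the lifecycle described above, the sketch below tracks a method and a start time and reports the elapsed duration on completion. It makes several assumptions: it uses `std::sync::Mutex` where the review mentions Arc<RwLock<>>, it omits event publishing and logging, and the two `BootstrapMethod` variants are hypothetical placeholders (the review mentions three approaches without naming them here).

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical variants, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BootstrapMethod {
    SeedStrap,
    ExistingPeers,
}

/// Hypothetical tracker mirroring the three methods named in the review.
#[derive(Debug, Default)]
pub struct BootstrapTracker {
    method: Mutex<Option<BootstrapMethod>>,
    started_at: Mutex<Option<Instant>>,
}

impl BootstrapTracker {
    /// Records which bootstrap path was chosen.
    pub fn set_bootstrap_method(&self, method: BootstrapMethod) {
        *self.method.lock().unwrap() = Some(method);
    }

    /// Records the moment bootstrap began.
    pub fn mark_bootstrap_started(&self) {
        *self.started_at.lock().unwrap() = Some(Instant::now());
    }

    /// Takes the start time, so a second call reports None instead of a stale duration.
    pub fn complete_bootstrap(&self) -> Option<Duration> {
        self.started_at.lock().unwrap().take().map(|start| start.elapsed())
    }

    /// Returns the recorded method, if any.
    pub fn method(&self) -> Option<BootstrapMethod> {
        *self.method.lock().unwrap()
    }
}
```

Returning `Option<Duration>` makes the "completed without ever starting" case explicit, which matters when bootstrap is skipped because enough peers already exist.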

231-259: LGTM: Comprehensive event publishing with detailed logging

The enhanced publish_event method provides excellent debugging capabilities by logging the specific event type, receiver count, and success/failure status. This will be valuable for troubleshooting bootstrap issues.

298-299: LGTM: Proper initialization of bootstrap tracking fields

The new context fields are correctly initialized with appropriate default values (None bootstrap method, no start time).

329-339: LGTM: Proper SeedStrap state transition with bootstrap tracking

The transition from Initializing to SeedStrap correctly:

Marks bootstrap as started

Sets bootstrap method

Creates the SeedStrap state

The alternative transition for InitialPeersSufficient appropriately skips to Ready state while completing bootstrap tracking.

340-359: LGTM: Robust SeedStrap completion handling

The SeedStrap completion logic properly:

Publishes peer discovery events when new peers are found

Handles both success and failure cases

Marks bootstrap complete on success

Transitions to Ready state appropriately

The failure handling with waiting period is a good defensive approach.

379-395: Excellent error handling to prevent UI deadlock

The error handling logic that marks bootstrap complete even during SeedStrap failures is a crucial defensive measure to prevent the UI from getting stuck waiting for bootstrap completion.

This demonstrates good understanding of system-wide interactions and failure modes.

425-451: Robust bootstrap timeout implementation

The bootstrap timeout logic is well-implemented:

Uses tokio::select! to race between normal operation and timeout

Clones context to avoid borrow checker issues

Forces completion on timeout to prevent indefinite waiting

Uses configured timeout duration

The approach of forcing bootstrap completion on timeout ensures the system doesn't hang indefinitely while waiting for seed nodes.
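The same "race the work against a deadline, then force completion" idea can be shown without an async runtime. The sketch below is a synchronous analogue that uses a worker thread and `std::sync::mpsc::Receiver::recv_timeout` in place of tokio::select!. `bootstrap_with_timeout` and `BootstrapOutcome` are hypothetical names and do not appear in the PR.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical result of a bounded bootstrap attempt.
#[derive(Debug, PartialEq)]
pub enum BootstrapOutcome {
    /// Discovery finished in time and found this many peers.
    Completed(usize),
    /// The deadline passed; the caller should mark bootstrap complete anyway.
    TimedOut,
}

/// Runs `discover` on a worker thread and waits at most `timeout` for its result.
pub fn bootstrap_with_timeout<F>(discover: F, timeout: Duration) -> BootstrapOutcome
where
    F: FnOnce() -> usize + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; the send error is ignored.
        let _ = tx.send(discover());
    });
    match rx.recv_timeout(timeout) {
        Ok(peers) => BootstrapOutcome::Completed(peers),
        // Either the deadline passed or the worker dropped the sender
        // (for example by panicking). Both are treated as forced completion.
        Err(_) => BootstrapOutcome::TimedOut,
    }
}
```

Unlike a dropped future in tokio::select!, the worker thread here keeps running after the timeout, so this shape only suits work that is safe to abandon.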

453-460: LGTM: Correct bootstrap completion detection

The bootstrap completion detection correctly identifies when bootstrap finishes:

DiscoveryComplete from SeedStrap state

InitialPeersSufficient for skipped bootstrap

This ensures the timeout logic is disabled once bootstrap is done.
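The two conditions listed above reduce to a small predicate over the current state and the incoming event. The enums below are hypothetical, trimmed-down versions of the real State and StateEvent types (which carry data), and `bootstrap_finished` is an assumed name.

```rust
/// Hypothetical, data-free version of the discovery states.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Initializing,
    SeedStrap,
    Ready,
    Discovering,
}

/// Hypothetical, data-free version of the state events.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StateEvent {
    InitialPeersSufficient,
    DiscoveryComplete,
    Idle,
}

/// True when the (state, event) pair means bootstrap is over:
/// either SeedStrap finished a discovery, or bootstrap was skipped.
pub fn bootstrap_finished(state: State, event: StateEvent) -> bool {
    matches!(
        (state, event),
        (State::SeedStrap, StateEvent::DiscoveryComplete) | (_, StateEvent::InitialPeersSufficient)
    )
}
```

A DiscoveryComplete event from any state other than SeedStrap does not count, which is what keeps ordinary discovery rounds from being mistaken for bootstrap completion.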

498-516: LGTM: Clear discovery phase tracking

The DiscoveryPhase enum and enhanced DhtNetworkDiscoveryRoundInfo provide good visibility into the current discovery phase with appropriate fields for round tracking.

The Default implementation defaulting to General phase is sensible for ongoing discovery after bootstrap.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

base_layer/core/tests/tests/block_validation.rs (1)
98-108: Consider consolidating network variable declarations.

The network configuration logic is sound and ensures consistent test environment. However, there's a redundant variable declaration that could be improved.
-let network = Network::Esmeralda;
-if std::env::var("TARI_NETWORK").is_err() {
-    std::env::set_var("TARI_NETWORK", network.as_key_str());
-}
-if Network::get_current_or_user_setting_or_default() != network {
-    let _ = Network::set_current(network);
-}
-let current_network = Network::get_current_or_user_setting_or_default();
-if current_network != network {
-    panic!("could not set network");
-}
+const EXPECTED_NETWORK: Network = Network::Esmeralda;
+if std::env::var("TARI_NETWORK").is_err() {
+    std::env::set_var("TARI_NETWORK", EXPECTED_NETWORK.as_key_str());
+}
+if Network::get_current_or_user_setting_or_default() != EXPECTED_NETWORK {
+    let _ = Network::set_current(EXPECTED_NETWORK);
+}
+let current_network = Network::get_current_or_user_setting_or_default();
+if current_network != EXPECTED_NETWORK {
+    panic!("Failed to set network to {:?}, current network is {:?}", EXPECTED_NETWORK, current_network);
+}
And replace the duplicate declaration at line 114:
-let network = Network::Esmeralda;
+let network = EXPECTED_NETWORK;
This eliminates the duplicate variable declaration and provides a more descriptive panic message.
base_layer/core/src/proof_of_work/monero_rx/helpers.rs (1)

143-143: Appropriate log level reduction for expected error condition.

This change correctly reduces log noise by changing from warn! to trace! for Monero extra field deserialization failures. As noted in the code comments, these failures are expected behavior per Monero consensus rules and don't represent actual errors.

However, consider updating the similar logging in parse_extra_field_truncate_on_error (line 118) for consistency, as it handles the same type of error but still uses warn! level.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8d826d and 9b97555.

📒 Files selected for processing (7)

base_layer/core/src/base_node/state_machine_service/states/listening.rs (4 hunks)
base_layer/core/src/mempool/mempool_storage.rs (1 hunks)
base_layer/core/src/proof_of_work/monero_rx/helpers.rs (1 hunks)
base_layer/core/tests/tests/block_validation.rs (2 hunks)
base_layer/core/tests/tests/node_state_machine.rs (7 hunks)
comms/dht/src/network_discovery/test.rs (5 hunks)
comms/dht/src/rpc/mock.rs (0 hunks)

💤 Files with no reviewable changes (1)

comms/dht/src/rpc/mock.rs

🚧 Files skipped from review as they are similar to previous changes (2)

base_layer/core/tests/tests/node_state_machine.rs
base_layer/core/src/base_node/state_machine_service/states/listening.rs

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: test (mainnet, stagenet)
GitHub Check: Cucumber tests / Base Layer
GitHub Check: test (testnet, esmeralda)
GitHub Check: test (nextnet, nextnet)
GitHub Check: Cucumber tests / FFI
GitHub Check: cargo check with stable
GitHub Check: ci

🔇 Additional comments (7)

base_layer/core/src/mempool/mempool_storage.rs (1)

130-130: LGTM - Aligns with bootstrap improvements but consider debug logging alternative.

This change effectively reduces log noise during bootstrap/sync operations where maturity errors are expected for time-locked transactions. The selective suppression makes sense in the context of the improved bootstrap process.

However, completely removing logging for maturity errors might hinder debugging in unexpected scenarios. Consider adding a debug-level log statement to retain diagnostic capability while reducing noise:
-            Err(ValidationError::MaturityError) => Ok(TxStorageResponse::NotStoredTimeLocked),
+            Err(ValidationError::MaturityError) => {
+                debug!(target: LOG_TARGET, "Transaction not stored due to maturity error (time-locked)");
+                Ok(TxStorageResponse::NotStoredTimeLocked)
+            },
Note: The AI summary mentions that both UnknownInputs and MaturityError cases had warning logs removed, but only the MaturityError case shows changes in the annotated code.

Likely an incorrect or invalid review comment.
base_layer/core/tests/tests/block_validation.rs (2)

28-28: LGTM - Appropriate dependency for test serialization.

The serial_test::serial import is correctly added to support the #[serial] attribute for preventing test interference.

96-96: LGTM - Serial execution prevents environment variable conflicts.

The #[serial] attribute ensures this test runs sequentially, preventing race conditions when modifying global environment variables and network configuration.
comms/dht/src/network_discovery/test.rs (4)

109-119: LGTM! Import additions support the new bootstrap functionality.

The new imports are properly organized and directly support the bootstrap tracking features being added to the test context.

145-146: LGTM! Bootstrap tracking fields properly initialized.

The new fields bootstrap_method and bootstrap_started_at are correctly initialized with appropriate async-safe wrappers (Arc<RwLock<>>) for shared state tracking in the network discovery context.

195-195: Good improvement using struct update syntax.

Using ..Default::default() makes the code more maintainable and ensures all fields are properly initialized.

203-203:

✅ Verification successful

Verify the state transition change aligns with the new bootstrap behavior.

The test function was renamed from it_transitions_to_on_connect to it_transitions_to_idle and now expects StateEvent::Idle instead of StateEvent::OnConnectMode. This suggests a significant change in the state machine behavior.

Please verify that this state transition change is intentional and correct by running the following script to check for related state machine modifications:

Also applies to: 218-218

🏁 Script executed:
#!/bin/bash
# Description: Verify state machine changes related to OnConnectMode and Idle states
# Expected: Find evidence of state machine updates that justify this test change

echo "=== Searching for OnConnectMode references ==="
rg -A 3 -B 3 "OnConnectMode"

echo "=== Searching for Idle state handling ==="
rg -A 3 -B 3 "StateEvent::Idle"

echo "=== Checking state machine files for bootstrap-related changes ==="
fd "state_machine" --type f | head -5 | xargs rg -l "bootstrap|Bootstrap"
Length of output: 11138
Verified: Idle transition aligns with updated bootstrap logic
The it_transitions_to_idle test correctly reflects the changes in comms/dht/src/network_discovery/ready.rs, which now returns StateEvent::Idle in the tested scenarios. The OnConnectMode branch still exists for its original conditions, so renaming the test and expecting StateEvent::Idle is intentional and accurate.

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (1)

base_layer/core/src/base_node/state_machine_service/states/listening.rs (1)

141-452: 🛠️ Refactor suggestion

Consider breaking down the next_event method for better maintainability.

This method has grown to over 300 lines, making it difficult to maintain and test. Consider extracting logical sections into separate methods.

Suggested refactoring approach:

impl Listening {
    pub async fn next_event<B: BlockchainBackend + 'static>(
        &mut self,
        shared: &mut BaseNodeStateMachine<B>,
        network_silence: bool,
    ) -> StateEvent {
        if network_silence {
            return self.handle_network_silence(shared);
        }

        self.setup_initial_bootstrap_state(shared).await;
        self.run_event_loop(shared).await
    }

    async fn setup_initial_bootstrap_state<B: BlockchainBackend + 'static>(
        &mut self,
        shared: &mut BaseNodeStateMachine<B>,
    ) {
        // Lines 156-231 logic here
    }

    async fn run_event_loop<B: BlockchainBackend + 'static>(
        &mut self,
        shared: &mut BaseNodeStateMachine<B>,
    ) -> StateEvent {
        // Lines 232-451 logic here
    }

    async fn handle_chain_metadata_event<B: BlockchainBackend + 'static>(
        &mut self,
        shared: &mut BaseNodeStateMachine<B>,
        metadata_event: &ChainMetadataEvent,
        // other parameters
    ) -> Option<StateEvent> {
        // Chain metadata handling logic
    }

    async fn handle_dht_event<B: BlockchainBackend + 'static>(
        &mut self,
        shared: &mut BaseNodeStateMachine<B>,
        dht_event: &DhtEvent,
    ) {
        // DHT event handling logic
    }
}

🧹 Nitpick comments (1)

base_layer/core/src/base_node/state_machine_service/states/listening.rs (1)
168-210: Consider optimizing the missed DHT events processing loop.

While the logic for handling missed DHT events is sound, the current implementation could be inefficient if many events need to be processed. Consider adding a limit to prevent excessive processing during startup.
 // Try to receive any recent DHT events that might have been published before we started listening
 let mut events_processed = 0;
+const MAX_MISSED_EVENTS_TO_PROCESS: usize = 100;
 loop {
+    if events_processed >= MAX_MISSED_EVENTS_TO_PROCESS {
+        warn!(target: LOG_TARGET, "[BN SM LISTENING] Reached maximum missed events limit ({}), stopping processing", MAX_MISSED_EVENTS_TO_PROCESS);
+        break;
+    }
     match dht_events_check.try_recv() {
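A capped, non-blocking drain of this kind can be sketched on its own. The version below uses `std::sync::mpsc` as a stand-in for the channel type behind dht_events_check, which is an assumption, and `drain_missed_events` is a hypothetical helper name. The constant matches the value proposed in the diff.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Cap proposed in the review; the value is illustrative.
pub const MAX_MISSED_EVENTS_TO_PROCESS: usize = 100;

/// Handles queued events without blocking, stopping after `max` of them.
/// Returns (events handled, whether the cap ended the drain).
pub fn drain_missed_events<T>(
    rx: &Receiver<T>,
    max: usize,
    mut handle: impl FnMut(T),
) -> (usize, bool) {
    let mut processed = 0;
    while processed < max {
        match rx.try_recv() {
            Ok(event) => {
                handle(event);
                processed += 1;
            },
            // Nothing queued, or every sender is gone: the drain is finished.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => {
                return (processed, false);
            },
        }
    }
    // The cap was reached; more events may still be waiting in the queue.
    (processed, true)
}
```

Events left behind by the cap are not lost in this sketch. They stay queued for the main event loop, so the cap only bounds how long startup spends catching up.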

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b97555 and 7b5c0d5.

📒 Files selected for processing (3)

base_layer/core/src/base_node/state_machine_service/states/listening.rs (5 hunks)
base_layer/core/tests/tests/node_state_machine.rs (7 hunks)
comms/dht/src/rpc/mod.rs (0 hunks)

💤 Files with no reviewable changes (1)

comms/dht/src/rpc/mod.rs

🚧 Files skipped from review as they are similar to previous changes (1)

base_layer/core/tests/tests/node_state_machine.rs

⏰ Context from checks skipped due to timeout of 90000ms (5)

GitHub Check: test (testnet, esmeralda)
GitHub Check: test (nextnet, nextnet)
GitHub Check: test (mainnet, stagenet)
GitHub Check: ci
GitHub Check: cargo check with stable

🔇 Additional comments (4)

base_layer/core/src/base_node/state_machine_service/states/listening.rs (4)

33-33: LGTM: Import additions are appropriate.

The new imports for DhtEvent and DiscoveryPhase align with the DHT bootstrap functionality being added.

401-424: Good fallback mechanism for bootstrap completion.

The logic to mark bootstrap as complete when SeedStrap reports completion (lines 414-421) provides a robust fallback in case the PrimaryBootstrapComplete event is missed. This is a good defensive programming practice.

317-320: Address the TODO comment about premature bootstrap_phase clearing.

The comment indicates a potential issue with clearing the bootstrap_phase too early. The current conditional check if shared.is_primary_bootstrap_complete addresses this concern well.

254-265:

✅ Verification successful

Verify bootstrap completion logic consistency.

The logic that skips sync decisions while bootstrapping looks correct, but ensure this doesn't create edge cases where the node gets stuck in a bootstrap state.

🏁 Script executed:
#!/bin/bash
# Description: Check for potential bootstrap state management issues in related files

# Search for other places where is_primary_bootstrap_complete is used
rg -A 5 -B 5 "is_primary_bootstrap_complete"

# Check for bootstrap completion logic in DHT components
rg -A 5 -B 5 "PrimaryBootstrapComplete"

# Look for bootstrap phase state transitions
rg -A 5 -B 5 "bootstrap_phase"
Length of output: 48456
Bootstrap state management logic verified.
I reviewed all references to is_primary_bootstrap_complete, bootstrap_phase, and the handling of DhtEvent::PrimaryBootstrapComplete (and related peer‐sync events). The code unambiguously sets the flag on receipt of the primary bootstrap event (and on final SeedStrap rounds) and immediately clears the UI bootstrap phase. There are no pathways where the node could remain indefinitely stuck in bootstrapping.

base_layer/core/src/base_node/state_machine_service/states/listening.rs

fluffypony added 30 commits May 17, 2025 09:01

new bootstrap process: initial commit

220457d

fix coderabbit nitpicks

40895ae

more logging, and some tweaks to ensure we bootstrap properly

ddce4aa

forgot to increment num_rounds

642f460

even more logging

d442470

allow for an early exit

301481b

NEVER sync from seed nodes, not even headers

5fa295f

make sure we increment num_rounds everywhere in the state machine

5bee3f5

better handling of post-discovery states

6880dc4

bubble display up to console

14ad4d0

bubble data up to console + fix early exit conditions

78b8d72

fix coderabbit nitpicks

05dbcd2

don't launch seedstrap unecessarily

02fb693

fix console display when no longer bootstrapping

ff04e35

clean up, refactor, and fix DHT and base node state machine sync

be7215d

better logging at restart

fa87450

minor logging tweaks

83e5a8c

fix typo

ee94f45

fix base node state machine sync

319126c

state machine ordering

cb854d4

more state machine nigglies

d336a73

new bootstrap process: initial commit

94ad4cc

fix coderabbit nitpicks

1477b9c

more logging, and some tweaks to ensure we bootstrap properly

0ab6ece

forgot to increment num_rounds

c00f4a6

even more logging

fe2b228

allow for an early exit

a8d6274

NEVER sync from seed nodes, not even headers

d44743f

make sure we increment num_rounds everywhere in the state machine

466f8e9

better handling of post-discovery states

8e10845

fluffypony and others added 6 commits May 24, 2025 14:30

fix base node state machine sync

b1709b8

state machine ordering

696b0fa

more state machine nigglies

651cd98

Merge branch 'bootstrapper-dev' of https://github.com/fluffypony/tari into bootstrapper-dev

a91556f

Merge branch 'development' into bootstrapper-dev

58494ff

fix clippy and merge conflicts

7cd8768

SWvheerden requested a review from a team as a code owner May 28, 2025 13:13

SWvheerden mentioned this pull request May 28, 2025

feat: new bootstrap process #7063

Closed

1 task

coderabbitai bot reviewed May 28, 2025


fix tests

a8d826d

coderabbitai bot reviewed May 28, 2025


SWvheerden added 2 commits May 29, 2025 13:13

Merge branch 'development' into bootstrapper-dev

df24c02

fix tests

9b97555

coderabbitai bot reviewed May 29, 2025


SWvheerden added 2 commits May 29, 2025 16:08

clippy

4b1dc09

more clippy

7b5c0d5

coderabbitai bot reviewed May 29, 2025


base_layer/core/src/base_node/state_machine_service/states/listening.rs (resolved)

SWvheerden merged commit e5a0854 into tari-project:development May 29, 2025
11 of 14 checks passed

SWvheerden deleted the bootstrapper-dev branch May 30, 2025 06:49

coderabbitai bot mentioned this pull request Jun 3, 2025

fix: the statemachine #7169

Merged

SWvheerden mentioned this pull request Jun 4, 2025

Bootstrap process should not be single threaded #7181

Closed

coderabbitai bot mentioned this pull request Jul 8, 2025

feat: add concurrency when contacting seed peers while performing seed strap #7294

Merged

4 tasks

This was referenced Jul 21, 2025

feat: add timeouts to all seedstrap comms calls #7331

Merged

feat: prevent sync peers sending local addresses #7359

Merged

This was referenced Aug 11, 2025

feat: add seed peer exclusion to the dht #7397

Closed

feat: add seed peer exclusion for header- and block sync #7394

Closed

coderabbitai bot mentioned this pull request Sep 10, 2025

fix: ambiguous seedstrap warning log #7483

Merged

4 tasks

Conversation

SWvheerden commented May 28, 2025 • edited by coderabbitai bot

coderabbitai bot commented May 28, 2025 • edited

github-actions bot commented May 28, 2025 • edited

Test Results (Integration tests)

github-actions bot commented May 28, 2025 • edited

Test Results (CI)
