Skip to content

Commit e5a0854

Browse files
feat: new bootstrap process (#7121)
Description --- Implement a better bootstrap process so we don't overload seed nodes Motivation and Context --- Implement the following bootstrap process: 1. Get the list of seed peers. 2. Connect to the seed peers (all of them) one at a time, and get THEIR list of peers, and then immediately disconnect. 3. Once the peer DHT is populated with a bunch of non-seed peers then we can actually connect to the network - but we also make sure the seed peers DO NOT form part of the DHT. Basically the seed nodes should never be connected to for very long, because otherwise their connection pool dries up. How Has This Been Tested? --- "It works on my machine" What process can a PR reviewer use to test or verify this change? --- Clear the whole ~/.tari/mainnet folder, and start the node up from scratch. Look for the following debug lines in network.log: `[comms::dht::network_discovery::seed_strap] [Thread:55429085] DEBUG Attempting to discover peers via seed nodes. // comms/dht/src/network_discovery/seed_strap.rs:61` And then: `[comms::dht::network_discovery::seed_strap] [Thread:55429095] INFO Added 992 peers via seed nodes. Transitioning to Ready state. // comms/dht/src/network_discovery/seed_strap.rs:71` Breaking Changes --- - [x] None <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a new seed bootstrap phase for peer discovery, improving network initialization and robustness. - Enhanced UI to display detailed bootstrap progress, including current and total bootstrap rounds. - Added configuration options for controlling bootstrap and seed peer synchronization behavior. - Added new events signaling bootstrap method determination and primary bootstrap completion. - **Improvements** - More accurate and explicit tracking of bootstrap completion and peer synchronization status. - Improved event handling and state transitions during node startup and synchronization, integrating DHT events. - Enhanced logging and diagnostics for bootstrap and peer discovery processes. - Refined sync state transitions based on peer metadata and bootstrap completion. - Updated peer flags to explicitly mark seed peers. - Replaced peer manager event streams with DHT event subscriptions for state machine event handling. - Expanded public API to expose additional network discovery types and modules. - Improved test stability and coverage by updating imports and event sources. - Added a bootstrap timeout mechanism to prevent UI deadlocks during prolonged bootstrap. - Enhanced network discovery state machine with explicit bootstrap method tracking and event publishing. - Improved peer discovery logic with early exit conditions and detailed peer categorization. - Refactored discovery ready state to better handle discovery rounds and transitions. - **Bug Fixes** - Resolved issues where missed bootstrap events could cause UI state inconsistencies or deadlocks. - Suppressed unnecessary warnings in mempool validation and Monero extra field deserialization. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Riccardo Spagni <ric@spagni.net>
1 parent 1e7ac28 commit e5a0854

File tree

31 files changed

+1867
-456
lines changed

31 files changed

+1867
-456
lines changed

base_layer/core/src/base_node/state_machine_service/initializer.rs

Lines changed: 4 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -20,10 +20,9 @@
2020
// WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
2121
// USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2222

23-
use std::sync::Arc;
24-
2523
use log::*;
26-
use tari_comms::{connectivity::ConnectivityRequester, PeerManager};
24+
use tari_comms::connectivity::ConnectivityRequester;
25+
use tari_comms_dht::Dht;
2726
use tari_service_framework::{async_trait, ServiceInitializationError, ServiceInitializer, ServiceInitializerContext};
2827
use tokio::sync::{broadcast, watch};
2928

@@ -104,17 +103,17 @@ where B: BlockchainBackend + 'static
104103
let chain_metadata_service = handles.expect_handle::<ChainMetadataHandle>();
105104
let node_local_interface = handles.expect_handle::<LocalNodeCommsInterface>();
106105
let connectivity = handles.expect_handle::<ConnectivityRequester>();
107-
let peer_manager = handles.expect_handle::<Arc<PeerManager>>();
108106

109107
let sync_validators =
110108
SyncValidators::full_consensus(rules.clone(), factories, bypass_range_proof_verification);
111109

110+
let dht = handles.expect_handle::<Dht>(); // Get Dht handle
112111
let node = BaseNodeStateMachine::new(
113112
db,
114113
node_local_interface,
115114
connectivity,
116-
peer_manager,
117115
chain_metadata_service.get_event_stream(),
116+
dht.subscribe_dht_events(), // Pass DhtEventReceiver
118117
config,
119118
sync_validators,
120119
status_event_sender,
@@ -123,7 +122,6 @@ where B: BlockchainBackend + 'static
123122
rules,
124123
handles.get_shutdown_signal(),
125124
);
126-
127125
node.run().await;
128126
info!(target: LOG_TARGET, "Base Node State Machine Service has shut down");
129127
});

base_layer/core/src/base_node/state_machine_service/state_machine.rs

Lines changed: 41 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,8 @@ use log::*;
2626
use randomx_rs::RandomXFlag;
2727
use serde::{Deserialize, Serialize};
2828
use tari_common::configuration::serializers;
29-
use tari_comms::{connectivity::ConnectivityRequester, PeerManager};
29+
use tari_comms::connectivity::ConnectivityRequester;
30+
use tari_comms_dht::event::DhtEventReceiver;
3031
use tari_shutdown::ShutdownSignal;
3132
use tokio::sync::{broadcast, watch};
3233

@@ -44,7 +45,6 @@ use crate::{
4445
consensus::ConsensusManager,
4546
proof_of_work::randomx_factory::RandomXFactory,
4647
};
47-
4848
const LOG_TARGET: &str = "c::bn::base_node";
4949

5050
/// Configuration for the BaseNodeStateMachine.
@@ -90,27 +90,29 @@ pub struct BaseNodeStateMachine<B: BlockchainBackend> {
9090
pub(super) db: AsyncBlockchainDb<B>,
9191
pub(super) local_node_interface: LocalNodeCommsInterface,
9292
pub(super) connectivity: ConnectivityRequester,
93-
pub(super) peer_manager: Arc<PeerManager>,
9493
pub(super) metadata_event_stream: broadcast::Receiver<Arc<ChainMetadataEvent>>,
94+
pub(super) dht_event_stream: DhtEventReceiver,
9595
pub(super) config: BaseNodeStateMachineConfig,
9696
pub(super) info: StateInfo,
9797
pub(super) sync_validators: SyncValidators<B>,
9898
pub(super) consensus_rules: ConsensusManager,
9999
pub(super) status_event_sender: Arc<watch::Sender<StatusInfo>>,
100100
pub(super) randomx_factory: RandomXFactory,
101101
is_bootstrapped: bool,
102+
pub(super) is_primary_bootstrap_complete: bool,
102103
event_publisher: broadcast::Sender<Arc<StateEvent>>,
103104
interrupt_signal: ShutdownSignal,
104105
}
105106

106107
impl<B: BlockchainBackend + 'static> BaseNodeStateMachine<B> {
107-
/// Instantiate a new Base Node.
108+
// Instantiate a new Base Node.
109+
#[allow(clippy::too_many_arguments)]
108110
pub fn new(
109111
db: AsyncBlockchainDb<B>,
110112
local_node_interface: LocalNodeCommsInterface,
111113
connectivity: ConnectivityRequester,
112-
peer_manager: Arc<PeerManager>,
113114
metadata_event_stream: broadcast::Receiver<Arc<ChainMetadataEvent>>,
115+
dht_event_stream: DhtEventReceiver, // New parameter
114116
config: BaseNodeStateMachineConfig,
115117
sync_validators: SyncValidators<B>,
116118
status_event_sender: watch::Sender<StatusInfo>,
@@ -123,15 +125,16 @@ impl<B: BlockchainBackend + 'static> BaseNodeStateMachine<B> {
123125
db,
124126
local_node_interface,
125127
connectivity,
126-
peer_manager,
127128
metadata_event_stream,
129+
dht_event_stream, // Initialize new field
128130
config,
129131
info: StateInfo::StartUp,
130132
event_publisher,
131133
status_event_sender: Arc::new(status_event_sender),
132134
sync_validators,
133135
randomx_factory,
134136
is_bootstrapped: false,
137+
is_primary_bootstrap_complete: false,
135138
consensus_rules,
136139
interrupt_signal,
137140
}
@@ -283,6 +286,38 @@ impl<B: BlockchainBackend + 'static> BaseNodeStateMachine<B> {
283286
}
284287
}
285288

289+
pub fn set_primary_bootstrap_complete(&mut self, complete: bool) {
290+
info!(
291+
target: LOG_TARGET,
292+
"[BN SM UPDATE] Setting primary_bootstrap_complete to {}. Was: {}. Current state: {}",
293+
complete,
294+
self.is_primary_bootstrap_complete,
295+
self.info.short_desc()
296+
);
297+
298+
self.is_primary_bootstrap_complete = complete;
299+
300+
if let StateInfo::Listening(mut info) = self.info.clone() {
301+
let had_bootstrap_phase = info.bootstrap_phase.is_some();
302+
if complete {
303+
info.bootstrap_phase = None;
304+
}
305+
self.set_state_info(StateInfo::Listening(info));
306+
307+
info!(
308+
target: LOG_TARGET,
309+
"[BN SM UPDATE] Updated Listening state. Removed bootstrap_phase: {}. Console should now show 'Listening'",
310+
had_bootstrap_phase
311+
);
312+
} else {
313+
warn!(
314+
target: LOG_TARGET,
315+
"[BN SM UPDATE] Not in Listening state ({}), bootstrap_complete flag updated but UI unchanged",
316+
self.info.short_desc()
317+
);
318+
}
319+
}
320+
286321
/// Return a copy of the `interrupt_signal` for this node. This is a `ShutdownSignal` future that will be ready when
287322
/// the node will enter a `Shutdown` state.
288323
pub fn get_interrupt_signal(&self) -> ShutdownSignal {

base_layer/core/src/base_node/state_machine_service/states/events_and_states.rs

Lines changed: 55 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,6 @@ use crate::base_node::{
3232
HeaderSyncState,
3333
HorizonStateSync,
3434
Listening,
35-
ListeningInfo,
3635
Shutdown,
3736
Starting,
3837
Waiting,
@@ -179,6 +178,54 @@ impl Display for BaseNodeState {
179178
}
180179
}
181180

181+
// Add BootstrapPhaseInfo struct
182+
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
183+
pub struct BootstrapPhaseInfo {
184+
pub current_round: usize,
185+
pub total_rounds: usize,
186+
}
187+
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
188+
/// This struct contains info that is useful for external viewing of state info
189+
pub struct ListeningInfo {
190+
pub synced: bool,
191+
pub initial_delay_connected_count: u64,
192+
pub initial_sync_peer_wait_count: u64,
193+
pub bootstrap_phase: Option<BootstrapPhaseInfo>,
194+
}
195+
impl Display for ListeningInfo {
196+
fn fmt(&self, fmt: &mut Formatter<'_>) -> Result<(), std::fmt::Error> {
197+
fmt.write_str("Node in listening state\n")
198+
}
199+
}
200+
201+
impl ListeningInfo {
202+
/// Creates a new ListeningInfo
203+
pub const fn new(is_synced: bool, initial_delay_connected_count: u64, initial_sync_peer_wait_count: u64) -> Self {
204+
Self {
205+
synced: is_synced,
206+
initial_delay_connected_count,
207+
initial_sync_peer_wait_count,
208+
bootstrap_phase: None,
209+
}
210+
}
211+
212+
pub fn is_synced(&self) -> bool {
213+
self.synced
214+
}
215+
216+
pub fn is_bootstrapping(&self) -> bool {
217+
self.bootstrap_phase.is_some()
218+
}
219+
220+
pub fn initial_delay_connected_count(&self) -> u64 {
221+
self.initial_delay_connected_count
222+
}
223+
224+
pub fn initial_sync_peer_wait_count(&self) -> u64 {
225+
self.initial_sync_peer_wait_count
226+
}
227+
}
228+
182229
/// This enum will display all info inside of the state engine
183230
#[derive(Debug, Clone, PartialEq)]
184231
pub enum StateInfo {
@@ -211,7 +258,13 @@ impl StateInfo {
211258

212259
BlockSync(info) => format!("Syncing blocks: {}", info.sync_progress_string_blocks()),
213260
Listening(info) => {
214-
if info.is_synced() {
261+
// NEW: Prioritize bootstrap display if active
262+
if let Some(bootstrap_info) = info.bootstrap_phase {
263+
format!(
264+
"Bootstrapping via seeds (Round {}/{})",
265+
bootstrap_info.current_round, bootstrap_info.total_rounds
266+
)
267+
} else if info.is_synced() {
215268
"Listening".to_string()
216269
} else {
217270
format!(

0 commit comments

Comments
 (0)