Skip to content

fix: add timeout for DNS failures#6972

Merged
stringhandler merged 7 commits intotari-project:developmentfrom
stringhandler:st-create-timeout-for-dns
Apr 22, 2025
Merged

fix: add timeout for DNS failures#6972
stringhandler merged 7 commits intotari-project:developmentfrom
stringhandler:st-create-timeout-for-dns

Conversation

@stringhandler
Copy link
Copy Markdown
Contributor

@stringhandler stringhandler commented Apr 21, 2025

Summary by CodeRabbit

  • New Features

    • Expanded default DNS seed hosts to include IPv4, IPv6, and Tor-specific entries.
    • Enabled DNSSEC by default for DNS seed resolution.
    • Added native certificate support to DNS resolver features.
  • Improvements

    • Added timeouts and enhanced logging for DNS seed resolution and DNS name server connections to improve reliability and transparency.
    • Introduced a one-second timeout for DNS resolver operations.
    • Updated default peer seed configurations with expanded lists of peer seed addresses including IPv4, IPv6, and onion3 addresses.
    • Converted several DNS client connection methods from asynchronous to synchronous to streamline operations.
    • Refined DNS seed peer parsing to require exactly one address and improved error reporting.
    • Updated DNS seed domains to reflect current network naming.
  • Bug Fixes

    • Improved error handling and logging when parsing DNS seed peer strings, ensuring only actual errors trigger warnings.
    • Simplified and clarified parsing logic for DNS seed peer addresses.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 21, 2025

Walkthrough

This update introduces several improvements to the peer-to-peer (P2P) networking layer. The DNS resolver configuration is enhanced by adding a one-second timeout and enabling the "native-certs" feature. The default DNS seed hosts are expanded to include IPv4, IPv6, and Tor variants. DNS resolution and name server connection attempts now have explicit five-second timeouts with improved logging for successes, errors, and timeouts. Additionally, the parsing logic for seed peer strings is simplified to expect a single address per entry, and error logging is made more precise. Several asynchronous DNS client connection methods were converted to synchronous calls, simplifying control flow in multiple components.

Changes

File(s) Change Summary
base_layer/p2p/Cargo.toml Expanded hickory-resolver features to include "native-certs" and reformatted the feature list to multi-line array.
base_layer/p2p/src/config.rs Expanded default DNS seed hosts to four entries including IPv4, IPv6, and Tor variants in PeerSeedsConfig.
base_layer/p2p/src/dns/client.rs Converted connect and connect_secure methods from async to sync; added a 1s timeout to DNS resolver options; added warning log on TXT record parse errors.
base_layer/p2p/src/initialization.rs Added 5s timeouts and improved logging for DNS seed resolution and DNS name server connection attempts; changed DNS name server connections to return futures without awaiting.
base_layer/p2p/src/peer_seeds.rs Converted connect methods from async to sync; simplified SeedPeer parsing to a single address; improved error logging; updated test DNS seed domain.
base_layer/core/src/base_node/tari_pulse_service/mod.rs Converted get_dns_client from async to sync, removing .await usage.
base_layer/p2p/src/auto_update/dns.rs Converted DnsSoftwareUpdate::connect from async to sync, removing .await usage inside.
base_layer/p2p/src/auto_update/mod.rs Removed .await on DnsSoftwareUpdate::connect call in check_for_updates function to reflect sync change.
common/config/presets/b_peer_seeds.toml Updated peer seed configurations to use explicit lists of peer seed strings instead of commented single entries.

Sequence Diagram(s)

sequenceDiagram
    participant App
    participant P2pInitializer
    participant DNSResolver
    participant Logger

    App->>P2pInitializer: Start initialization
    P2pInitializer->>DNSResolver: Resolve DNS seed (with 5s timeout)
    DNSResolver-->>P2pInitializer: Return result or timeout
    P2pInitializer->>Logger: Log elapsed time and result
    P2pInitializer->>DNSResolver: Connect to DNS name server (returns future, no await)
    DNSResolver-->>P2pInitializer: Return connection result or error
    P2pInitializer->>Logger: Log connection status
    P2pInitializer->>App: Continue initialization
Loading

Possibly related PRs

  • fix(p2p/dns): greatly simplify DNS TXT lookup #6922: Updates the hickory-resolver dependency and simplifies DNS client connection methods by removing async signatures and the trust_anchor parameter, closely related to this PR’s DNS client and dependency changes.

Suggested reviewers

  • SWvheerden

Poem

In the warren where the packets leap,
We’ve trimmed the seeds and made them neat.
With timeouts set and logging bright,
DNS now hops with rabbit’s might.
One address per peer, no more a mess—
Our network’s ready, swift, and blessed!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d932ce9 and 0ec06ab.

📒 Files selected for processing (2)
  • base_layer/p2p/src/config.rs (1 hunks)
  • base_layer/p2p/src/peer_seeds.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • base_layer/p2p/src/config.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: test (nextnet, nextnet)
  • GitHub Check: test (mainnet, stagenet)
  • GitHub Check: ci
  • GitHub Check: cargo check with stable
🔇 Additional comments (4)
base_layer/p2p/src/peer_seeds.rs (4)

57-59: Synchronous method conversion looks good

The change from asynchronous to synchronous method is consistent with the goal of simplifying control flow in the DNS resolution components.


66-68: Synchronous method conversion looks good

The conversion from async to synchronous method properly matches the corresponding changes in the DnsClient implementation.


83-89: Good improvement using inspect_err

The change to use .inspect_err() is a cleaner approach for logging errors without modifying the error flow, making the code more idiomatic.


226-228: Updated test domain and synchronous method usage

The test has been correctly updated to:

  1. Use the synchronous connect method without .await
  2. Use the new DNS seed domain seeds.nextnet.tari.com instead of seeds.esmeralda.tari.com

These changes align with the broader refactoring.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

let res = match (dns_seeds_use_dnssec, dns == &DnsNameServer::System) {
(true, false) => DnsSeedResolver::connect_secure(dns.clone()).await,
(_, _) => DnsSeedResolver::connect(dns.clone()).await,
(true, false) => timeout(Duration::from_secs(5), DnsSeedResolver::connect_secure(dns.clone())).await,
Copy link
Copy Markdown
Member

@sdbondi sdbondi Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connect and connect_secure are mistakenly marked as async functions and don't do IO, so they could be simplified

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
base_layer/p2p/src/dns/client.rs (1)

79-84: Make the 1‑second resolver timeout configurable

Hard–coding a 1 second timeout (opts.timeout = Duration::from_secs(1)) improves failure‑detection on healthy networks, but it risks spurious timeouts on slower or high‑latency links (e.g. satellite or congested mobile networks).

Consider exposing the value via configuration (or at least a const) so that operators can tune it without recompiling.

base_layer/p2p/src/config.rs (1)

77-91: Avoid repeated Network::get_current_or_user_setting_or_default() calls

You call Network::get_current_or_user_setting_or_default() three times in a row.
Compute it once to reduce overhead and improve readability:

-            dns_seeds: vec![
-                format!(
-                    "seeds.{}.tari.com",
-                    Network::get_current_or_user_setting_or_default().as_key_str()
-                ),
-                format!(
-                    "ip4.seeds.{}.tari.com",
-                    Network::get_current_or_user_setting_or_default().as_key_str()
-                ),
-                format!(
-                    "ip6.seeds.{}.tari.com",
-                    Network::get_current_or_user_setting_or_default().as_key_str()
-                ),
-            ]
+            dns_seeds: {
+                let net = Network::get_current_or_user_setting_or_default().as_key_str();
+                vec![
+                    format!("seeds.{net}.tari.com"),
+                    format!("ip4.seeds.{net}.tari.com"),
+                    format!("ip6.seeds.{net}.tari.com"),
+                ]
+            }
base_layer/p2p/src/initialization.rs (2)

584-599: Consider removing commented-out code

The old DNS connection handling code is commented out but should be removed if the new implementation is stable and tested.

-            // match res {
-            //     Ok(val) => {
-            //         trace!(target: LOG_TARGET, "Found DNS client at '{}'", dns);
-            //         return Ok(val);
-            //     },
-            //     Err(err) => {
-            //         warn!(
-            //             target: LOG_TARGET,
-            //             "DNS entry '{}' did not respond, trying the next one. You can edit 'dns_seed_name_servers' in
-            // \             the config file. (Error: {})",
-            //             dns,
-            //             err.to_string(),
-            //         );
-            //         dns_errors.push(err.to_string())
-            //     },
-            // }

514-527: Consider making timeout durations configurable

The 5-second timeout values are hardcoded in both DNS seed resolution and name server connection logic. Consider making these configurable in the P2pConfig to allow users to adjust based on their network conditions.

// Add to P2pConfig struct (in config.rs)
+    /// Timeout in seconds for DNS seed resolution
+    pub dns_seed_resolution_timeout_secs: u64,
+    /// Timeout in seconds for DNS name server connection
+    pub dns_name_server_connection_timeout_secs: u64,

// Then use it in the code
-                let seeds_res = match timeout(Duration::from_secs(5), resolver.resolve(addr)).await {
+                let seeds_res = match timeout(Duration::from_secs(config.dns_seed_resolution_timeout_secs), resolver.resolve(addr)).await {

// And similarly for the name server connection
-                (true, false) => timeout(Duration::from_secs(5), DnsSeedResolver::connect_secure(dns.clone())).await,
-                (_, _) => timeout(Duration::from_secs(5), DnsSeedResolver::connect(dns.clone())).await,
+                (true, false) => timeout(Duration::from_secs(config.dns_name_server_connection_timeout_secs), DnsSeedResolver::connect_secure(dns.clone())).await,
+                (_, _) => timeout(Duration::from_secs(config.dns_name_server_connection_timeout_secs), DnsSeedResolver::connect(dns.clone())).await,

Also applies to: 569-583

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6951379 and 0daa96d.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • base_layer/p2p/Cargo.toml (1 hunks)
  • base_layer/p2p/src/config.rs (1 hunks)
  • base_layer/p2p/src/dns/client.rs (1 hunks)
  • base_layer/p2p/src/initialization.rs (3 hunks)
  • base_layer/p2p/src/peer_seeds.rs (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
base_layer/p2p/src/config.rs (3)
common/src/configuration/network.rs (1)
  • get_current_or_user_setting_or_default (60-71)
common/src/configuration/dns_name_server_list.rs (1)
  • from_str (34-42)
common/src/configuration/name_server.rs (1)
  • from_str (68-87)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: test (mainnet, stagenet)
  • GitHub Check: ci
🔇 Additional comments (6)
base_layer/p2p/Cargo.toml (1)

38-43: 👍 Enabling native-certs is a sensible default

Including the native-certs feature lets Hickory use the OS trust store, avoiding manual CA bundles. No further action required.

base_layer/p2p/src/config.rs (1)

96-97: DNSSEC default toggled to true – verify operational impact

Turning DNSSEC on by default is a security win, but make sure:

  1. All default and documented public DNS servers in dns_seed_name_servers support DNSSEC validation.
  2. Nodes behind restrictive firewalls still resolve seeds successfully.
base_layer/p2p/src/peer_seeds.rs (1)

82-89: Nice: log only failed parses

Changing inspect()inspect_err() stops logging successful parses, keeping logs clean.

base_layer/p2p/src/initialization.rs (3)

79-82: Good addition of timeout import

The addition of tokio::time::timeout is appropriate for the implementation of timeout handling in DNS operations.


514-527: Excellent timeout implementation for DNS seed resolution

Adding a 5-second timeout for DNS seed resolution is a great improvement that prevents the application from hanging indefinitely when DNS servers are unresponsive. The additional logging provides valuable context about resolution timing.


569-583: Well-implemented DNS name server connection timeout

The timeout implementation for DNS name server connections follows the same robust pattern as the seed resolution. This change properly logs timeouts separately from other connection errors, improving observability and debugging.

sdbondi
sdbondi previously approved these changes Apr 21, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 21, 2025

Test Results (CI)

    2 files     77 suites   38m 41s ⏱️
1 332 tests 1 331 ✅ 0 💤 1 ❌
2 470 runs  2 469 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit 0ec06ab.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 21, 2025

Test Results (Integration tests)

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
0 files   ±0   0 ❌ ±0 

Results for commit f77f6f0. ± Comparison against base commit 6951379.

♻️ This comment has been updated with latest results.

SWvheerden
SWvheerden previously approved these changes Apr 22, 2025
Copy link
Copy Markdown
Collaborator

@SWvheerden SWvheerden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SWvheerden SWvheerden dismissed stale reviews from sdbondi and themself via 5db0dc9 April 22, 2025 05:41
@stringhandler stringhandler requested a review from a team as a code owner April 22, 2025 09:27
)
.expect("string is valid"),
dns_seeds_use_dnssec: false,
dns_seeds_use_dnssec: true,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DNSSec is not enabled on tari.com, is it worth enabling by default?

),
format!(
"ip6.seeds.{}.tari.com",
Network::get_current_or_user_setting_or_default().as_key_str()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also add tor.seeds.{}.tari.com, if we want to pull as many nodes?

@stringhandler stringhandler merged commit 6dfff45 into tari-project:development Apr 22, 2025
10 of 13 checks passed
@stringhandler stringhandler deleted the st-create-timeout-for-dns branch April 22, 2025 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants