feat: refresh the peer_db #7382
Conversation
Walkthrough

This change adds a new database migration for the peer manager module that clears all data from the peer database tables.

Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Actionable comments posted: 0
🧹 Nitpick comments (2)
comms/core/src/peer_manager/storage/migrations/2025-07-30-093600_refresh_db/up.sql (2)
6-23: Consider making `peer_id` the rowid again unless there's a strong reason not to

Previously the schema used `INTEGER PRIMARY KEY` (an alias for the rowid, giving free indexing and small storage). Changing to `BIGINT PRIMARY KEY` removes those SQLite optimisations and disables the implicit autoincrement behaviour. If `peer_id` is still generated by the application, that's fine - but if the intention was to rely on SQLite autoincrement, we'll need `INTEGER PRIMARY KEY AUTOINCREMENT`.

```diff
-    peer_id BIGINT PRIMARY KEY NOT NULL,
+    peer_id INTEGER PRIMARY KEY,
```

Leaving it as-is is perfectly valid; just make sure all insert code paths supply the value explicitly.
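To make the rowid-aliasing point concrete, here is a minimal sketch using Python's stdlib `sqlite3`. The table and column names are illustrative stand-ins, not the actual migration schema: only a column declared exactly `INTEGER PRIMARY KEY` aliases the rowid and gets a value auto-assigned, while `BIGINT PRIMARY KEY NOT NULL` does not, so an insert that omits it fails.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# INTEGER PRIMARY KEY is an alias for the rowid: SQLite auto-assigns
# a value when the column is omitted from the insert.
con.execute("CREATE TABLE peers_rowid (peer_id INTEGER PRIMARY KEY, node_id TEXT)")
con.execute("INSERT INTO peers_rowid (node_id) VALUES ('abc')")
row = con.execute("SELECT peer_id FROM peers_rowid").fetchone()
print(row[0])  # -> 1 (auto-assigned rowid)

# BIGINT PRIMARY KEY is NOT a rowid alias; omitting the column
# inserts NULL, which the NOT NULL constraint rejects.
con.execute("CREATE TABLE peers_bigint (peer_id BIGINT PRIMARY KEY NOT NULL, node_id TEXT)")
try:
    con.execute("INSERT INTO peers_bigint (node_id) VALUES ('abc')")
except sqlite3.IntegrityError as exc:
    print("insert without peer_id rejected:", exc)

# The application must therefore supply peer_id explicitly.
con.execute("INSERT INTO peers_bigint (peer_id, node_id) VALUES (42, 'abc')")
```

This is exactly the trade-off described above: with `BIGINT`, every insert path is responsible for providing the key value.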
29-51: Add a uniqueness guard to avoid duplicate `(peer_id, address)` rows

The address table can accumulate duplicates over time if the same address is discovered repeatedly. A composite unique constraint (or at least a unique index) on `(peer_id, address)` prevents bloat and keeps queries deterministic.

```diff
 CREATE TABLE multi_addresses (
 @@
     is_external BOOLEAN NOT NULL DEFAULT TRUE,
     FOREIGN KEY (peer_id) REFERENCES peers (peer_id) ON DELETE CASCADE
 );
+-- Prevent duplicate addresses per peer
+CREATE UNIQUE INDEX uq_multi_addr_peer_address
+    ON multi_addresses (peer_id, address);
```
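A quick sketch of the suggested guard in action, again via stdlib `sqlite3` with a simplified stand-in schema (the real tables have more columns): the unique index rejects a second identical `(peer_id, address)` row, and an upsert with `ON CONFLICT ... DO NOTHING` keeps "insert if new" paths simple.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE peers (peer_id INTEGER PRIMARY KEY)")
con.execute("""
    CREATE TABLE multi_addresses (
        peer_id INTEGER NOT NULL,
        address TEXT NOT NULL,
        FOREIGN KEY (peer_id) REFERENCES peers (peer_id) ON DELETE CASCADE
    )
""")
# The proposed guard: at most one row per (peer_id, address) pair.
con.execute(
    "CREATE UNIQUE INDEX uq_multi_addr_peer_address"
    " ON multi_addresses (peer_id, address)"
)

con.execute("INSERT INTO peers (peer_id) VALUES (1)")
con.execute("INSERT INTO multi_addresses VALUES (1, '/ip4/127.0.0.1/tcp/18189')")
try:
    # Re-discovering the same address no longer creates a duplicate row.
    con.execute("INSERT INTO multi_addresses VALUES (1, '/ip4/127.0.0.1/tcp/18189')")
except sqlite3.IntegrityError:
    print("duplicate (peer_id, address) rejected")

# Insert-if-new paths can use an upsert instead of catching the error.
con.execute("""
    INSERT INTO multi_addresses VALUES (1, '/ip4/127.0.0.1/tcp/18189')
    ON CONFLICT (peer_id, address) DO NOTHING
""")
count = con.execute("SELECT COUNT(*) FROM multi_addresses").fetchone()[0]
print(count)  # -> 1
```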
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- comms/core/src/peer_manager/storage/migrations/2025-07-30-093600_refresh_db/down.sql (1 hunks)
- comms/core/src/peer_manager/storage/migrations/2025-07-30-093600_refresh_db/up.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: hansieodendaal
PR: tari-project/tari#7358
File: comms/core/src/peer_manager/storage/database.rs:566-570
Timestamp: 2025-07-21T16:03:14.269Z
Learning: In the Tari peer database, there was a known issue with JSON serialization corruption in the `source` field of the `multi_addresses` table, causing approximately 0.4% of peer validation failures. The migration to Borsh serialization (2025-07-21-170500_peer_address_source) intentionally uses a destructive approach (dropping and recreating tables) because the existing JSON data contains corruption that cannot be reliably converted. This data loss is acceptable to ensure data integrity going forward.
Learnt from: hansieodendaal
PR: tari-project/tari#6963
File: comms/core/src/peer_manager/manager.rs:60-68
Timestamp: 2025-05-26T02:40:23.812Z
Learning: PeerDatabaseSql in the Tari codebase has been specifically refactored to handle concurrent access and mitigate blocking I/O concerns on async executor threads. The implementation has been tested under high load at both system level and through unit tests like test_concurrent_add_or_update_and_get_closest_peers which validates concurrent read/write operations.
Learnt from: hansieodendaal
PR: tari-project/tari#7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the `hard_delete_all_stale_peers` method in `comms/core/src/peer_manager/storage/database.rs`, the SQL query intentionally uses exact equality (`peers.features = ?`) rather than bitwise operations (`peers.features & ? != 0`) when matching `COMMUNICATION_NODE` features. This is the intended behavior to match only peers with exactly the `COMMUNICATION_NODE` feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: tari-project/tari#7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via `update_peer_sql(peer: Peer)` which converts from valid NodeId objects, ensuring data integrity at the insertion layer.
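The exact-equality vs bitwise distinction in the `hard_delete_all_stale_peers` learning above can be sketched with plain integers. The flag values here are illustrative placeholders, not the real Tari `PeerFeatures` constants:

```python
# Illustrative bitflags (assumed values, not the actual Tari constants).
COMMUNICATION_NODE = 0b01
COMMUNICATION_CLIENT = 0b10

peers = [
    ("node_only", COMMUNICATION_NODE),
    ("client_only", COMMUNICATION_CLIENT),
    ("node_and_client", COMMUNICATION_NODE | COMMUNICATION_CLIENT),
]

# Exact equality (features = ?): only peers whose feature set is
# exactly COMMUNICATION_NODE, excluding combined flag sets.
exact = [name for name, flags in peers if flags == COMMUNICATION_NODE]

# Bitwise test (features & ? != 0): any peer with the bit set,
# including peers that also carry other flags.
bitwise = [name for name, flags in peers if flags & COMMUNICATION_NODE != 0]

print(exact)    # -> ['node_only']
print(bitwise)  # -> ['node_only', 'node_and_client']
```

The SQL query deliberately uses the first form, so a peer advertising both node and client features is not treated as a plain communication node.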
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Cucumber tests / Base Layer
- GitHub Check: Cucumber tests / FFI
- GitHub Check: test (mainnet, stagenet)
- GitHub Check: test (nextnet, nextnet)
- GitHub Check: cargo check with stable
- GitHub Check: test (testnet, esmeralda)
- GitHub Check: ledger build tests
🔇 Additional comments (2)
comms/core/src/peer_manager/storage/migrations/2025-07-30-093600_refresh_db/up.sql (1)
1-4: Destructive drop is fine, but ensure node bootstrap is still possibleThe migration wipes all peer and address data. While this solves the corruption problem, it also leaves the node with an empty peer set after the migration. Please double-check that the bootstrap/seeding logic is triggered correctly on first start-up after the migration so the node can reconnect to the network without manual intervention.
comms/core/src/peer_manager/storage/migrations/2025-07-30-093600_refresh_db/down.sql (1)
1-1: Down migration acknowledged as intentional no-opNo concerns – the irreversible approach matches previous destructive migrations used to purge corrupted data.
SWvheerden
left a comment
Won't this cause a problem where there are no addresses in the peer db?
We should dump the entire db, not just the one table - do it from Rust.
The migration dumps both the `peers` and `multi_addresses` tables. I will do a test with an empty database to confirm that the migration works - I do not see any reason why it will not work. I have previously tested:
I have now confirmed - this works:
Description
The database needs to be refreshed to get rid of corrupted data; the underlying issue was fixed by PR #7374.
Motivation and Context
See above
How Has This Been Tested?
System-level testing running migrations
What process can a PR reviewer use to test or verify this change?
Code review
System-level testing
Breaking Changes