fix: migration can now correctly resume after stopping#7210
Conversation
WalkthroughThe migration process in the LMDB database logic was updated to explicitly clear the PayRef index before rebuilding it during migration to version 2. Additionally, PayRef entries are now replaced rather than inserted, and the migration version is incremented stepwise rather than set to a constant. Changes
Sequence Diagram(s)sequenceDiagram
participant Migration
participant LMDB
participant Logger
Migration->>LMDB: Begin write transaction
Migration->>LMDB: Clear payref_to_output_index
Migration->>LMDB: Commit transaction
Migration->>Logger: Log clearing of PayRef index
loop For each PayRef entry
Migration->>LMDB: Replace PayRef entry in index
end
Migration->>LMDB: Increment migration version by one
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs (1)
3145-3149:lmdb_clearoccurs in its own write-txn – is the extra commit really needed?Opening a dedicated write transaction just to clear the
payref_to_output_indexand immediately committing incurs two extramdb_txn_begin/commitcycles while still under the outer migration loop.
Unless you expect other migrations to read/write concurrently (they don’t), you can clear the DB in the same write transaction that will shortly rebuild the index, saving an fsync and cutting migration time on large DBs.-{ - let txn = db.write_transaction()?; - lmdb_clear(&txn, &db.payref_to_output_index)?; - txn.commit()?; - info!(target: LOG_TARGET, "Cleared PayRef index"); -} +let txn = db.write_transaction()?; +lmdb_clear(&txn, &db.payref_to_output_index)?; +info!(target: LOG_TARGET, "Cleared PayRef index"); +/* … proceed with rebuild in the *same* txn or commit once after rebuild */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs (1)
base_layer/core/src/chain_storage/lmdb_db/lmdb.rs (2)
lmdb_clear(505-514)lmdb_replace(132-166)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: test (mainnet, stagenet)
- GitHub Check: Cucumber tests / Base Layer
- GitHub Check: test (testnet, esmeralda)
- GitHub Check: test (nextnet, nextnet)
- GitHub Check: Cucumber tests / FFI
- GitHub Check: ci
- GitHub Check: cargo check with stable
🔇 Additional comments (1)
base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs (1)
3294-3299: 👍 Switched tolmdb_replace– safer for rerunsUsing
lmdb_replacemeans repeated executions (or partial roll-backs)
won’t fail withKeyExists, allowing the migration to resume
idempotently. Looks good.
| for height in 0..=chain_height { | ||
| process_payref_for_height(db, height)?; | ||
| } | ||
| info!(target: LOG_TARGET, "PayRef index rebuild completed"); |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Potentially heavy height-by-height loop without batching
Iterating for height in 0..=chain_height invokes process_payref_for_height which opens and commits a write transaction per height.
On mainnet this means >1 M TXNs and fsyncs – several hours on spinning disks.
Consider:
- Rebuilding in larger batches (e.g. 1000 heights per txn) or
- Letting
process_payref_for_heightuse the same write txn passed from the caller.
This will drastically reduce migration wall-clock time.
🤖 Prompt for AI Agents
In base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs around lines 3151 to
3154, the current loop calls process_payref_for_height for each height
individually, causing a separate write transaction and commit per height, which
is very inefficient for large chain heights. To fix this, refactor the code to
either batch multiple heights (e.g., 1000) into a single transaction or modify
process_payref_for_height to accept and reuse a write transaction passed from
the caller, thereby reducing the number of transactions and fsync operations and
improving performance significantly.
| let txn = db.write_transaction()?; | ||
| info!(target: LOG_TARGET, "Migrated database to version {}", MIGRATION_VERSION); | ||
| lmdb_replace( | ||
| &txn, | ||
| &db.metadata_db, | ||
| &k.as_u32(), | ||
| &MetadataValue::MigrationVersion(migrate_from_version + 1), | ||
| None, | ||
| )?; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Off-by-one: stored migration version becomes MIGRATION_VERSION + 1
When migrate_from_version == MIGRATION_VERSION (the last loop
iteration) the code persists MigrationVersion(migrate_from_version + 1),
i.e. MIGRATION_VERSION + 1.
On the next startup last_migrated_version will be greater than
MIGRATION_VERSION, making it look as if the DB is ahead of the code
base and silently skipping future migrations.
- &MetadataValue::MigrationVersion(migrate_from_version + 1),
+ &MetadataValue::MigrationVersion(std::cmp::min(
+ migrate_from_version + 1,
+ MIGRATION_VERSION,
+ )),Or simply set MigrationVersion(MIGRATION_VERSION) after the loop and
keep the step-by-step updates for intermediate commits.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let txn = db.write_transaction()?; | |
| info!(target: LOG_TARGET, "Migrated database to version {}", MIGRATION_VERSION); | |
| lmdb_replace( | |
| &txn, | |
| &db.metadata_db, | |
| &k.as_u32(), | |
| &MetadataValue::MigrationVersion(migrate_from_version + 1), | |
| None, | |
| )?; | |
| let txn = db.write_transaction()?; | |
| info!(target: LOG_TARGET, "Migrated database to version {}", MIGRATION_VERSION); | |
| lmdb_replace( | |
| &txn, | |
| &db.metadata_db, | |
| &k.as_u32(), | |
| &MetadataValue::MigrationVersion(std::cmp::min( | |
| migrate_from_version + 1, | |
| MIGRATION_VERSION, | |
| )), | |
| None, | |
| )?; |
🤖 Prompt for AI Agents
In base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs around lines 3158 to
3166, the migration version is being stored as migrate_from_version + 1, which
causes the stored version to be MIGRATION_VERSION + 1 on the last iteration,
making the DB appear ahead of the codebase. To fix this, after completing the
migration loop, explicitly set the stored MigrationVersion to MIGRATION_VERSION
instead of migrate_from_version + 1, ensuring the final version matches the
actual migration version and prevents skipping future migrations.
Test Results (CI) 3 files 126 suites 48m 6s ⏱️ For more details on these failures, see this check. Results for commit 26c3c0a. |
* development: (607 commits) Wallet GRPC port comment fix from 18142 to 18143 (tari-project#7221) feat: integrated address support for Ledger (tari-project#7198) chore: new release v4.1.1-pre.0 (tari-project#7211) fix: migration can now correctly resume after stopping (tari-project#7210) fix: only revalidated rejected transactions on startup (tari-project#7209) fix: add filtering flag back (tari-project#7208) feat: improve wallet balance checks from external clients (tari-project#7207) feat!: update grpc supply query (tari-project#7137) docs: Updated API GRPC and Exchange Guide (tari-project#7205) chore: new release v4.4.0-pre.0 (tari-project#7202) feat: update base node proto to search bytes (tari-project#7201) feat: full PayRef implementation (tari-project#7154) test: add ffi cucumber wallet balance test (tari-project#7189) chore: fix tests (tari-project#7196) fix(network-discovery): add back idle event handling (tari-project#7194) Update SECURITY.md (tari-project#7193) fix: transaction manager service unmined lookup (tari-project#7192) fix: wallet ffi database name mismatch for mobile wallet (tari-project#7191) fix: payment_id deserialize (tari-project#7187) fix: remove code for deleting stale peers (tari-project#7184) ...
Description
Fixes a potential migration issue where the migration can break
Summary by CodeRabbit