
fix: migration can now correctly resume after stopping #7210

Merged
SWvheerden merged 1 commit into tari-project:development from SWvheerden:sw_fix_potential_breakingissue_in_lmdbmigration
Jun 12, 2025

Conversation

@SWvheerden
Collaborator

@SWvheerden SWvheerden commented Jun 12, 2025

Description

Fixes a potential issue where the database migration could break if it was stopped partway through and then resumed.

Summary by CodeRabbit

  • Bug Fixes
    • Improved migration process to ensure outdated PayRef index entries are removed before rebuilding, preventing stale or incorrect data.
    • Updated PayRef entry handling to allow existing entries to be safely overwritten during migrations, reducing potential errors.

@SWvheerden SWvheerden requested a review from a team as a code owner June 12, 2025 13:05
@coderabbitai
Contributor

coderabbitai bot commented Jun 12, 2025

Walkthrough

The migration process in the LMDB database logic was updated to explicitly clear the PayRef index before rebuilding it during migration to version 2. Additionally, PayRef entries are now replaced rather than inserted, and the migration version is incremented stepwise rather than set to a constant.
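The three changes in the walkthrough can be sketched on an in-memory map. This is only an illustration of the idea, not the code in lmdb_db.rs: `ToyDb`, `migrate_to_v2`, `TARGET_VERSION` and the `stop_after` parameter are hypothetical names introduced here, and a `BTreeMap` stands in for the LMDB tables.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the LMDB-backed store; not Tari's real types.
#[derive(Default)]
struct ToyDb {
    payref_index: BTreeMap<String, u64>, // PayRef -> output height
    migration_version: u64,
}

/// Highest migration this sketch knows about (assumed value).
const TARGET_VERSION: u64 = 2;

/// Rebuild the PayRef index from `source`. `stop_after` simulates an
/// interrupted run by returning early before the given entry index.
fn migrate_to_v2(db: &mut ToyDb, source: &[(String, u64)], stop_after: Option<usize>) -> bool {
    // 1. Drop whatever an earlier, interrupted attempt left behind.
    db.payref_index.clear();
    // 2. Rebuild with overwrite semantics (`insert` on a map replaces).
    for (i, (payref, height)) in source.iter().enumerate() {
        if stop_after == Some(i) {
            return false; // interrupted: the version is not bumped
        }
        db.payref_index.insert(payref.clone(), *height);
    }
    // 3. Record progress one step at a time.
    db.migration_version += 1;
    true
}

/// Run every outstanding step until the stored version reaches the target.
fn run_migrations(db: &mut ToyDb, source: &[(String, u64)]) {
    while db.migration_version < TARGET_VERSION {
        match db.migration_version {
            1 => {
                migrate_to_v2(db, source, None);
            }
            _ => db.migration_version += 1, // placeholder for other steps
        }
    }
}
```

Calling `migrate_to_v2` with `stop_after = Some(1)` leaves a half-built index and an unchanged version; a later `run_migrations` call clears the partial index, rebuilds it fully and only then advances the version.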

Changes

| File(s) | Change Summary |
|---|---|
| base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs | Enhanced migration logic: clears PayRef index before rebuilding, uses replace for PayRef entries, and increments migration version stepwise. |

Sequence Diagram(s)

sequenceDiagram
    participant Migration
    participant LMDB
    participant Logger

    Migration->>LMDB: Begin write transaction
    Migration->>LMDB: Clear payref_to_output_index
    Migration->>LMDB: Commit transaction
    Migration->>Logger: Log clearing of PayRef index
    loop For each PayRef entry
        Migration->>LMDB: Replace PayRef entry in index
    end
    Migration->>LMDB: Increment migration version by one

Possibly related PRs

  • fix: add migration code #7153: Both PRs modify the run_migrations function in lmdb_db.rs to enhance migration logic, with the main PR focusing on clearing and rebuilding the PayRef index and the retrieved PR adding a rewind step based on accumulated difficulty checks.

Suggested reviewers

  • hansieodendaal

Poem

In the warren where data flows,
Rabbits clear what once arose.
Old indexes swept away,
New ones built without delay.
Replace, not insert, is the game,
Migration hops on—never the same!
🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs (1)

3145-3149: lmdb_clear occurs in its own write-txn – is the extra commit really needed?

Opening a dedicated write transaction just to clear the payref_to_output_index and immediately committing incurs two extra mdb_txn_begin/commit cycles while still under the outer migration loop.
Unless you expect other migrations to read/write concurrently (they don’t), you can clear the DB in the same write transaction that will shortly rebuild the index, saving an fsync and cutting migration time on large DBs.

-{
-    let txn = db.write_transaction()?;
-    lmdb_clear(&txn, &db.payref_to_output_index)?;
-    txn.commit()?;
-    info!(target: LOG_TARGET, "Cleared PayRef index");
-}
+let txn = db.write_transaction()?;
+lmdb_clear(&txn, &db.payref_to_output_index)?;
+info!(target: LOG_TARGET, "Cleared PayRef index");
+/* … proceed with rebuild in the *same* txn or commit once after rebuild */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c1923f and 26c3c0a.

📒 Files selected for processing (1)
  • base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs (1)
base_layer/core/src/chain_storage/lmdb_db/lmdb.rs (2)
  • lmdb_clear (505-514)
  • lmdb_replace (132-166)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: test (mainnet, stagenet)
  • GitHub Check: Cucumber tests / Base Layer
  • GitHub Check: test (testnet, esmeralda)
  • GitHub Check: test (nextnet, nextnet)
  • GitHub Check: Cucumber tests / FFI
  • GitHub Check: ci
  • GitHub Check: cargo check with stable
🔇 Additional comments (1)
base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs (1)

3294-3299: 👍 Switched to lmdb_replace – safer for reruns

Using lmdb_replace means repeated executions (or partial roll-backs)
won’t fail with KeyExists, allowing the migration to resume
idempotently. Looks good.
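The difference between the two write modes can be shown with a plain `HashMap`. This is a sketch under assumptions, not the `lmdb_insert`/`lmdb_replace` helpers themselves: `strict_insert`, `replace` and `StoreError` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical error type; the variant name mirrors the KeyExists failure
/// mentioned in the review comment.
#[derive(Debug, PartialEq)]
enum StoreError {
    KeyExists,
}

/// Insert that refuses to overwrite, so a second run over the same key fails.
fn strict_insert(map: &mut HashMap<u32, String>, k: u32, v: &str) -> Result<(), StoreError> {
    if map.contains_key(&k) {
        return Err(StoreError::KeyExists);
    }
    map.insert(k, v.to_string());
    Ok(())
}

/// Replace that overwrites any existing value, so reruns are harmless.
fn replace(map: &mut HashMap<u32, String>, k: u32, v: &str) {
    map.insert(k, v.to_string());
}
```

Running `strict_insert` twice with the same key returns `Err(StoreError::KeyExists)` on the second call, whereas calling `replace` any number of times leaves the same final state.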

Comment on lines 3151 to 3154
for height in 0..=chain_height {
    process_payref_for_height(db, height)?;
}
info!(target: LOG_TARGET, "PayRef index rebuild completed");

🛠️ Refactor suggestion

Potentially heavy height-by-height loop without batching

Iterating for height in 0..=chain_height invokes process_payref_for_height which opens and commits a write transaction per height.
On mainnet this means >1 M TXNs and fsyncs – several hours on spinning disks.

Consider:

  1. Rebuilding in larger batches (e.g. 1000 heights per txn) or
  2. Letting process_payref_for_height use the same write txn passed from the caller.

This will drastically reduce migration wall-clock time.
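Option 1 could look roughly like the following. It is a sketch only: `ToyStore` and `rebuild_in_batches` are hypothetical, the commit counter stands in for real write transactions, and nothing here measures actual LMDB or fsync cost.

```rust
/// Hypothetical store that counts commits instead of touching a database.
#[derive(Default)]
struct ToyStore {
    commits: usize,
    processed: Vec<u64>,
}

impl ToyStore {
    /// One simulated write transaction covering every height in `heights`.
    fn commit_batch(&mut self, heights: std::ops::RangeInclusive<u64>) {
        self.processed.extend(heights);
        self.commits += 1;
    }
}

/// Process heights 0..=chain_height in batches of `batch_size` heights,
/// issuing one commit per batch rather than one per height.
fn rebuild_in_batches(store: &mut ToyStore, chain_height: u64, batch_size: u64) {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut start = 0u64;
    while start <= chain_height {
        // Last height of this batch, clamped to the chain tip.
        let end = start.saturating_add(batch_size - 1).min(chain_height);
        store.commit_batch(start..=end);
        if end == chain_height {
            break; // also avoids overflow when chain_height == u64::MAX
        }
        start = end + 1;
    }
}
```

With a chain height of 2500 and a batch size of 1000 this issues three commits instead of 2501. The trade-off is that an interruption loses at most one batch of work, which stays safe as long as the rebuild overwrites existing entries.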

🤖 Prompt for AI Agents
In base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs around lines 3151 to
3154, the current loop calls process_payref_for_height for each height
individually, causing a separate write transaction and commit per height, which
is very inefficient for large chain heights. To fix this, refactor the code to
either batch multiple heights (e.g., 1000) into a single transaction or modify
process_payref_for_height to accept and reuse a write transaction passed from
the caller, thereby reducing the number of transactions and fsync operations and
improving performance significantly.

Comment on lines +3158 to +3166
let txn = db.write_transaction()?;
info!(target: LOG_TARGET, "Migrated database to version {}", MIGRATION_VERSION);
lmdb_replace(
    &txn,
    &db.metadata_db,
    &k.as_u32(),
    &MetadataValue::MigrationVersion(migrate_from_version + 1),
    None,
)?;

🛠️ Refactor suggestion

Off-by-one: stored migration version becomes MIGRATION_VERSION + 1

When migrate_from_version == MIGRATION_VERSION (the last loop
iteration) the code persists MigrationVersion(migrate_from_version + 1),
i.e. MIGRATION_VERSION + 1.
On the next startup last_migrated_version will be greater than
MIGRATION_VERSION, making it look as if the DB is ahead of the code
base and silently skipping future migrations.

- &MetadataValue::MigrationVersion(migrate_from_version + 1),
+ &MetadataValue::MigrationVersion(std::cmp::min(
+     migrate_from_version + 1,
+     MIGRATION_VERSION,
+ )),

Or simply set MigrationVersion(MIGRATION_VERSION) after the loop and
keep the step-by-step updates for intermediate commits.
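The scenario described in this comment can be reproduced in isolation. The sketch below assumes the migration loop runs over an inclusive range ending at `MIGRATION_VERSION`, as the comment implies; the actual loop bounds are not shown in this excerpt, and `run_inclusive`, `next_stored_version` and the constant's value are hypothetical.

```rust
/// Assumed value for the sketch; the real constant lives in lmdb_db.rs.
const MIGRATION_VERSION: u64 = 2;

/// Version to persist after finishing the step that started at `from`,
/// clamped so it never exceeds the newest known version.
fn next_stored_version(from: u64) -> u64 {
    (from + 1).min(MIGRATION_VERSION)
}

/// Simulate the loop the comment describes and return the version that
/// ends up stored. `clamp` toggles the suggested std::cmp::min guard.
fn run_inclusive(last_migrated: u64, clamp: bool) -> u64 {
    let mut stored = last_migrated;
    for from in last_migrated..=MIGRATION_VERSION {
        stored = if clamp { next_stored_version(from) } else { from + 1 };
    }
    stored
}
```

Under that assumption, `run_inclusive(1, false)` ends with a stored version of 3, one past `MIGRATION_VERSION`, while `run_inclusive(1, true)` ends at 2. If the real loop uses an exclusive upper bound, the unclamped form never overshoots and the guard is redundant.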

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 let txn = db.write_transaction()?;
 info!(target: LOG_TARGET, "Migrated database to version {}", MIGRATION_VERSION);
 lmdb_replace(
     &txn,
     &db.metadata_db,
     &k.as_u32(),
-    &MetadataValue::MigrationVersion(migrate_from_version + 1),
+    &MetadataValue::MigrationVersion(std::cmp::min(
+        migrate_from_version + 1,
+        MIGRATION_VERSION,
+    )),
     None,
 )?;
🤖 Prompt for AI Agents
In base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs around lines 3158 to
3166, the migration version is being stored as migrate_from_version + 1, which
causes the stored version to be MIGRATION_VERSION + 1 on the last iteration,
making the DB appear ahead of the codebase. To fix this, after completing the
migration loop, explicitly set the stored MigrationVersion to MIGRATION_VERSION
instead of migrate_from_version + 1, ensuring the final version matches the
actual migration version and prevents skipping future migrations.

@github-actions

Test Results (CI)

3 files, 126 suites, 48m 6s ⏱️
1 333 tests: 1 332 ✅, 0 💤, 1 ❌
3 997 runs: 3 994 ✅, 0 💤, 3 ❌

For more details on these failures, see this check.

Results for commit 26c3c0a.

@SWvheerden SWvheerden merged commit d268f2b into tari-project:development Jun 12, 2025
12 of 16 checks passed
@SWvheerden SWvheerden deleted the sw_fix_potential_breakingissue_in_lmdbmigration branch June 12, 2025 14:13
sdbondi added a commit to sdbondi/tari that referenced this pull request Jun 18, 2025
* development: (607 commits)
  Wallet GRPC port comment fix from 18142 to 18143 (tari-project#7221)
  feat: integrated address support for Ledger (tari-project#7198)
  chore: new release v4.1.1-pre.0 (tari-project#7211)
  fix: migration can now correctly resume after stopping (tari-project#7210)
  fix: only revalidated rejected transactions on startup (tari-project#7209)
  fix: add filtering flag back (tari-project#7208)
  feat: improve wallet balance checks from external clients (tari-project#7207)
  feat!: update grpc supply query (tari-project#7137)
  docs: Updated API GRPC and Exchange Guide (tari-project#7205)
  chore: new release v4.4.0-pre.0 (tari-project#7202)
  feat: update base node proto to search bytes (tari-project#7201)
  feat: full PayRef implementation (tari-project#7154)
  test: add ffi cucumber wallet balance test (tari-project#7189)
  chore: fix tests (tari-project#7196)
  fix(network-discovery): add back idle event handling (tari-project#7194)
  Update SECURITY.md (tari-project#7193)
  fix: transaction manager service unmined lookup (tari-project#7192)
  fix: wallet ffi database name mismatch for mobile wallet (tari-project#7191)
  fix: payment_id deserialize (tari-project#7187)
  fix: remove code for deleting stale peers (tari-project#7184)
  ...

2 participants