feat(ant-node): persist record index to eliminate startup CPU spike #3484

Open
tomdif wants to merge 1 commit into maidsafe:main from
tomdif:fix/persist-record-index

@tomdif tomdif commented Mar 1, 2026

Summary

  • Cache the record metadata index to disk so node restarts read one small msgpack file instead of doing a parallel AES-256-GCM-SIV decrypt of every record (up to 16K files)
  • Eliminates the 40+ second 100% CPU spike on startup after binary upgrades (same peer ID, same network ID)
  • Caps rayon threads in the full-scan fallback to available_parallelism - 1, reserving one core for the OS/tokio runtime
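The thread cap in the last bullet can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `scan_thread_cap` and `detected_scan_threads` are hypothetical names, and the floor of one thread on single-core machines is an assumption the description does not state.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: number of worker threads for the full-scan fallback.
/// Leaves one core free for the OS/tokio runtime, but never returns zero
/// (the floor of one thread is an assumption, not stated in the PR).
fn scan_thread_cap(available: usize) -> usize {
    available.saturating_sub(1).max(1)
}

/// Hypothetical helper: applies the cap to the parallelism the OS reports.
/// If detection fails, fall back to a single thread.
fn detected_scan_threads() -> usize {
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    scan_thread_cap(available)
}
```

The result would typically be handed to a scoped pool, for example via `rayon::ThreadPoolBuilder::new().num_threads(n)`. Whether the PR builds a dedicated pool or configures the global one is not shown in the description.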

How it works

  1. On startup, try_load_cached_index() reads a small record_index msgpack file from historic_quote_dir
  2. Validates the encryption seed matches (detects peer/encryption changes) and spot-checks 10 files on disk
  3. If valid, skips the expensive update_records_from_an_existing_store() full-decrypt scan entirely
  4. If invalid or missing, falls back to the existing full scan (now thread-capped)
  5. Index is flushed on payment, on initial construction, and synchronously on Drop
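Steps 2 to 4 above amount to a validate-or-fall-back decision. The sketch below shows one way that check might look using only the standard library. All names here (`CachedIndex`, `IndexCheck`, `validate_cached_index`) are hypothetical; the real `PersistedRecordIndex` layout, the seed type, and how the 10 spot-check files are chosen are not shown in the description, so the evenly spaced sampling is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the persisted index. The actual fields of
/// PersistedRecordIndex are not shown in the PR description.
struct CachedIndex {
    /// Seed the index was written under (byte layout is an assumption).
    encryption_seed: Vec<u8>,
    /// Record file names, relative to the record store directory.
    record_files: Vec<PathBuf>,
}

#[derive(Debug, PartialEq)]
enum IndexCheck {
    /// Index can be trusted; the full-decrypt scan can be skipped.
    Valid,
    /// Seed differs (peer or encryption change); do the full scan.
    SeedMismatch,
    /// A sampled record file is missing on disk; do the full scan.
    MissingFile,
}

/// Compares the stored seed with the current one, then spot-checks up to
/// `sample_size` evenly spaced entries for existence on disk.
fn validate_cached_index(
    index: &CachedIndex,
    current_seed: &[u8],
    store_dir: &Path,
    sample_size: usize,
) -> IndexCheck {
    if index.encryption_seed.as_slice() != current_seed {
        return IndexCheck::SeedMismatch;
    }
    let total = index.record_files.len();
    // Spread the samples across the whole list instead of only the head.
    let step = (total / sample_size.max(1)).max(1);
    for name in index.record_files.iter().step_by(step).take(sample_size) {
        if !store_dir.join(name).is_file() {
            return IndexCheck::MissingFile;
        }
    }
    IndexCheck::Valid
}
```

A caller would skip the full scan only on `IndexCheck::Valid` and treat every other outcome, including a missing or undecodable index file, as a reason to run the thread-capped fallback.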

Security

  • Records stay AES-256-GCM-SIV encrypted on disk — unchanged
  • Index contains only metadata already visible as plaintext filenames + enum tags
  • Encryption seed stored in index — mismatches trigger automatic full rescan
  • Tampered index worst case: node tries to serve a record that doesn't exist → fails → network re-replicates

Test plan

  • cargo check -p ant-node — passes
  • cargo clippy -p ant-node --all-targets -- -Dwarnings — passes clean
  • cargo test --release --package ant-node --lib — 76/76 tests pass (including can_store_after_restart)

…tup CPU spike

Cache the record metadata index (key, ValidationType, DataTypes) to disk
so that node restarts read one small msgpack file instead of doing a
parallel AES-256-GCM-SIV decrypt of every record on disk (up to 16K files).
This eliminates the 40+ second 100% CPU spike on startup after binary upgrades.

- Add PersistedRecordIndex struct serialized via rmp_serde
- On startup, try cached index first; fall back to full decrypt scan
- Validate index with encryption_seed match + spot-check files on disk
- Flush index on payment, on initial construction, and on Drop
- Cap rayon threads in full-scan fallback to (available_parallelism - 1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>