Replace DETS with shu for persistent metadata in ra_log_meta #624

Draft
kjnilsson wants to merge 23 commits into main from shu

@kjnilsson
Contributor

This commit replaces the DETS-based backend in ra_log_meta with the shu durable data store, providing faster, more efficient persistence for Raft metadata (current_term, voted_for, last_applied).

Key changes:

  • New schema with three separate fields (current_term, voted_for, last_applied), with current_term and voted_for as low-frequency fields (auto-fsynced on write) and last_applied as high-frequency (WAL-buffered, fsync on sync() call)

  • Automatic migration from existing DETS files on first start: reads all data from meta.dets, writes to shu store, then renames the original file to meta.dets.migrated for backup

  • ETS hot-cache repopulation via shu:fold/3 on startup to maintain fast reads without hitting shu on every fetch

  • WAL-full handling with synchronous compaction in handle_batch; proactive compaction triggered when WAL usage exceeds 80% watermark to avoid blocking the writer in normal operation

  • Proper supervision and error handling: process crash results in full log subtree restart and transparent recovery from shu state and WAL on next init

  • All public API signatures preserved; no changes required in callers
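
The DETS migration step described above can be sketched roughly as follows. This is a minimal illustration using only the OTP `dets`, `file` and `filelib` modules, not the code in this PR: the module name, `migrate/2` and the `WriteFun` callback (a stand-in for whatever writes a record into the shu store) are hypothetical, and the shu API itself is deliberately not shown.

```erlang
-module(meta_migrate_sketch).
-export([migrate/2]).

%% Hypothetical helper: hand every record in an existing DETS file to
%% WriteFun, then rename the file to "<file>.migrated" as a backup.
%% WriteFun is an assumption standing in for the write into the new store.
-spec migrate(string(), fun((tuple()) -> ok)) ->
    {ok, non_neg_integer()} | {error, term()}.
migrate(DetsFile, WriteFun) when is_list(DetsFile), is_function(WriteFun, 1) ->
    case filelib:is_regular(DetsFile) of
        false ->
            %% Nothing to migrate: fresh install or already migrated.
            {ok, 0};
        true ->
            case dets:open_file(make_ref(), [{file, DetsFile}, {access, read}]) of
                {ok, Tab} ->
                    %% Count the records while copying them across.
                    Result = dets:foldl(fun(Record, N) ->
                                                ok = WriteFun(Record),
                                                N + 1
                                        end, 0, Tab),
                    %% Return value explicitly ignored.
                    _ = dets:close(Tab),
                    finish(DetsFile, Result);
                {error, _} = OpenErr ->
                    OpenErr
            end
    end.

%% Only rename the original file once the fold has succeeded.
finish(_DetsFile, {error, _} = FoldErr) ->
    FoldErr;
finish(DetsFile, Count) when is_integer(Count) ->
    _ = file:rename(DetsFile, DetsFile ++ ".migrated"),
    {ok, Count}.
```

Renaming only after a successful fold means a failed migration leaves meta.dets in place, so the next start can retry. A second call after a successful run finds no file and returns `{ok, 0}`.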

Performance improvements:

  • Smaller file footprint compared to DETS
  • Batch write optimization via shu:write_batch/2
  • Atomic term+vote updates are deferred to a future refactoring (the existing update_key rule is kept for now, for compatibility)

Testing:

  • Existing roundtrip and delete tests pass with the new backend
  • Manual verification of crash-recovery via proc_lib:stop
  • Core functionality in the full test suite passes; distributed tests require shu.app to be deployed on remote nodes (an infrastructure setup task)

Made-with: Cursor

Split voted_for storage into two separate shu fields:
- voted_for_node (atom, low-frequency): The node part of a {Node, ServerName}
  tuple - typically a small fixed set of values that repeat across many records
- voted_for_name (binary, low-frequency): The server name part - either an
  arbitrary atom (when VotedFor is a bare atom) or a binary-encoded atom

This optimization improves storage efficiency by leveraging shu's atom table
for the limited set of node names (preventing redundant atom table entries
for the same few nodes) while keeping server names as binaries to handle
the unbounded set of unique server identifiers.

Backward compatible with existing DETS data through decode_voted_for/1
which handles legacy single-atom format and tuple format.
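
A minimal sketch of the split, for illustration only. The module and the `encode/1` and `decode/1` functions are hypothetical and are not the `decode_voted_for/1` from this commit. The tuple order used here, `{Name, Node}`, is an assumption; swap the elements if the stored form is `{Node, ServerName}` as written above.

```erlang
-module(voted_for_sketch).
-export([encode/1, decode/1]).

%% Split a vote into the two fields named in this commit.
%% Assumption: a vote is either undefined or a {Name, Node} tuple of atoms.
encode(undefined) ->
    #{};
encode({Name, Node}) when is_atom(Name), is_atom(Node) ->
    #{voted_for_node => Node,
      voted_for_name => atom_to_binary(Name, utf8)}.

%% Rebuild the vote from the two fields; a map without them means "no vote".
decode(#{voted_for_node := Node, voted_for_name := NameBin})
  when is_atom(Node), is_binary(NameBin) ->
    %% binary_to_atom/2 creates the atom if needed; binary_to_existing_atom/2
    %% is the stricter choice when the names are known to be loaded already.
    {binary_to_atom(NameBin, utf8), Node};
decode(#{}) ->
    undefined.
```

For example, `decode(encode({server_a, 'n1@host'}))` returns the original tuple, and `decode(encode(undefined))` returns `undefined`.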

All tests pass with the optimized schema.

Made-with: Cursor

Dialyzer warned that the second clause of handle_info/2 could never match:
it reused the MRef variable for both the reference in the message tuple and
the record's compact_mref field, so it could not cover the case where the
reference in the tuple differs from the one stored in the record.

Fix by using a distinct variable name in the second clause to properly
distinguish between the tuple MRef and the record's MRef for the mismatch case.
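
One plausible shape of the fixed clauses, as a sketch only. The actual handle_info/2 is not shown here, so the module, the `classify/2` function, the `new_state/1` helper and the record layout (apart from the compact_mref field named above) are assumptions.

```erlang
-module(mref_match_sketch).
-export([new_state/1, classify/2]).

-record(state, {compact_mref :: reference() | undefined}).

%% Hypothetical constructor so callers do not need the record definition.
new_state(MRef) ->
    #state{compact_mref = MRef}.

%% Clause 1: MRef appears twice in the head, so the clause only matches when
%% the reference in the message equals the one stored in the record.
classify({'DOWN', MRef, process, _Pid, _Reason}, #state{compact_mref = MRef}) ->
    expected;
%% Clause 2: a distinct variable name binds any other reference, which is
%% the mismatch case.
classify({'DOWN', _OtherMRef, process, _Pid, _Reason}, #state{}) ->
    stale;
classify(_Msg, #state{}) ->
    ignored.
```

Because Erlang treats a repeated variable in a clause head as an equality constraint, writing MRef in both positions of the second clause would make it identical to the first and therefore unreachable.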

Made-with: Cursor

The original 24-byte limit was based on typical ra_uid lengths, but tests
use longer UIDs like 'recovery_checkpoint_written_on_shutdown' (39 bytes).

Shu supports up to 255-byte keys, so increase the limit to accommodate all
possible UID lengths without restriction. This allows ra_2_SUITE tests to pass.
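
To illustrate the check that produces this kind of error, here is a small sketch. `validate_key/2` and the `{error, ...}` wrapper are hypothetical; only the `{invalid_key_size, Size, Max}` tuple shape is taken from the error reported below.

```erlang
-module(key_size_sketch).
-export([validate_key/2]).

%% Hypothetical check: accept a binary key only if it fits in MaxSize bytes.
-spec validate_key(binary(), pos_integer()) ->
    ok | {error, {invalid_key_size, non_neg_integer(), pos_integer()}}.
validate_key(Key, MaxSize)
  when is_binary(Key), is_integer(MaxSize), MaxSize > 0 ->
    case byte_size(Key) of
        Size when Size =< MaxSize ->
            ok;
        Size ->
            {error, {invalid_key_size, Size, MaxSize}}
    end.
```

With the 39-byte UID `<<"recovery_checkpoint_written_on_shutdown">>`, a limit of 24 yields `{error, {invalid_key_size, 39, 24}}`, while a limit of 255 returns `ok`.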

Fixes: {invalid_key_size,39,24} errors in ra_2_SUITE
Made-with: Cursor

When shu:read_all returns an empty map (for a fresh key with no data),
we should skip inserting an incomplete row into ETS. This happens when
starting on a new peer node where the shu store exists but has no records yet.

Fixes coordination_SUITE tests failing on fresh peer startup.
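
The guard described above can be sketched like this. `maybe_cache/3` and the row layout are hypothetical; `Fields` stands for the map that a read of all fields for a UID is assumed to return, and no shu call is made.

```erlang
-module(cache_fill_sketch).
-export([maybe_cache/3]).

%% Hypothetical helper: only insert a row into the ETS cache when the store
%% actually returned data for this UID.
maybe_cache(_Tab, _Uid, Fields) when map_size(Fields) =:= 0 ->
    %% Fresh key with no records yet: leave the cache untouched.
    skipped;
maybe_cache(Tab, Uid, #{} = Fields) ->
    %% Assumed row layout: {Uid, CurrentTerm, VotedFor, LastApplied}.
    Row = {Uid,
           maps:get(current_term, Fields, undefined),
           maps:get(voted_for, Fields, undefined),
           maps:get(last_applied, Fields, undefined)},
    true = ets:insert(Tab, Row),
    inserted.
```

Skipping the insert keeps lookups for an unknown UID returning an empty result, rather than a row filled with placeholder values.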

Made-with: Cursor

Both dets:close/1 and file:rename/2 return values that need to be
explicitly ignored to satisfy Dialyzer's unmatched_returns check.
explicitly ignored to satisfy the compiler's unmatched_returns check.

Made-with: Cursor

64 bytes is sufficient for all practical ra_uid() values in production
while saving space. Test UIDs like 'recovery_checkpoint_written_on_shutdown'
(39 bytes) fit comfortably within this limit.

Made-with: Cursor

Since all schema fields are defined with frequency => low, shu automatically
fsyncs them immediately. The distinction between sync and non-sync operations
is not needed - shu's frequency config handles it transparently.

Made-with: Cursor

Tests that call ra_log_meta functions need a uid in the config.
Convert test case name to binary for use as uid, similar to ra_2_SUITE.

Made-with: Cursor

When starting remote peer nodes in tests, shu must be included in the
code path alongside other ra dependencies. Without this, distributed
tests fail with 'shu.app not found' errors.

Made-with: Cursor

After stopping and restarting the ra_log_meta process, tests need to wait
for it to be fully initialized before attempting to fetch data. The await/1
function ensures the process has finished initialization (loading data from
shu into ETS) before proceeding.
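
The await/1 function itself is not shown here. As a generic illustration of the idea, a test helper can poll a readiness predicate with a bounded number of retries; `wait_until/3` below is hypothetical and is not the function referred to above.

```erlang
-module(await_sketch).
-export([wait_until/3]).

%% Hypothetical test helper: call Pred until it returns true, sleeping
%% IntervalMs between attempts, and give up after Retries attempts.
-spec wait_until(fun(() -> boolean()), non_neg_integer(), non_neg_integer()) ->
    ok | {error, timeout}.
wait_until(_Pred, 0, _IntervalMs) ->
    {error, timeout};
wait_until(Pred, Retries, IntervalMs)
  when is_function(Pred, 0), is_integer(Retries), Retries > 0 ->
    case Pred() of
        true ->
            ok;
        false ->
            timer:sleep(IntervalMs),
            wait_until(Pred, Retries - 1, IntervalMs)
    end.
```

A test might pass a predicate that checks whether the restarted process is registered and the expected ETS row is present. Bounding the retries turns a slow CI restart into a clear timeout instead of a hang.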

This fixes timing issues in tests where the process restart is slower than
expected (especially on CI systems).

Made-with: Cursor