disk-deactivate: wipe raid arrays before stopping them #1248

Open
kmein wants to merge 2 commits into nix-community:master from kmein:fix/mdadm-destroy
kmein commented Apr 8, 2026

Closes #1247

How to test

  1. Go to a temporary directory: `cd "$(mktemp -d)"`
  2. Get the gist: `wget https://gist.githubusercontent.com/kmein/9644e65ab8218a4b57c35cb9290b86ad/raw/3ec76f6a142abb1f4ea3764563a8843984b300d4/flake.nix`
  3. See it fail: `nix build` (output: `Exception: The canary file survived the Disko wipe process!`)
  4. Switch the branch: `sed -i 's#github:nix-community/disko#github:kmein/disko?ref=fix/mdadm-destroy#' flake.nix`
  5. Watch it work: `nix build`
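
To see what the substitution in step 4 does without touching a real file, here is a minimal sketch. The `sample` line is an assumption about how the gist's `flake.nix` references disko (the actual file may be laid out differently), and `switch_branch` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical input line; the real flake.nix in the gist may look different.
sample='inputs.disko.url = "github:nix-community/disko";'

# Same substitution as step 4, applied to stdin instead of editing flake.nix in place.
switch_branch() {
  sed 's#github:nix-community/disko#github:kmein/disko?ref=fix/mdadm-destroy#'
}

rewritten=$(printf '%s\n' "$sample" | switch_branch)
printf '%s\n' "$rewritten"
```

Using `#` as the `sed` delimiter avoids having to escape the `/` characters in the flake reference.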

Previously, the `disk-deactivate` script stopped mdadm arrays before
wiping the underlying physical disks. This caused filesystems with
backup superblocks (like BTRFS at 64 MiB and 256 GiB offsets) to
survive the wipe, as the backups were striped across the physical
disks and missed by `wipefs` on the raw block devices.

When the array was reassembled during reprovisioning, the filesystem
superblocks realigned. `disko` would detect the old filesystem via
`blkid` and silently skip the `mkfs` step, leaving stale data intact.

This commit adds a `wipefs --all` command against the assembled RAID
device *before* stopping it, ensuring all filesystem signatures and
backup superblocks are cleanly destroyed.
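
The ordering described above can be sketched as a dry run that only prints commands. This is not the actual `disk-deactivate` code: `raid_teardown_plan` and the device names are hypothetical, the array is addressed by its path in both commands as an assumption, and the paths are left unquoted for brevity where a real script would shell-quote them.

```shell
# Dry-run sketch: prints the teardown commands instead of running them.
# raid_teardown_plan is a hypothetical helper, not part of disko.
raid_teardown_plan() {
  md_dev=$1   # assembled array, e.g. /dev/md127 (hypothetical name)
  shift       # remaining arguments are the member devices

  # 1. Wipe signatures while the array is still assembled.
  printf 'wipefs --all -f %s\n' "$md_dev"
  # 2. Only then stop the array.
  printf 'mdadm --stop %s\n' "$md_dev"
  # 3. Finally wipe each member device.
  for member in "$@"; do
    printf 'wipefs --all -f %s\n' "$member"
  done
}

plan=$(raid_teardown_plan /dev/md127 /dev/vdb /dev/vdc)
printf '%s\n' "$plan"
```

Swapping steps 1 and 2 would reproduce the old behaviour, where only the member devices were ever wiped.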

Copilot AI review requested due to automatic review settings on April 8, 2026 07:46
Copilot AI left a comment

Pull request overview

This PR addresses #1247 by ensuring disk-deactivate wipes filesystem/partition signatures on active mdadm RAID devices before stopping the array, preventing stale on-array signatures (e.g., btrfs backup superblocks) from surviving teardown and causing later reprovisioning to skip mkfs.

Changes:

  • For lsblk devices with types containing "raid", run wipefs --all on the RAID device before mdadm --stop.

elif (.type | contains("raid")) then
[
"wipefs --all -f \(.path | shellquote)",
"mdadm --stop \(.name | shellquote)"
Copilot AI Apr 8, 2026

mdadm --stop is invoked with .name, but lsblk’s name field is typically not an absolute device path. This can make the stop command fail (and it also differs from other places in the repo that stop arrays via /dev/md/<name>). Prefer stopping the array via .path (or otherwise ensure an absolute /dev/... path is passed).

Suggested change
- "mdadm --stop \(.name | shellquote)"
+ "mdadm --stop \(.path | shellquote)"

Comment on lines 61 to 64
elif (.type | contains("raid")) then
[
"wipefs --all -f \(.path | shellquote)",
"mdadm --stop \(.name | shellquote)"
Copilot AI Apr 8, 2026

This change fixes a subtle teardown ordering bug (wipe md device signatures before stopping the array), but there’s no automated regression test ensuring destroyFormatMount actually forces a re-mkfs for filesystems on mdadm arrays (especially btrfs with backup superblocks). Consider adding a NixOS test that provisions btrfs-on-mdadm, runs destroy+recreate, and asserts the old btrfs signature/data is gone (i.e., mkfs is not skipped).

kmein (Author) replied:

I've added a NixOS test that ensures the expected behaviour. It does not use diskoLib.testLib.makeDiskoTest because that function does not test the unhappy-path possibility of files being left over from previous iterations.

Adds a NixOS integration test to verify that the `disk-deactivate`
script properly destroys BTRFS filesystems residing on mdadm RAID arrays.
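
The idea behind such a leftover-file check can be illustrated with a small sketch. It is not the PR's NixOS test: `canary_survived`, the file name `canary`, and the temporary directories standing in for mounted filesystems are all hypothetical.

```shell
# canary_survived is a hypothetical helper; it succeeds if a marker file
# written before the destroy step is still present under the given mount point.
canary_survived() {
  [ -e "$1/canary" ]
}

# Throwaway directories stand in for a stale and a freshly created filesystem.
old_fs=$(mktemp -d)
new_fs=$(mktemp -d)
touch "$old_fs/canary"   # marker written before the (simulated) destroy step

if canary_survived "$old_fs"; then stale=yes; else stale=no; fi
if canary_survived "$new_fs"; then fresh=yes; else fresh=no; fi
printf 'stale mount has canary: %s, fresh mount has canary: %s\n' "$stale" "$fresh"

rm -rf "$old_fs" "$new_fs"
```

A test built on this idea fails whenever the marker is still visible after destroy and recreate, which is the unhappy path mentioned in the reply above.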

kmein force-pushed the fix/mdadm-destroy branch from 3184aa7 to dffa200 on April 8, 2026 08:26

Development

Successfully merging this pull request may close these issues.

disk-deactivate fails to wipe BTRFS on mdadm RAID (leaves stale data, skips mkfs)
