Replicating encrypted child dataset + change-key + incremental receive overwrites master key of replica, causes permission denied on remount #12614
Description
| Type | Version/Name |
|---|---|
| Distribution Name | several listed below but mainly Debian (Proxmox VE 7) |
| Distribution Version | 11 |
| Kernel Version | 5.11.22-4-pve |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.0.5-pve1 |
This was originally posted as a comment on #12000, and I was asked to open a new bug report.
I just started using ZoL with native encryption and I think I have hit the same bug or a similar one (also related to #6624). Steps to reproduce:
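```shell
# Two small file-backed pools: "src" and "replica"
truncate -s 100M /root/src.img
truncate -s 100M /root/replica.img
zpool create src /root/src.img
zpool create replica /root/replica.img

# One encryption root on each pool, each with its own passphrase
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt src/encrypted
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt replica/encrypted

# A child that inherits its key from src/encrypted, plus a first snapshot
zfs create src/encrypted/a
dd if=/dev/urandom of=/src/encrypted/a/test1.bin bs=1M count=1
zfs snap src/encrypted/a@test1

# Raw (-w) replication of the child only; on the replica it becomes its own
# encryption root, still unlocked with the src/encrypted passphrase
zfs send -Rvw src/encrypted/a@test1 | zfs receive -svF replica/encrypted/a
zfs mount -l replica/encrypted
zfs mount -l replica/encrypted/a

# Make the replica child inherit its key from replica/encrypted instead
zfs change-key -i replica/encrypted/a

# Unmount (unloading keys), then remount
zfs umount -u replica/encrypted
zfs mount -l replica/encrypted
zfs mount replica/encrypted/a
```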
Everything works as expected at this point. Now do an incremental send:
```shell
dd if=/dev/urandom of=/src/encrypted/a/test2.bin bs=1M count=1
zfs snap src/encrypted/a@test2
zfs send -RvwI @test1 src/encrypted/a@test2 | zfs receive -svF replica/encrypted/a
```
```
# ls -al /replica/encrypted/a/
total 2056
drwxr-xr-x 2 root root       4 Sep 26 03:59 .
drwxr-xr-x 3 root root       3 Sep 26 03:57 ..
-rw-r--r-- 1 root root 1048576 Sep 26 03:55 test1.bin
-rw-r--r-- 1 root root 1048576 Sep 26 03:59 test2.bin
```
Again, all good. Now unmount/mount:
```shell
zfs umount -u replica/encrypted
zfs mount -l replica/encrypted
```
```
# zfs get encryptionroot,keystatus -rt filesystem replica/encrypted
NAME                 PROPERTY        VALUE              SOURCE
replica/encrypted    encryptionroot  replica/encrypted  -
replica/encrypted    keystatus       available          -
replica/encrypted/a  encryptionroot  replica/encrypted  -
replica/encrypted/a  keystatus       available          -
# zfs mount -l replica/encrypted/a
cannot mount 'replica/encrypted/a': Permission denied
```
Yikes! This appears to have corrupted 10TB of backup filesystems. I've been trying to recover from this, with no luck so far.
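In this report, `replica/encrypted/a` ended up inheriting its key from `replica/encrypted` only because of the `zfs change-key -i` step. A minimal sketch for listing every dataset on a pool in that situation (encrypted, but not its own encryption root) before the next raw incremental receive is to filter the script-friendly output of `zfs get`. The function name `inheriting_datasets` is hypothetical, and it flags every such dataset, including ones that were replicated together with their encryption root and, per this report, are not at risk.

```shell
#!/bin/sh
# Reads "name<TAB>value" lines, as printed by
#   zfs get -H -o name,value -r -t filesystem encryptionroot POOL
# and prints each encrypted dataset whose encryption root is another
# dataset, i.e. one that inherits its key from an ancestor.
# Unencrypted datasets report "-" and are skipped.
inheriting_datasets() {
    awk -F '\t' '$2 != "-" && $1 != $2 { print $1 }'
}
```

Usage: `zfs get -H -o name,value -r -t filesystem encryptionroot replica | inheriting_datasets`. In the state shown above (after `change-key -i`), this would list `replica/encrypted/a`.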
If I don't run `change-key`, I can send incrementals, unmount, and mount with no problem (I just have to enter the passphrase twice, once for each encryption root). If I run `change-key` without sending an incremental, unmounting and remounting also works. It's the combination of running `change-key` and then receiving an incremental snapshot that renders the filesystem unmountable: it stays usable while mounted, but once it is unmounted it can't be mounted again.
It looks like the encryption root absolutely has to be replicated to prevent this from happening. If I replicate the encryption root, everything works as expected.
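A minimal sketch of that workaround, assuming the whole `src/encrypted` tree is replicated raw and recursively into a new, hypothetical dataset `replica/backup` that does not exist before the first receive. Because the encryption root itself is sent, `replica/backup` becomes the encryption root on the replica (unlocked with the `src/encrypted` passphrase) and its children keep inheriting from it, so no `zfs change-key -i` is needed. The helper names are made up for illustration, and the script defaults to a dry run that only prints each pipeline.

```shell
#!/bin/sh
# Sketch only: replicate the whole encryption root instead of just its child.
# Assumption: replica/backup is a hypothetical target that must not exist
# before full_send runs. DRY_RUN=1 (the default) only prints the commands.

SRC_ROOT=src/encrypted
DST_ROOT=replica/backup

# Print a command pipeline, or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# Recursive snapshot, so -R finds the snapshot on every dataset in the tree.
snapshot() {
    run "zfs snapshot -r $SRC_ROOT@$1"
}

# First raw, recursive send of the encryption root and all of its children.
full_send() {
    run "zfs send -Rw $SRC_ROOT@$1 | zfs receive -u $DST_ROOT"
}

# Later raw incrementals of the same tree (-F discards target-side changes).
incremental_send() {
    run "zfs send -Rw -I @$1 $SRC_ROOT@$2 | zfs receive -u -F $DST_ROOT"
}
```

For example, after sourcing the file and setting `DRY_RUN=0`: run `snapshot test1 && full_send test1` once, then `snapshot test2 && incremental_send test1 test2` for each later increment.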
I may have also uncovered another bug while trying to recover from this. If I run `zfs change-key -o keylocation=prompt -o keyformat=passphrase replica/encrypted/a`, then after I enter and confirm the new passphrase the command hangs forever because of a kernel panic in the `txg_sync` thread, and I have to completely reset the system:
```
[ 7080.228309] VERIFY3(0 == spa_keystore_dsl_key_hold_dd(dp->dp_spa, dd, FTAG, &dck)) failed (0 == 13)
[ 7080.228369] PANIC at dsl_crypt.c:1450:spa_keystore_change_key_sync_impl()
[ 7080.228399] Showing stack for process 1120
[ 7080.228403] CPU: 2 PID: 1120 Comm: txg_sync Tainted: P O 5.11.0-36-generic #40-Ubuntu
[ 7080.228406] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
[ 7080.228408] Call Trace:
[ 7080.228414] show_stack+0x52/0x58
[ 7080.228424] dump_stack+0x70/0x8b
[ 7080.228431] spl_dumpstack+0x29/0x2b [spl]
[ 7080.228448] spl_panic+0xd4/0xfc [spl]
[ 7080.228459] ? dsl_wrapping_key_rele.constprop.0+0x12/0x20 [zfs]
[ 7080.228597] ? spa_keystore_dsl_key_hold_dd+0x1a8/0x200 [zfs]
[ 7080.228687] spa_keystore_change_key_sync_impl+0x3c0/0x3d0 [zfs]
[ 7080.228776] ? zap_lookup+0x16/0x20 [zfs]
[ 7080.228899] spa_keystore_change_key_sync+0x157/0x3c0 [zfs]
[ 7080.228988] ? dmu_buf_rele+0xe/0x10 [zfs]
[ 7080.229064] ? dsl_dir_rele+0x30/0x40 [zfs]
[ 7080.229189] ? spa_keystore_change_key_check+0x178/0x4f0 [zfs]
[ 7080.229324] dsl_sync_task_sync+0xb5/0x100 [zfs]
[ 7080.229418] dsl_pool_sync+0x365/0x3f0 [zfs]
[ 7080.229507] spa_sync_iterate_to_convergence+0xe0/0x1e0 [zfs]
[ 7080.229609] spa_sync+0x305/0x5b0 [zfs]
[ 7080.229718] txg_sync_thread+0x26c/0x2f0 [zfs]
[ 7080.229835] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[ 7080.229952] thread_generic_wrapper+0x79/0x90 [spl]
[ 7080.229963] kthread+0x11f/0x140
[ 7080.229970] ? __thread_exit+0x20/0x20 [spl]
[ 7080.229980] ? set_kthread_struct+0x50/0x50
[ 7080.229984] ret_from_fork+0x22/0x30
```
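The first panic line reports `spa_keystore_dsl_key_hold_dd()` returning 13. On both Linux and FreeBSD, errno 13 is `EACCES` ("Permission denied"), the same error the failed mount prints, which hints (as an inference from the log, not a confirmed diagnosis) that both fail because the dataset's key can no longer be unwrapped. A minimal sketch for pulling that return value out of a saved kernel log; the function name is hypothetical and it only handles the `failed (0 == N)` form shown here.

```shell
#!/bin/sh
# Print N from every "VERIFY3(...) failed (0 == N)" line on stdin,
# e.g. from dmesg output or a saved console log.
verify3_errno() {
    sed -n 's/.*VERIFY3(.*) failed (0 == \([0-9][0-9]*\)).*/\1/p'
}
```

Usage: `dmesg | verify3_errno`.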
I've tested this on (all x86_64):
- Proxmox VE 7 (Debian Bullseye with zfs-2.0.5-pve1, 5.11.22-4-pve)
- Stock Debian Bullseye (zfs-2.0.3-9, 5.10.0-8-amd64)
- Stock Ubuntu 20.04 LTS (zfs-2.0.2-1ubuntu5.1, 5.11.0-36-generic)
- FreeBSD 13.0-RELEASE-p4 (zfs-2.0.0-FreeBSD_gf11b09dec)