Replicating encrypted child dataset + change-key + incremental receive overwrites master key of replica, causes permission denied on remount #12614

@brenc

Type                   Version/Name
Distribution Name      several listed below but mainly Debian (Proxmox VE 7)
Distribution Version   11
Kernel Version         5.11.22-4-pve
Architecture           x86_64
OpenZFS Version        zfs-2.0.5-pve1

This was posted in a comment to #12000. I was asked to open up a new bug report.

I just started using ZoL with native encryption and think I have hit the same or a similar bug (also related to #6624).

truncate -s 100M /root/src.img
truncate -s 100M /root/replica.img

zpool create src /root/src.img
zpool create replica /root/replica.img

zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt src/encrypted
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt replica/encrypted

zfs create src/encrypted/a

dd if=/dev/urandom of=/src/encrypted/a/test1.bin bs=1M count=1

zfs snap src/encrypted/a@test1

zfs send -Rvw src/encrypted/a@test1 | zfs receive -svF replica/encrypted/a

zfs mount -l replica/encrypted

zfs mount -l replica/encrypted/a

zfs change-key -i replica/encrypted/a

zfs umount -u replica/encrypted

zfs mount -l replica/encrypted

zfs mount replica/encrypted/a

All good at this point. Everything works as expected. Now, do an incremental send:

dd if=/dev/urandom of=/src/encrypted/a/test2.bin bs=1M count=1

zfs snap src/encrypted/a@test2

zfs send -RvwI @test1 src/encrypted/a@test2 | zfs receive -svF replica/encrypted/a

# ls -al /replica/encrypted/a/
total 2056
drwxr-xr-x 2 root root       4 Sep 26 03:59 .
drwxr-xr-x 3 root root       3 Sep 26 03:57 ..
-rw-r--r-- 1 root root 1048576 Sep 26 03:55 test1.bin
-rw-r--r-- 1 root root 1048576 Sep 26 03:59 test2.bin

Again, all good. Now unmount/mount:

zfs umount -u replica/encrypted

zfs mount -l replica/encrypted

# zfs get encryptionroot,keystatus -rt filesystem replica/encrypted
NAME                 PROPERTY        VALUE              SOURCE
replica/encrypted    encryptionroot  replica/encrypted  -
replica/encrypted    keystatus       available          -
replica/encrypted/a  encryptionroot  replica/encrypted  -
replica/encrypted/a  keystatus       available          -

# zfs mount -l replica/encrypted/a
cannot mount 'replica/encrypted/a': Permission denied

Yikes! This appears to have corrupted 10TB of backup filesystems. I've been trying to recover from this but no luck so far.

If I don't run change-key then I can send incrementals, unmount, and mount with no problem (I just have to enter the password twice). If I run change-key and then unmount/mount, there is still no problem. It is the combination of running change-key and then receiving an incremental snapshot that seems to render the filesystem unmountable.

After running change-key and sending an incremental, once the filesystem is unmounted it can't be mounted again. It looks like the encryption root absolutely has to be replicated to prevent this from happening. If I replicate the encryption root then everything works as expected.
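As a rough way to spot datasets in the state described above, here is a minimal sketch that lists filesystems whose encryptionroot is a different dataset, i.e. ones that inherit their key (for example after zfs change-key -i). Treating these as the datasets at risk from a raw incremental receive is an assumption drawn from this report, not a confirmed diagnosis. The function name list_inherited_encroots is hypothetical.

```shell
# Read "name<TAB>encryptionroot" pairs on stdin, the layout produced by
# `zfs get -H -o name,value encryptionroot`, and print every dataset
# whose encryption root is some other dataset (its key is inherited).
# Unencrypted datasets report "-" and are skipped.
list_inherited_encroots() {
    awk -F '\t' '$2 != "-" && $2 != "" && $1 != $2 { print $1 }'
}

# Example against a live pool (needs ZFS, so it is left commented out):
#   zfs get -H -o name,value -r -t filesystem encryptionroot replica/encrypted \
#       | list_inherited_encroots
```

Fed the encryptionroot values shown in the zfs get output earlier in this report, the filter would print only replica/encrypted/a, since replica/encrypted is its own encryption root.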

I may have also uncovered another bug in trying to recover from this. If I run zfs change-key -o keylocation=prompt -o keyformat=passphrase replica/encrypted/a, after entering the new passwords the command hangs forever due to a panic. I have to completely reset the system.

[ 7080.228309] VERIFY3(0 == spa_keystore_dsl_key_hold_dd(dp->dp_spa, dd, FTAG, &dck)) failed (0 == 13)
[ 7080.228369] PANIC at dsl_crypt.c:1450:spa_keystore_change_key_sync_impl()
[ 7080.228399] Showing stack for process 1120
[ 7080.228403] CPU: 2 PID: 1120 Comm: txg_sync Tainted: P           O      5.11.0-36-generic #40-Ubuntu
[ 7080.228406] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
[ 7080.228408] Call Trace:
[ 7080.228414]  show_stack+0x52/0x58
[ 7080.228424]  dump_stack+0x70/0x8b
[ 7080.228431]  spl_dumpstack+0x29/0x2b [spl]
[ 7080.228448]  spl_panic+0xd4/0xfc [spl]
[ 7080.228459]  ? dsl_wrapping_key_rele.constprop.0+0x12/0x20 [zfs]
[ 7080.228597]  ? spa_keystore_dsl_key_hold_dd+0x1a8/0x200 [zfs]
[ 7080.228687]  spa_keystore_change_key_sync_impl+0x3c0/0x3d0 [zfs]
[ 7080.228776]  ? zap_lookup+0x16/0x20 [zfs]
[ 7080.228899]  spa_keystore_change_key_sync+0x157/0x3c0 [zfs]
[ 7080.228988]  ? dmu_buf_rele+0xe/0x10 [zfs]
[ 7080.229064]  ? dsl_dir_rele+0x30/0x40 [zfs]
[ 7080.229189]  ? spa_keystore_change_key_check+0x178/0x4f0 [zfs]
[ 7080.229324]  dsl_sync_task_sync+0xb5/0x100 [zfs]
[ 7080.229418]  dsl_pool_sync+0x365/0x3f0 [zfs]
[ 7080.229507]  spa_sync_iterate_to_convergence+0xe0/0x1e0 [zfs]
[ 7080.229609]  spa_sync+0x305/0x5b0 [zfs]
[ 7080.229718]  txg_sync_thread+0x26c/0x2f0 [zfs]
[ 7080.229835]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[ 7080.229952]  thread_generic_wrapper+0x79/0x90 [spl]
[ 7080.229963]  kthread+0x11f/0x140
[ 7080.229970]  ? __thread_exit+0x20/0x20 [spl]
[ 7080.229980]  ? set_kthread_struct+0x50/0x50
[ 7080.229984]  ret_from_fork+0x22/0x30
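For anyone trying to confirm they are hitting the same panic, here is a small sketch that pulls the source location out of SPL "PANIC at" lines in a kernel log. It assumes the line layout shown in the trace above; the function name extract_zfs_panics is hypothetical. Note that the 13 in the VERIFY3 line corresponds to EACCES on Linux, which matches the "Permission denied" seen on mount.

```shell
# Read kernel log text on stdin and print "file:line function" for each
# SPL panic line of the form "PANIC at file.c:NNN:function()".
# All other lines are ignored.
extract_zfs_panics() {
    sed -n 's/^.*PANIC at \([^:]*\):\([0-9]*\):\(.*\)$/\1:\2 \3/p'
}

# Example on a live system (left commented out):
#   dmesg | extract_zfs_panics
```

Given the log above, this would reduce the trace to the single location in dsl_crypt.c where the assertion fired, which is easier to compare across systems than the full stack.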

I've tested this on (all x86_64):

  • Proxmox VE 7 (Debian Bullseye with zfs-2.0.5-pve1, 5.11.22-4-pve)
  • Stock Debian Bullseye (zfs-2.0.3-9, 5.10.0-8-amd64)
  • Stock Ubuntu 20.04 LTS (zfs-2.0.2-1ubuntu5.1, 5.11.0-36-generic)
  • FreeBSD 13.0-RELEASE-p4 (zfs-2.0.0-FreeBSD_gf11b09dec)
