kernel: PANIC at zio.c:5150:zio_encrypt() when USB drive detaches and reattaches during encrypted transfer #18394
Description
I'm trying to archive several ZFS datasets to an external HDD connected via USB, using raw transfers to preserve the original encryption. The first dataset was 252G and completed without issue. The second is 1.6T; I've tried the transfer twice so far, and each time it stalled after transferring 300 to 400G of data. From the dmesg log, I can see that each time the USB device spontaneously detached itself from the bus and then reattached itself about one second later. I haven't yet figured out the cause of the detachment. Low-level driver behavior when it falls too far behind on queued writes? Some kind of power-saving setting? A periodic task that's causing a USB rescan?
Regardless of the reason, it seems like ZFS (especially with recv -s) ought to be able to recover from these kinds of interruptions. When I run zpool status after the reattachment, I get
state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
Running zpool clear takes a couple of seconds, and then the pool returns to ONLINE status, with the previously transferred datasets auto-remounted (as expected). I also get the entry WARNING: Pool 'lacie_backup1' was suspended and is being resumed. Failed I/O will be retried. in my dmesg log, and my zfs recv command (which I'm monitoring with pv) resumes progress and returns to the original transfer rate. However, about 20 seconds after the resume, syslogd reports a kernel panic, with the stack trace shown below.
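For reference, the manual check-and-clear step described above can be sketched as a small shell helper. This is only an illustration of what I did by hand: the function names pool_state and clear_if_suspended are made up for this example, and the pool name is passed in as an argument.

```sh
# Print the value of the "state:" field from `zpool status` output on stdin.
# (pool_state is a hypothetical helper name, not part of ZFS.)
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Run `zpool clear` only when the named pool reports SUSPENDED.
# (clear_if_suspended is likewise a hypothetical name.)
clear_if_suspended() {
    pool="$1"
    state=$(zpool status "$pool" | pool_state)
    if [ "$state" = "SUSPENDED" ]; then
        zpool clear "$pool"
    fi
}

# Example (needs root and a real pool):
#   clear_if_suspended lacie_backup1
```

Note that this only automates the recovery step; it does nothing to avoid the panic that follows the resume.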
Interestingly, the zfs send / zfs recv processes don't directly report any errors, and the transfer appears to keep moving along. The first time this happened, I terminated the transfer, resilvered the pool, and restarted the transfer from scratch. Now that it's happened a second time, I've decided to let it continue and double-check dataset integrity at the end. I don't have any visible errors on any of my pools at the moment.
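As an aside on restarting from scratch: because the receive was started with -s, an interrupted receive should normally leave a receive_resume_token property on the destination dataset, which zfs send -t can use to continue the stream. A minimal sketch follows, assuming the token was actually saved; the helper names has_resume_token and resume_receive are hypothetical, and I have not verified that resuming works after this particular panic.

```sh
# Succeed if the argument looks like a real resume token.
# `zfs get` prints "-" when the property is not set.
# (has_resume_token is a hypothetical helper name.)
has_resume_token() {
    [ -n "$1" ] && [ "$1" != "-" ]
}

# Resume an interrupted `zfs recv -s` into the given destination dataset.
# (resume_receive is a hypothetical helper name.)
resume_receive() {
    dest="$1"
    token=$(zfs get -H -o value receive_resume_token "$dest")
    if has_resume_token "$token"; then
        # The token encodes the source snapshot and send flags.
        zfs send -t "$token" | zfs recv -s "$dest"
    else
        echo "no resume token found on $dest" >&2
        return 1
    fi
}

# Example (needs root and real datasets):
#   resume_receive usb_pool/new_dataset_name
```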
Searching for spa_do_crypt_abd turns up a slew of older issues with encrypted transfers, most of which were reported as resolved years ago. The issue I'm encountering might be similar to #10570 or #12931, both of which were closed as unreproducible.
I certainly don't blame ZFS for the USB device detachment, and it's entirely possible that this hardware (LaCie Rugged) simply can't handle sustained transfers, but it seems like there's room for improvement in how gracefully ZFS recovers from these situations. For example, it might be prudent for zio_encrypt to retry an interrupted spa_do_crypt_abd operation before throwing the panic. I won't know for a few hours whether data integrity was preserved on the destination dataset.
System information
Distribution Name | Slackware Linux
Distribution Version | slackware64-current
Kernel Version | 6.19.9
Architecture | x86_64
OpenZFS Version | 2.4.1
The external USB drive is the only device in its zpool.
Describe the problem you're observing
A kernel panic occurs even after ZFS reports that the pool "is being resumed" and that "Failed I/O will be retried." Furthermore, zfs recv doesn't seem to detect any error condition despite the kernel panic, and continues the transfer. It's not clear whether this is the expected behavior when the -s flag is passed.
Describe how to reproduce the problem
- Create a single-device zpool on an external USB drive.
- Run zfs send -w main_pool/dataset | pv | zfs recv -s usb_pool/new_dataset_name
- Wait until the transfer pauses.
- Check zpool status and/or zfs list to see that usb_pool is offline.
- Run zpool clear usb_pool
- Observe usb_pool come back online, and the zfs send | zfs recv resumes.
- Wait until syslogd reports a kernel panic.
Include any warning/errors/backtraces from the system logs
[77746.760127] WARNING: Pool 'lacie_backup1' was suspended and is being resumed. Failed I/O will be retried.
[77804.857858] VERIFY0(spa_do_crypt_abd(B_TRUE, spa, &zio->io_bookmark, BP_GET_TYPE(bp), BP_GET_DEDUP(bp), BP_SHOULD_BYTESWAP(bp), salt, iv, mac, psize, zio->io_abd, eabd, &no_crypt)) failed (13)
[77804.857863] PANIC at zio.c:5150:zio_encrypt()
[77804.857865] Showing stack for process 10956
[77804.857869] CPU: 5 UID: 0 PID: 10956 Comm: z_wr_iss Tainted: G O 6.19.9 #1 PREEMPT(full)
[77804.857872] Tainted: [O]=OOT_MODULE
[77804.857872] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS MAX (MS-7B79), BIOS H.L0 04/14/2025
[77804.857873] Call Trace:
[77804.857875] <TASK>
[77804.857877] dump_stack_lvl+0x47/0x60
[77804.857882] spl_panic+0xcd/0xf2
[77804.857886] zio_encrypt+0x7aa/0x830
[77804.857889] ? zio_write_compress+0x6a8/0x8f0
[77804.857891] zio_execute+0x86/0x120
[77804.857893] ? __wake_up+0x40/0x50
[77804.857896] taskq_thread+0x2e5/0x710
[77804.857900] ? wake_up_state+0x10/0x10
[77804.857903] ? zio_rewrite_gang+0x1d0/0x1d0
[77804.857905] ? taskq_thread_spawn+0x70/0x70
[77804.857906] kthread+0x104/0x200
[77804.857909] ? _raw_spin_unlock+0x12/0x30
[77804.857911] ? finish_task_switch.isra.0+0x92/0x270
[77804.857914] ? kthreads_online_cpu+0x110/0x110
[77804.857915] ? kthreads_online_cpu+0x110/0x110
[77804.857917] ret_from_fork+0x1ae/0x1f0
[77804.857919] ? kthreads_online_cpu+0x110/0x110
[77804.857922] ret_from_fork_asm+0x11/0x20
[77804.857925] </TASK>
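To measure the gap between the pool resume and the panic in logs like the one above, the bracketed dmesg timestamps can be subtracted. A rough sketch, assuming the default [seconds.microseconds] timestamp format; the function name resume_to_panic_secs is made up for this example.

```sh
# Read dmesg output on stdin and print the number of seconds between the
# last "is being resumed" warning and the first subsequent "PANIC at" line.
# (resume_to_panic_secs is a hypothetical helper name.)
resume_to_panic_secs() {
    awk '
        {
            # Extract the timestamp between the leading "[" and the first "]".
            ts = $0
            sub(/^\[ */, "", ts)
            sub(/\].*/, "", ts)
        }
        /is being resumed/ { resumed = ts }
        /PANIC at/ && resumed != "" {
            printf "%.1f\n", ts - resumed
            exit
        }
    '
}

# Example:
#   dmesg | resume_to_panic_secs
```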