kernel: PANIC at zio.c:5150:zio_encrypt() when USB drive detaches and reattaches during encrypted transfer #18394

@spillner

Description

I'm trying to archive several ZFS datasets to an external HDD connected via USB, using raw transfers to preserve the original encryption. The first dataset was 252G and completed without issue. The second is 1.6T; I've tried the transfer twice so far, and each time it stalled after transferring between 300G and 400G of data. From the dmesg log, I can see that each time the USB device spontaneously detached itself from the bus and then reattached about one second later. I haven't yet figured out the cause of the detachment. Is it low-level driver behavior when the device falls too far behind on queued writes? Some kind of power-saving setting? A periodic task that's causing a USB rescan?

Regardless of the reason, it seems like ZFS (especially with zfs recv -s) ought to be able to recover from these kinds of interruptions. When I run zpool status after the reattachment, I get:

 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC

Running zpool clear takes a couple of seconds, and then the pool returns to ONLINE status, with the previously transferred datasets auto-remounted (as expected). I also get the entry "WARNING: Pool 'lacie_backup1' was suspended and is being resumed. Failed I/O will be retried." in my dmesg log, and my zfs recv command (which I'm monitoring with pv) resumes progress and returns to the original transfer rate. However, about 20 seconds after the resume, syslogd reports a kernel panic, with the stack trace shown below.

Interestingly, the zfs send / zfs recv processes don't directly report any errors, and the transfer appears to keep moving along. The first time this happened, I terminated the transfer, resilvered the pool, and restarted the transfer from scratch. Now that it's happened a second time, I've decided to let it continue and double-check dataset integrity at the end. I don't have any visible errors on any of my pools at the moment.

Searching for spa_do_crypt_abd turns up a slew of older issues with encrypted transfers, most of which were reported as resolved years ago. The issue I'm encountering might be similar to #10570 or #12931, both of which were closed as unreproducible.

I certainly don't blame ZFS for the USB device detachment, and it's entirely possible that this hardware (LaCie Rugged) simply can't handle sustained transfers, but it seems like there's room for improvement in how gracefully ZFS recovers from these situations. For example, it might be prudent for zio_encrypt to retry an interrupted spa_do_crypt_abd operation before panicking. I won't know for a few hours whether data integrity was preserved on the destination dataset.

System information

Distribution Name | Slackware Linux
Distribution Version | slackware64-current
Kernel Version | 6.19.9
Architecture | x86_64
OpenZFS Version | 2.4.1
The external USB drive is the only device in its zpool.

Describe the problem you're observing

A kernel panic occurs even after ZFS reports that the pool "was suspended and is being resumed. Failed I/O will be retried." Furthermore, zfs recv doesn't seem to detect any error condition despite the kernel panic, and continues the transfer. It's not clear whether this is the expected behavior when the -s flag is passed.

Describe how to reproduce the problem

  1. Create a single-device zpool on an external USB drive.
  2. zfs send -w main_pool/dataset | pv | zfs recv -s usb_pool/new_dataset_name
  3. Wait until the transfer pauses.
  4. Check zpool status and/or zfs list to see that usb_pool is suspended (state: SUSPENDED).
  5. Run zpool clear usb_pool
  6. Observe usb_pool come back online and the zfs send | zfs recv pipeline resume.
  7. Wait until syslogd reports a kernel panic.
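
Step 4 could also be scripted instead of checked by hand. The following is a minimal Python sketch, not part of the original setup: the helper names pool_state and query_pool_state are hypothetical, and the parsing assumes that zpool status prints a "state:" line in the same format as the output quoted earlier in this report.

```python
import re
import subprocess


def pool_state(status_text):
    """Return the value of the first 'state:' line in zpool status output, or None."""
    match = re.search(r"^\s*state:\s*(\S+)", status_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def query_pool_state(pool):
    """Run 'zpool status <pool>' and return its state (requires ZFS to be installed)."""
    result = subprocess.run(
        ["zpool", "status", pool], capture_output=True, text=True, check=False
    )
    return pool_state(result.stdout)


if __name__ == "__main__":
    # Sample text copied from the zpool status output quoted in this report.
    sample = (
        " state: SUSPENDED\n"
        "status: One or more devices are faulted in response to IO failures.\n"
    )
    print(pool_state(sample))  # prints SUSPENDED
```

Polling query_pool_state("usb_pool") in a loop would show when the pool leaves ONLINE, at which point step 5 applies.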

Include any warning/errors/backtraces from the system logs

[77746.760127] WARNING: Pool 'lacie_backup1' was suspended and is being resumed. Failed I/O will be retried.
[77804.857858] VERIFY0(spa_do_crypt_abd(B_TRUE, spa, &zio->io_bookmark, BP_GET_TYPE(bp), BP_GET_DEDUP(bp), BP_SHOULD_BYTESWAP(bp), salt, iv, mac, psize, zio->io_abd, eabd, &no_crypt)) failed (13)
[77804.857863] PANIC at zio.c:5150:zio_encrypt()
[77804.857865] Showing stack for process 10956
[77804.857869] CPU: 5 UID: 0 PID: 10956 Comm: z_wr_iss Tainted: G           O        6.19.9 #1 PREEMPT(full) 
[77804.857872] Tainted: [O]=OOT_MODULE
[77804.857872] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS MAX (MS-7B79), BIOS H.L0 04/14/2025
[77804.857873] Call Trace:
[77804.857875]  <TASK>
[77804.857877]  dump_stack_lvl+0x47/0x60
[77804.857882]  spl_panic+0xcd/0xf2
[77804.857886]  zio_encrypt+0x7aa/0x830
[77804.857889]  ? zio_write_compress+0x6a8/0x8f0
[77804.857891]  zio_execute+0x86/0x120
[77804.857893]  ? __wake_up+0x40/0x50
[77804.857896]  taskq_thread+0x2e5/0x710
[77804.857900]  ? wake_up_state+0x10/0x10
[77804.857903]  ? zio_rewrite_gang+0x1d0/0x1d0
[77804.857905]  ? taskq_thread_spawn+0x70/0x70
[77804.857906]  kthread+0x104/0x200
[77804.857909]  ? _raw_spin_unlock+0x12/0x30
[77804.857911]  ? finish_task_switch.isra.0+0x92/0x270
[77804.857914]  ? kthreads_online_cpu+0x110/0x110
[77804.857915]  ? kthreads_online_cpu+0x110/0x110
[77804.857917]  ret_from_fork+0x1ae/0x1f0
[77804.857919]  ? kthreads_online_cpu+0x110/0x110
[77804.857922]  ret_from_fork_asm+0x11/0x20
[77804.857925]  </TASK>
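
For reference, the two key entries in this log can be summarized mechanically. The sketch below is an illustration only: the summarize helper is hypothetical, and it assumes the number in "failed (13)" is a standard errno value, in which case 13 corresponds to EACCES.

```python
import errno
import re

# The two relevant entries, copied verbatim from the log above.
LOG = """\
[77746.760127] WARNING: Pool 'lacie_backup1' was suspended and is being resumed. Failed I/O will be retried.
[77804.857858] VERIFY0(spa_do_crypt_abd(B_TRUE, spa, &zio->io_bookmark, BP_GET_TYPE(bp), BP_GET_DEDUP(bp), BP_SHOULD_BYTESWAP(bp), salt, iv, mac, psize, zio->io_abd, eabd, &no_crypt)) failed (13)
"""

LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
FAILED_RE = re.compile(r"failed \((\d+)\)\s*$")


def summarize(log):
    """Return (seconds between resume and VERIFY0 failure, symbolic errno name)."""
    resume_ts = fail_ts = code = None
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "was suspended and is being resumed" in msg:
            resume_ts = ts
        failed = FAILED_RE.search(msg)
        if msg.startswith("VERIFY") and failed:
            fail_ts, code = ts, int(failed.group(1))
    if resume_ts is None or fail_ts is None:
        return None
    # Assumption: the failure code is a standard errno number.
    return fail_ts - resume_ts, errno.errorcode.get(code, "unknown")


if __name__ == "__main__":
    gap, name = summarize(LOG)
    print(f"{gap:.1f} s between resume and failure, error {name}")
```

In this particular excerpt the resume message and the VERIFY0 failure are about 58 seconds apart.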

Labels: Type: Defect (Incorrect behavior, e.g. crash, hang)
