kernel: PANIC at zio.c:5150:zio_encrypt() when USB drive detaches and reattaches during encrypted transfer

I'm trying to archive several ZFS datasets to an external HDD connected via USB, using raw sends to preserve the original encryption. The first dataset was 252G and completed without issue. The second is 1.6T; I've tried the transfer twice so far, and each time it stalled after transferring between 300G and 400G of data. From the `dmesg` log, I can see that each time the USB device spontaneously detached itself from the bus and then reattached about one second later. I haven't yet figured out the cause of the detachment. Is it low-level driver behavior when the drive falls too far behind on queued writes? Some kind of power-saving setting? A periodic task that's causing a USB rescan?

Regardless of the reason, it seems like ZFS (especially with `recv -s`) ought to be able to recover from these kinds of interruptions.  When I run `zpool status` after the reattachment, I get

```
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
```

Running `zpool clear` takes a couple of seconds, and then the pool returns to ONLINE status, with the previously transferred datasets auto-remounted (as expected).  I also get the entry `WARNING: Pool 'lacie_backup1' was suspended and is being resumed. Failed I/O will be retried.` in my `dmesg` log, and my `zfs recv` command (which I'm monitoring with `pv`) resumes progress and returns to the original transfer rate.  However, about 20 seconds after the resume, `syslogd` reports a kernel panic, with the stack trace shown below.
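For anyone scripting around this, here is a minimal sketch of how the suspended state could be detected before issuing `zpool clear`. The helper name `pool_state` is made up for illustration; it only parses the `state:` line in the format shown in the `zpool status` output above.

```shell
# Hypothetical helper: print the value of the "state:" field from
# `zpool status` output supplied on stdin (e.g. ONLINE, SUSPENDED).
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Example usage (not run here; the pool name is the one from my logs):
#   if [ "$(zpool status lacie_backup1 | pool_state)" = "SUSPENDED" ]; then
#       zpool clear lacie_backup1
#   fi
```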

Interestingly, the `zfs send` / `zfs recv` processes don't directly report any errors, and the transfer appears to keep moving along.  The first time this happened, I terminated the transfer, resilvered the pool, and restarted the transfer from scratch.  Now that it's happened a second time, I've decided to let it continue and double-check dataset integrity at the end.  I don't have any visible errors on any of my pools at the moment.
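The integrity check I have in mind at the end is roughly the following sketch. It assumes a scrub of the destination pool is a reasonable proxy for on-disk integrity (it verifies block checksums, not that the stream matched the source), and `no_known_errors` is a hypothetical helper name that simply looks for the usual summary line in `zpool status` output.

```shell
# Hypothetical helper: succeed only if `zpool status` output on stdin
# contains the "No known data errors" summary line.
no_known_errors() {
    grep -q '^errors: No known data errors'
}

# Example usage once the transfer finishes (not run here):
#   zpool scrub -w usb_pool          # -w waits for the scrub to complete
#   zpool status -v usb_pool | no_known_errors && echo "scrub clean"
```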

Searching for `spa_do_crypt_abd` turns up a slew of older issues with encrypted transfers, most of which were reported as resolved years ago.  The issue I'm encountering might be similar to #10570 or #12931, both of which were closed as unreproducible.

I certainly don't blame ZFS for the USB device detachment, and it's entirely possible that this hardware (LaCie Rugged) simply can't handle sustained transfers, but it seems like there's room for improvement in how gracefully ZFS recovers from these situations. For example, it might be prudent for `zio_encrypt` to retry an interrupted `spa_do_crypt_abd` operation before throwing the panic. I won't know for a few hours whether data integrity was preserved on the destination dataset.

### System information
Type | Version/Name
--- | ---
Distribution Name | Slackware Linux
Distribution Version | slackware64-current
Kernel Version | 6.19.9
Architecture | x86_64
OpenZFS Version | 2.4.1

The external USB drive is the only device in its zpool.

### Describe the problem you're observing

A kernel panic occurs even after ZFS reports that the pool `is being resumed. Failed I/O will be retried.` Furthermore, `zfs recv` doesn't seem to detect any error condition despite the kernel panic, and it continues the transfer. It's not clear whether this is the expected behavior when the `-s` flag is passed.

### Describe how to reproduce the problem

1. Create a single-device zpool on an external USB drive.
2. `zfs send -w main_pool/dataset | pv | zfs recv -s usb_pool/new_dataset_name`
3. Wait until the transfer pauses.
4. Check `zpool status` and/or `zfs list` to see that `usb_pool` is suspended.
5. Run `zpool clear usb_pool`
6. Observe `usb_pool` come back online and the `zfs send | zfs recv` transfer resume.
7. Wait until `syslogd` reports a kernel panic.
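
For comparison, if the transfer in step 2 is terminated instead of left running, the `-s` flag should leave a resume token on the destination. The sketch below shows how I would expect to pick the transfer back up; `resume_transfer` is a made-up wrapper name, the dataset is the placeholder from the steps above, and I'm assuming the token carries the raw-send setting so `-w` doesn't need to be repeated.

```shell
# Hypothetical wrapper: resume a partially received dataset using the
# receive_resume_token property that `zfs recv -s` saves on interruption.
resume_transfer() {
    dest="$1"
    token=$(zfs get -H -o value receive_resume_token "$dest") || return 1
    # An unset property is reported as "-": nothing to resume.
    if [ -z "$token" ] || [ "$token" = "-" ]; then
        echo "no resume token on $dest" >&2
        return 1
    fi
    # The token identifies the source snapshot and offset to restart from.
    zfs send -t "$token" | zfs recv -s "$dest"
}

# Example usage (not run here):
#   resume_transfer usb_pool/new_dataset_name
```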

### Include any warning/errors/backtraces from the system logs

```
[77746.760127] WARNING: Pool 'lacie_backup1' was suspended and is being resumed. Failed I/O will be retried.
[77804.857858] VERIFY0(spa_do_crypt_abd(B_TRUE, spa, &zio->io_bookmark, BP_GET_TYPE(bp), BP_GET_DEDUP(bp), BP_SHOULD_BYTESWAP(bp), salt, iv, mac, psize, zio->io_abd, eabd, &no_crypt)) failed (13)
[77804.857863] PANIC at zio.c:5150:zio_encrypt()
[77804.857865] Showing stack for process 10956
[77804.857869] CPU: 5 UID: 0 PID: 10956 Comm: z_wr_iss Tainted: G           O        6.19.9 #1 PREEMPT(full) 
[77804.857872] Tainted: [O]=OOT_MODULE
[77804.857872] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS MAX (MS-7B79), BIOS H.L0 04/14/2025
[77804.857873] Call Trace:
[77804.857875]  <TASK>
[77804.857877]  dump_stack_lvl+0x47/0x60
[77804.857882]  spl_panic+0xcd/0xf2
[77804.857886]  zio_encrypt+0x7aa/0x830
[77804.857889]  ? zio_write_compress+0x6a8/0x8f0
[77804.857891]  zio_execute+0x86/0x120
[77804.857893]  ? __wake_up+0x40/0x50
[77804.857896]  taskq_thread+0x2e5/0x710
[77804.857900]  ? wake_up_state+0x10/0x10
[77804.857903]  ? zio_rewrite_gang+0x1d0/0x1d0
[77804.857905]  ? taskq_thread_spawn+0x70/0x70
[77804.857906]  kthread+0x104/0x200
[77804.857909]  ? _raw_spin_unlock+0x12/0x30
[77804.857911]  ? finish_task_switch.isra.0+0x92/0x270
[77804.857914]  ? kthreads_online_cpu+0x110/0x110
[77804.857915]  ? kthreads_online_cpu+0x110/0x110
[77804.857917]  ret_from_fork+0x1ae/0x1f0
[77804.857919]  ? kthreads_online_cpu+0x110/0x110
[77804.857922]  ret_from_fork_asm+0x11/0x20
[77804.857925]  </TASK>
```
