Skip to content

Commit 35120da

Browse files
committed
SMB: Stream uploads, bound copy memory
- `SmbVolume::import_single_file_with_progress` used to `std::fs::read` the full local source into a `Vec<u8>` before writing. An 8 GB local→NAS copy meant 8 GB of RAM. Now opens the source with `tokio::fs::File::open` and drives smb2's `FileWriter` chunk-by-chunk (~1 MiB buffer). - `SmbVolume::write_from_stream` used to drain the incoming `VolumeReadStream` into `all_data: Vec<u8>` before a single `write_file_pipelined`. Now pulls from `stream.next_chunk().await` and pushes each chunk straight into `FileWriter`. The old "don't hold the smb mutex across source reads" concern was overblown — different volumes have different mutexes, and `export_single_file_with_progress` already holds the lock for the read duration. - Both paths now report `VolumeError::Cancelled` on `ControlFlow::Break(())` (matching the shared cancel-as-Cancelled rule) and finalize the writer to drain in-flight pipelined WRITE responses before deleting the partial file, so the session stays usable for subsequent ops. - Added a `# Streaming requirement` section to the `export_to_local`, `import_from_local`, `open_read_stream`, `write_from_stream` doc comments on the `Volume` trait, calling out the ~1 MiB peak-RAM budget and the anti-pattern of pre-buffering. - Updated `volume/CLAUDE.md`: flipped the `write_from_stream` row to ✅, amended `import_from_local` to note streaming in both directions, rewrote the "Anti-pattern" section, and added a tier 2 checklist bullet pointing at the new trait doc sections. - Added three Docker-integration tests: `smb_integration_import_from_local_streams_large_file`, `smb_integration_write_from_stream_streams_large_file`, `smb_integration_import_from_local_cancel_mid_write`. Existing streaming integration tests still pass. - Bumped the file-length allowlist for `smb.rs` (now 2670).
1 parent 33300a4 commit 35120da

4 files changed

Lines changed: 306 additions & 39 deletions

File tree

apps/desktop/src-tauri/src/file_system/volume/CLAUDE.md

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,7 @@ Everything below is optional per the trait (methods default to `Err(NotSupported
7676
- [ ] Return `supports_export() = true` and implement `export_to_local` + `import_from_local`. These are what the Copy dialog uses for "this volume ↔ local" transfers.
7777
- [ ] Implement `scan_for_copy` (count + bytes) and `scan_for_conflicts` (destination collision detection). These feed the Copy dialog's pre-flight.
7878
- [ ] Map your backend's errors through a `map_*_error` function that returns `VolumeError`. Connection-loss errors should trigger a state transition (see `SmbVolume::handle_smb_result` as a reference) so subsequent calls fail fast.
79+
- [ ] **No full-file buffering in per-file transfer paths.** Don't `std::fs::read` the local source, don't drain the incoming `VolumeReadStream` into a `Vec<u8>`, and don't collect the remote file into a `Vec<u8>` before writing to local. An 8 GB copy would allocate 8 GB of RAM. See the "Streaming requirement" section on each of these trait methods' doc comments: `export_to_local`, `import_from_local`, `open_read_stream`, `write_from_stream`.
7980

8081
### Tier 3 — integrate with the wider app (optional, but mostly expected)
8182

@@ -104,10 +105,10 @@ At-a-glance view of which capabilities each current volume opts into. Use this w
104105
| `list_directory` / metadata |||||
105106
| Mutations (create/delete/rename) |||||
106107
| `supports_export` |||||
107-
| `export_to_local` / `import_from_local` ||| ✅ streaming | |
108+
| `export_to_local` / `import_from_local` ||| ✅ streaming (both directions) ||
108109
| `supports_streaming` | ❌ (no need) ||||
109110
| `open_read_stream` || ✅ owned download | ✅ channel-backed | ✅ in-memory |
110-
| `write_from_stream` || ✅ streaming | ⚠️ pre-buffers (TODO) | ✅ in-memory |
111+
| `write_from_stream` || ✅ streaming | ✅ streaming | ✅ in-memory |
111112
| `supports_watching` | ✅ FSEvents/inotify | ❌ (own USB watcher) | ❌ (OS-mount FSEvents) ||
112113
| `supports_local_fs_access` | ✅ (default) ||||
113114
| `local_path` |`Some(root)` | `None` | `None` | `None` |
@@ -154,7 +155,7 @@ Key building blocks:
154155

155156
The pre-refactor `SmbReadStream` read the entire file into a `Vec<u8>` via `read_file_pipelined` and yielded slices. For an 8 GB file that meant an 8 GB allocation. Don't do this. If the consumer API is stream-shaped, the producer should stream too.
156157

157-
`SmbVolume::write_from_stream` still pre-buffers the source into `all_data` before writing to SMB (comment at the top of the function explains the reasoning at the time). That's the `⚠️` in the capability matrix above — a follow-up to stream the write side in chunks while holding the SMB session mutex for the duration.
158+
The same rule applies to write paths: `import_from_local` and `write_from_stream` must drive the backend's chunk-by-chunk writer (for example, smb2's `FileWriter`) rather than slurping the source into a `Vec<u8>` first. The SMB write paths used to pre-buffer; they now stream. See the "Streaming requirement" section on each Volume trait method's doc comment.
158159

159160
## Path handling gotchas
160161

apps/desktop/src-tauri/src/file_system/volume/mod.rs

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -508,6 +508,18 @@ pub trait Volume: Send + Sync {
508508
/// `on_progress(bytes_done, bytes_total)` is called periodically during the transfer.
509509
/// Return `ControlFlow::Break(())` from the callback to cancel the transfer.
510510
/// Returns bytes transferred.
511+
///
512+
/// # Streaming requirement
513+
///
514+
/// **Must stream.** Don't accumulate the remote file into a `Vec<u8>`
515+
/// before touching the local disk. A user copying an 8 GB file off a
516+
/// NAS would allocate 8 GB of RAM. Drive the backend's streaming reader
517+
/// (smb2: `FileDownload`, mtp-rs: `FileDownload`) chunk-by-chunk and
518+
/// write each chunk to `local_dest` as it arrives. For local volumes,
519+
/// reach for the OS's native copy APIs which stream by design.
520+
///
521+
/// Peak memory per transfer should be bounded by a small chunk buffer
522+
/// (~1 MiB) regardless of file size.
511523
fn export_to_local<'a>(
512524
&'a self,
513525
source: &'a Path,
@@ -524,6 +536,19 @@ pub trait Volume: Send + Sync {
524536
/// `on_progress(bytes_done, bytes_total)` is called periodically during the transfer.
525537
/// Return `ControlFlow::Break(())` from the callback to cancel the transfer.
526538
/// Returns bytes transferred.
539+
///
540+
/// # Streaming requirement
541+
///
542+
/// **Must stream.** Don't `std::fs::read(path)` / `tokio::fs::read(path)`
543+
/// the local source into a `Vec<u8>` before sending it off. A user
544+
/// copying an 8 GB file onto a NAS would allocate 8 GB of RAM. Open the
545+
/// source with `tokio::fs::File::open` and drive the backend's streaming
546+
/// writer (smb2: `FileWriter`, mtp-rs: `upload_stream`) chunk-by-chunk.
547+
/// For local volumes, reach for the OS's native copy APIs which stream
548+
/// by design.
549+
///
550+
/// Peak memory per transfer should be bounded by a small chunk buffer
551+
/// (~1 MiB) regardless of file size.
527552
fn import_from_local<'a>(
528553
&'a self,
529554
local_source: &'a Path,
@@ -580,6 +605,20 @@ pub trait Volume: Send + Sync {
580605
///
581606
/// Returns a VolumeReadStream that yields chunks of data.
582607
/// The stream must be fully consumed or dropped before other operations.
608+
///
609+
/// # Streaming requirement
610+
///
611+
/// **Must stream.** Don't read the whole file into a `Vec<u8>` inside
612+
/// this method and hand chunks of it back — that's just pre-buffering
613+
/// with extra steps. A user streaming an 8 GB file would allocate 8 GB
614+
/// of RAM before the consumer sees a single byte. Drive the backend's
615+
/// streaming reader (smb2: `FileDownload`, mtp-rs: `FileDownload`) on
616+
/// demand from `next_chunk`. If the backend gives you a borrowed
617+
/// handle, use a bounded producer/consumer channel (see `SmbReadStream`
618+
/// for the pattern).
619+
///
620+
/// Peak memory per transfer should be bounded by a small chunk buffer
621+
/// (~1 MiB) regardless of file size.
583622
#[allow(
584623
clippy::type_complexity,
585624
reason = "async trait method returns a pinned boxed future by design"
@@ -602,6 +641,20 @@ pub trait Volume: Send + Sync {
602641
/// * `size` - Total size in bytes (required for protocols like MTP)
603642
/// * `stream` - Source data stream
604643
/// * `on_progress` - Progress callback; return `ControlFlow::Break(())` to cancel
644+
///
645+
/// # Streaming requirement
646+
///
647+
/// **Must stream.** Don't drain `stream` into a `Vec<u8>` before writing
648+
/// to the backend. A user copying an 8 GB file through this path would
649+
/// allocate 8 GB of RAM. Pull each chunk from `stream.next_chunk().await`
650+
/// and push it straight into the backend's streaming writer (smb2:
651+
/// `FileWriter`, mtp-rs: `upload_stream`) in the same loop. Holding the
652+
/// backend's session mutex across the source `next_chunk` awaits is
653+
/// fine — different volumes use different mutexes, so there's no
654+
/// deadlock risk.
655+
///
656+
/// Peak memory per transfer should be bounded by a small chunk buffer
657+
/// (~1 MiB) regardless of file size.
605658
fn write_from_stream<'a>(
606659
&'a self,
607660
dest: &'a Path,

0 commit comments

Comments
 (0)