perf: Native batch passthrough for native_iceberg_compat V1 scans [EXPERIMENTAL]#3410

Closed
andygrove wants to merge 1 commit into apache:main from andygrove:native-batch-passthrough

Conversation

@andygrove
Member

Summary

  • Eliminates the JVM data round trip for data columns in native_iceberg_compat V1 scans
  • Data columns are read directly from the native BatchContext via zero-copy Arc::clone
  • Only partition columns (small, constant values) cross the JVM boundary via Arrow FFI
  • Reduces 3 copy steps to 1 for data columns (the currentColumnBatch JNI export remains; the exportBatch FFI round-trip and copy_array deep copy are eliminated)

How it works

When native_batch_passthrough is enabled in the Scan protobuf (auto-detected for native_iceberg_compat CometScanExec):

  1. NativeBatchReader.nextBatch() reads the batch natively and sets a ThreadLocal handle
  2. Rust ScanExec.get_next_passthrough() calls CometBatchIterator.advancePassthrough() instead of the normal hasNext()+next() path
  3. Data columns are obtained via Arc::clone from BatchContext.current_batch (zero-copy)
  4. Only partition columns are imported from JVM via FFI and deep-copied (they are small constant values)
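Steps 3 and 4 above can be sketched in miniature. The types below are stand-ins, not the real Comet/arrow-rs types (`Column`, `assemble_batch`, and the field names are assumptions for illustration): data columns already resident in the native `BatchContext` are shared via `Arc::clone` (a reference-count bump, no buffer copy), while the small partition columns imported over FFI are deep-copied.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an Arrow column buffer; the real code works
// on arrow-rs arrays held inside BatchContext.
struct Column {
    values: Vec<i64>,
}

// Data columns are shared by reference (zero-copy); only the partition
// columns, which are small constant values, pay a deep-copy cost.
fn assemble_batch(
    data_cols: &[Arc<Column>], // already resident in the native BatchContext
    partition_cols: &[Column], // imported from the JVM via Arrow FFI
) -> Vec<Arc<Column>> {
    let mut out: Vec<Arc<Column>> = data_cols.iter().map(Arc::clone).collect();
    out.extend(
        partition_cols
            .iter()
            .map(|c| Arc::new(Column { values: c.values.clone() })), // deep copy
    );
    out
}

fn main() {
    let data = vec![Arc::new(Column { values: vec![1, 2, 3] })];
    let parts = vec![Column { values: vec![7; 3] }];
    let batch = assemble_batch(&data, &parts);
    // Data column: same allocation, nothing was copied.
    assert!(Arc::ptr_eq(&batch[0], &data[0]));
    // Partition column: a fresh copy of the small constant values.
    assert_eq!(batch[1].values, vec![7, 7, 7]);
}
```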

Files changed

  • operator.proto: Added native_batch_passthrough and num_data_columns fields to Scan message
  • NativeBatchReader.java: Added CURRENT_READER_HANDLE ThreadLocal, set after each loadNextBatch()
  • CometBatchIterator.java: Added advancePassthrough() and nextPartitionColumnsOnly() methods
  • batch_iterator.rs: JNI method bindings for the new Java methods
  • scan.rs: Added get_next_passthrough() that reads data cols from BatchContext (zero-copy)
  • planner.rs: Passes new fields to ScanExec::new()
  • CometSink.scala: Detects native_iceberg_compat scans and sets passthrough fields
  • mod.rs: Made BatchContext and get_batch_context public

Test plan

  • ParquetReadV1Suite - all 88 tests pass
  • ParquetReadV2Suite - all tests pass
  • Partition-specific tests pass (6/6)
  • Run benchmark to measure performance improvement

🤖 Generated with Claude Code

@andygrove andygrove changed the title feat: Native batch passthrough for native_iceberg_compat V1 scans feat: Native batch passthrough for native_iceberg_compat V1 scans [EXPERIMENTAL] Feb 5, 2026
/**
 * Reader handle set by nextBatch() in passthrough mode so that
 * CometBatchIterator.advancePassthrough() can retrieve it.
 */
public static final ThreadLocal<Long> CURRENT_READER_HANDLE = ThreadLocal.withInitial(() -> 0L);

Claude really loves thread-local variables, and I always need to sit and think for a long time about how they could blow up in our faces.

Eliminate the JVM data round trip for data columns in native_iceberg_compat
scans. Data columns are read directly from the native BatchContext via
zero-copy Arc::clone, while only partition columns cross the JVM boundary
via Arrow FFI.

Previously, data made a wasteful round trip:
  Rust ParquetSource → per-column JNI export to JVM → JVM wraps as
  CometVector → JVM exports ALL cols back to Rust via Arrow FFI →
  Rust ScanExec deep-copies every column

Now in passthrough mode:
  Rust ParquetSource → batch stays in native BatchContext →
  Rust ScanExec reads data cols directly (zero-copy) →
  Only partition cols imported from JVM FFI (small, constant)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
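The before/after flow in the commit message can be sketched by counting buffer copies. This is a toy model under stated assumptions, not the real JNI/FFI code: `Col`, `deep_copy`, `old_path`, and `passthrough` are hypothetical names standing in for the JNI export, `exportBatch` FFI round trip, and `copy_array` steps described above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Counts every deep copy performed, so the two paths can be compared.
static COPIES: AtomicUsize = AtomicUsize::new(0);

struct Col(Vec<u8>);

fn deep_copy(c: &Col) -> Col {
    COPIES.fetch_add(1, Ordering::Relaxed);
    Col(c.0.clone())
}

// Old path: per-column JNI export, FFI re-export back to Rust, and the
// ScanExec copy_array step each materialise the column again.
fn old_path(c: &Col) -> Col {
    let jni_export = deep_copy(c);
    let ffi_import = deep_copy(&jni_export);
    deep_copy(&ffi_import)
}

// Passthrough: the currentColumnBatch JNI export remains (one copy);
// the FFI round trip and deep copy are gone, and the native side reads
// the batch via Arc::clone.
fn passthrough(c: &Arc<Col>) -> Arc<Col> {
    let _jvm_view = deep_copy(c.as_ref()); // export to JVM, still needed
    Arc::clone(c)
}

fn main() {
    let col = Arc::new(Col(vec![0u8; 4]));

    COPIES.store(0, Ordering::Relaxed);
    let _ = old_path(&col);
    assert_eq!(COPIES.load(Ordering::Relaxed), 3); // three copy steps

    COPIES.store(0, Ordering::Relaxed);
    let shared = passthrough(&col);
    assert_eq!(COPIES.load(Ordering::Relaxed), 1); // one copy step
    assert!(Arc::ptr_eq(&col, &shared)); // data column stays zero-copy
}
```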
@andygrove andygrove force-pushed the native-batch-passthrough branch from 75750af to 57ea80b Compare February 5, 2026 17:57
@andygrove andygrove changed the title feat: Native batch passthrough for native_iceberg_compat V1 scans [EXPERIMENTAL] perf: Native batch passthrough for native_iceberg_compat V1 scans [EXPERIMENTAL] Feb 5, 2026
@andygrove andygrove closed this Feb 5, 2026
