
feat: update to DataFusion 51, arrow 57, delta-kernel 0.18.0, pyo3 26, pyo3-arrow 0.14 #3949

Merged
rtyler merged 39 commits into delta-io:main from hntd187:df-51-2
Dec 2, 2025

Conversation

@hntd187
Collaborator

@hntd187 hntd187 commented Nov 24, 2025

Description

A continuation from #3933

@github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels Nov 24, 2025
DrakeLin and others added 13 commits November 24, 2025 13:55
Signed-off-by: DrakeLin <drakelin18@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
# Description

As of today, the Rust meta-crate re-exports the storage crates but never
calls their register_handlers helpers. That means every Rust binary
still has to remember to call deltalake::gcp::register_handlers(None)
(and the equivalents for S3/Azure/etc.) before using cloud URIs, even
though the Python bindings auto-register. This PR brings the meta-crate
to parity so DeltaOps::try_from_uri("gs://…") works out of the box when
the gcs feature is enabled.

# Problem

- Users of the deltalake crate must manually register each storage
backend before working with gs://, s3://, abfss://, etc.

- Forgetting the call leads to
DeltaTableError::InvalidTableLocation("Unknown scheme: gs"), which
blocks workflows like DataFusion writers on GCS.

- Docs/examples didn’t make it obvious when manual registration was
still required.

# Solution

- Add feature-gated ctor hooks in crates/deltalake/src/lib.rs that call
register_handlers(None) for AWS, Azure, GCS, HDFS, LakeFS, and Unity
whenever the corresponding feature is enabled.

- Pull in the lightweight ctor = "0.2" dependency so the hooks run at
startup.

- Add a small regression test that exercises
DeltaTableBuilder::from_uri("gs://…") with the gcs feature enabled.

- Update the GCS integration docs and changelog to explain that the
meta-crate now auto-registers backends while deltalake-core users still
need to call the storage crates explicitly.
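
The failure mode and the fix described above can be illustrated with a minimal, std-only sketch. It is not the code from this PR: `registry`, `register_gcs_handlers`, and `resolve_backend` are hypothetical names, and the real crates register object-store and log-store factories rather than a plain string. The sketch only shows why a lookup fails with "Unknown scheme" until some registration call has run, which is the call the startup hooks are meant to make automatically.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for a storage backend factory; the real crates
// register object-store and log-store factories instead of a plain name.
type BackendName = &'static str;

// Process-wide registry keyed by URI scheme (hypothetical, for illustration).
fn registry() -> &'static Mutex<HashMap<&'static str, BackendName>> {
    static REGISTRY: OnceLock<Mutex<HashMap<&'static str, BackendName>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

// Plays the role of a storage crate's `register_handlers` helper.
fn register_gcs_handlers() {
    let mut map = registry().lock().unwrap();
    map.insert("gs", "gcs");
}

// Resolve the backend for a URI, failing for schemes nobody registered.
fn resolve_backend(uri: &str) -> Result<BackendName, String> {
    let scheme = uri.split("://").next().unwrap_or("");
    registry()
        .lock()
        .unwrap()
        .get(scheme)
        .copied()
        .ok_or_else(|| format!("Unknown scheme: {scheme}"))
}

fn main() {
    // Before registration the lookup returns an error.
    println!("{:?}", resolve_backend("gs://bucket/table"));
    // A startup hook would make this call on the user's behalf.
    register_gcs_handlers();
    println!("{:?}", resolve_backend("gs://bucket/table"));
}
```

In the sketch the registration is an explicit call in `main`; per the description above, the meta-crate moves that call into a constructor hook so that it runs before user code.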

# Changes

- crates/deltalake/src/lib.rs: new #[ctor::ctor] modules for s3, azure,
gcs, hdfs, lakefs, and unity.

- crates/deltalake/Cargo.toml: add ctor dependency.

- crates/deltalake/tests/gcs_auto_registration.rs: new smoke tests for
gs:// URI recognition when the gcs feature is enabled.

- docs/integrations/object-storage/gcs.md & CHANGELOG.md: document the
auto-registration behavior.

# Testing

- cargo check -p deltalake --all-features
- cargo test -p deltalake --features gcs
- cargo test --test gcs_auto_registration --features gcs
- cargo build --example pharma_pipeline_gcs --features gcs,datafusion

# Documentation

- docs/integrations/object-storage/gcs.md
- CHANGELOG.md

---------

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
…n and Rust

Profiles for different use cases:
- dev: Fast local development with good debug experience (Cargo defaults)
- test: test builds with minimal debug info to save disk/RAM (custom)
- release: Production Rust crates - maximum performance (Cargo defaults)
- bench: For benchmarking with flamegraphs (cargo bench)
- profiling: For performance profiling with release opts + debug info
- ci: CI/CD optimized - fast builds, release-like performance
- python-release: Python wheel builds - portable, reproducible (PyPI releases)

Signed-off-by: Florian Valeye <florian.valeye@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Florian Valeye <florian.valeye@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
…om_uri

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
---
updated-dependencies:
- dependency-name: ctor
  dependency-version: 0.6.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
This work was really cool when @houqp explored it. In 2025 we're not
really reliant on dynamodb locking and at some point in the distant
future maybe we'll not need a DynamoDBLogStore either.

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
---
updated-dependencies:
- dependency-name: convert_case
  dependency-version: 0.9.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
lonless9 and others added 10 commits November 24, 2025 13:55
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
- Fix panic when get_add_actions() is called on tables with no add actions
- Return empty RecordBatch with correct schema instead of panicking
- Add unit test to verify get_add_actions() works after delete and vacuum

Fixes delta-io#3918
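
The behaviour change in this commit can be sketched with simplified, std-only types. This is an illustration under assumptions, not the actual implementation: `Batch`, `add_actions_schema`, and `add_actions_batch` are hypothetical stand-ins for Arrow's record batch and the real `get_add_actions()` code path, and the column names are made up. The point is only that an empty input still produces a batch carrying the full schema.

```rust
// Hypothetical, simplified stand-in for an Arrow record batch.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<String>,   // column names, standing in for a schema
    rows: Vec<Vec<String>>, // row-major values
}

// Fixed schema for the add-actions view (column names are illustrative).
fn add_actions_schema() -> Vec<String> {
    vec!["path".to_string(), "size_bytes".to_string()]
}

// Build the batch from the surviving add actions. An empty input yields an
// empty batch that still carries the schema instead of panicking.
fn add_actions_batch(actions: &[(String, u64)]) -> Batch {
    let rows = actions
        .iter()
        .map(|(path, size)| vec![path.clone(), size.to_string()])
        .collect();
    Batch { columns: add_actions_schema(), rows }
}

fn main() {
    // A table whose files were all deleted and vacuumed has no add actions.
    let empty = add_actions_batch(&[]);
    println!("{} columns, {} rows", empty.columns.len(), empty.rows.len());
}
```

Callers can then inspect the columns of the result without special-casing tables that have no add actions.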

Signed-off-by: Manish Sogiyawar <msogiyawar@vectra.ai>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Manish Sogiyawar <msogiyawar@vectra.ai>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
See delta-io#3918

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
… overwrite (delta-io#3912)

# Description
Remove references in docs that suggest using `partition_filters` for
selectively overwriting partitions, which has been removed from the
`write_deltalake` API.

# Related Issue(s)
fixes delta-io#3904

Signed-off-by: zyd14 <romerzs14@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Fixes delta-io#3886

Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
@hntd187 hntd187 self-assigned this Nov 24, 2025
@codecov

codecov bot commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 84.66899% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.34%. Comparing base (948396f) to head (51cd795).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/core/src/kernel/models/actions.rs | 60.00% | 15 Missing and 1 partial ⚠️ |
| crates/core/src/delta_datafusion/engine/storage.rs | 0.00% | 6 Missing ⚠️ |
| crates/core/src/kernel/transaction/protocol.rs | 91.66% | 5 Missing ⚠️ |
| ...tes/core/src/kernel/snapshot/iterators/scan_row.rs | 93.54% | 0 Missing and 4 partials ⚠️ |
| crates/core/src/kernel/arrow/engine_ext.rs | 81.25% | 1 Missing and 2 partials ⚠️ |
| ...s/core/src/delta_datafusion/engine/file_formats.rs | 0.00% | 1 Missing ⚠️ |
| crates/core/src/delta_datafusion/mod.rs | 80.00% | 1 Missing ⚠️ |
| crates/core/src/kernel/snapshot/log_data.rs | 92.85% | 0 Missing and 1 partial ⚠️ |
| crates/core/src/operations/constraints.rs | 0.00% | 1 Missing ⚠️ |
| crates/core/src/operations/load_cdf.rs | 0.00% | 0 Missing and 1 partial ⚠️ |

... and 5 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3949       +/-   ##
===========================================
+ Coverage   26.24%   74.34%   +48.10%     
===========================================
  Files         124      152       +28     
  Lines       19824    39608    +19784     
  Branches    19824    39608    +19784     
===========================================
+ Hits         5203    29448    +24245     
+ Misses      14251     8823     -5428     
- Partials      370     1337      +967     


Signed-off-by: Stephen Carman <shcarman@gmail.com>
# Conflicts:
#	crates/core/src/kernel/models/actions.rs
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Stephen Carman <shcarman@gmail.com>
@hntd187 hntd187 changed the title from "feat: update to DataFusion 51, arrow 57, delta-kernel 0.17.0, pyo3 26, pyo3-arrow 0.14" to "feat: update to DataFusion 51, arrow 57, delta-kernel 0.18.0, pyo3 26, pyo3-arrow 0.14" Nov 25, 2025
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
@rtyler rtyler enabled auto-merge (rebase) November 29, 2025 16:57
@rtyler rtyler disabled auto-merge December 2, 2025 13:36
@rtyler rtyler merged commit 49f089d into delta-io:main Dec 2, 2025
28 of 29 checks passed
ethan-tyler added a commit to ethan-tyler/delta-rs that referenced this pull request Jan 9, 2026
…, pyo3-arrow 0.14 (delta-io#3949)

# Description
A continuation from delta-io#3933

---------

Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: DrakeLin <drakelin18@gmail.com>
Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
Signed-off-by: Florian Valeye <florian.valeye@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Manish Sogiyawar <msogiyawar@vectra.ai>
Signed-off-by: zyd14 <romerzs14@gmail.com>
Co-authored-by: xiaolong <xiaolong@lakesail.com>
Co-authored-by: DrakeLin <drakelin18@gmail.com>
Co-authored-by: Ethan Urbanski <ethan@urbanskitech.com>
Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
Co-authored-by: Florian Valeye <florian.valeye@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Manish Sogiyawar <msogiyawar@vectra.ai>
Co-authored-by: zyd14 <romerzs14@gmail.com>

Labels

binding/python (Issues for the Python package), binding/rust (Issues for the Rust crate)

Projects

Status: Done


7 participants