This release consists of 253 commits from 71 contributors. See credits at the end of this changelog for more information.
See the upgrade guide for information on how to upgrade from previous versions.
Breaking changes:
- feat: add metadata to literal expressions #16170 (timsaucer)
- [MAJOR] Equivalence System Overhaul #16217 (ozankabak)
- remove unused methods in SortExec #16457 (adriangb)
- Move Pruning Logic to a Dedicated datafusion-pruning Crate for Improved Modularity #16549 (kosiew)
- Fix type of ExecutionOptions::time_zone #16569 (findepi)
- Convert Option<Vec> to Vec #16615 (ViggoC)
- Refactor error handling to use boxed errors for DataFusionError variants #16672 (kosiew)
- Reuse Rows allocation in RowCursorStream #16647 (Dandandan)
- refactor: shrink SchemaError #16653 (crepererum)
- Remove unused AggregateUDF struct #16683 (ViggoC)
- Bump the MSRV to 1.85.1 due to transitive dependencies (aws-sdk) #16728 (rtyler)
Performance related:
- Add late pruning of Parquet files based on file level statistics #16014 (adriangb)
- Add fast paths for try_process_unnest #16389 (simonvandel)
- Set the default value of datafusion.execution.collect_statistics to true #16447 (AdamGS)
- Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) #16509 (zhuqi-lucas)
- Simplify predicates in PushDownFilter optimizer rule #16362 (xudong963)
- optimize ScalarValue::to_array_of_size for structural types #16706 (ding-young)
- Refactor filter pushdown APIs to enable joins to pass through filters #16732 (adriangb)
- perf: Optimize hash joins with an empty build side #16716 (nuno-faria)
- Per file filter evaluation #15057 (adriangb)
Implemented enhancements:
- feat: Support defining custom MetricValues in PhysicalPlans #16195 (sfluor)
- feat: Allow cancelling of grouping operations which are CPU bound #16196 (zhuqi-lucas)
- feat: support FixedSizeList for array_has #16333 (chenkovsky)
- feat: Support tpch and tpch10 benchmark for csv format #16373 (zhuqi-lucas)
- feat: Support RightMark join for NestedLoop and Hash join #16083 (jonathanc-n)
- feat: mapping sql Char/Text/String default to Utf8View #16290 (zhuqi-lucas)
- feat: support fixed size list for array reverse #16423 (chenkovsky)
- feat: add SchemaProvider::table_type(table_name: &str) #16401 (epgif)
- feat: derive Debug and Clone for ScalarFunctionArgs #16471 (crepererum)
- feat: support map_entries builtin function #16557 (comphead)
- feat: add array_min scalar function and associated tests #16574 (dharanad)
- feat: Finalize support for RightMark join + Mark join swap #16488 (jonathanc-n)
- feat: Parquet modular encryption #16351 (corwinjoy)
- feat: Support u32 indices for HashJoinExec #16434 (jonathanc-n)
- feat: expose intersect distinct/except distinct in dataframe api #16578 (chenkovsky)
- feat: Add a configuration to make parquet encryption optional #16649 (corwinjoy)
Fixed bugs:
- fix: preserve null_equals_null flag in eliminate_cross_join rule #16356 (waynexia)
- fix: Fix SparkSha2 to be compliant with Spark response and add support for Int32 #16350 (rishvin)
- fix: Fixed error handling for generate_series/range #16391 (jonathanc-n)
- fix: Enable WASM compilation by making sqlparser's recursive-protection optional #16418 (jonmmease)
- fix: create file for empty stream #16342 (chenkovsky)
- fix: document and fix macro hygiene for config_field! #16473 (crepererum)
- fix: make with_new_state a trait method for ExecutionPlan #16469 (geoffreyclaude)
- fix: column indices in FFI partition evaluator #16480 (timsaucer)
- fix: support within_group #16538 (chenkovsky)
- fix: disallow specify both order_by and within_group #16606 (watchingthewheelsgo)
- fix: format within_group error message #16613 (watchingthewheelsgo)
- fix: reserved keywords in qualified column names #16584 (crepererum)
- fix: support scalar function nested in get_field in Unparser #16610 (chenkovsky)
- fix: sqllogictest runner label condition mismatch #16633 (lliangyu-lin)
- fix: port arrow inline fast key fix to datafusion #16698 (zhuqi-lucas)
- fix: try to lower plain reserved functions to columns as well #16669 (crepererum)
- fix: Fix CI failing due to #16686 #16718 (jonathanc-n)
- fix: return NULL if any of the param to make_date is NULL #16759 (feniljain)
- fix: add order_requirement & dist_requirement to OutputRequirementExec display #16726 (Loaki07)
- fix: support nullable columns in pre-sorted data sources #16783 (crepererum)
- fix: The inconsistency between scalar and array on the cast decimal to timestamp #16539 (chenkovsky)
- fix: unit test for object_storage #16824 (chenkovsky)
- fix(docs): Update broken links to TableProvider docs #16830 (jcsherin)
Documentation updates:
- Minor: Add upgrade guide for Expr::WindowFunction #16313 (alamb)
- Fix array_position on empty list #16292 (Blizzara)
- Fix: mark "Spilling (to disk) Joins" as supported in features #16343 (kosiew)
- Fix cp_solver doc formatting #16352 (xudong963)
- docs: Expand MemoryPool docs with related structs #16289 (2010YOUY01)
- Support datafusion-cli access to public S3 buckets that do not require authentication #16300 (alamb)
- Document Table Constraint Enforcement Behavior in Custom Table Providers Guide #16340 (kosiew)
- doc: Add SQL examples for SEMI + ANTI Joins #16316 (jonathanc-n)
- [datafusion-spark] Example of using Spark compatible function library #16384 (alamb)
- Add note in upgrade guide about changes to Expr::Scalar in 48.0.0 #16360 (alamb)
- Update PMC management instructions to follow new ASF process #16417 (alamb)
- Add design process section to the docs #16397 (alamb)
- Unify Metadata Handing: use FieldMetadata in Expr::Alias and ExprSchemable #16320 (alamb)
- TopK dynamic filter pushdown attempt 2 #15770 (adriangb)
- Update Roadmap documentation #16399 (alamb)
- doc: Add comments to clarify algorithm for MarkJoins #16436 (jonathanc-n)
- Add compression option to SpillManager #16268 (ding-young)
- Redirect user defined function webpage #16475 (alamb)
- Use Tokio's task budget consistently, better APIs to support task cancellation #16398 (pepijnve)
- doc: upgrade guide for new compression option for spill files #16472 (2010YOUY01)
- Introduce Async User Defined Functions #14837 (goldmedal)
- Minor: Add more links to cooperative / scheduling docs #16484 (alamb)
- doc: Document DESCRIBE comman in ddl.md #16524 (krikera)
- Add more doc for physical filter pushdown #16504 (xudong963)
- chore: fix CI failures on ddl.md #16526 (comphead)
- Add some comments about adding new dependencies in datafusion-sql #16543 (alamb)
- Add note for planning release in Upgrade Guides #16534 (xudong963)
- Consolidate configuration sections in docs #16544 (alamb)
- Minor: add clearer link to the main website from intro paragraph. #16556 (alamb)
- Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl #16523 (alamb)
- docs: Minor grammatical fixes for the scalar UDF docs #16618 (ianthetechie)
- Implementation for regex_instr #15928 (nirnayroy)
- Update Upgrade Guide for 48.0.1 #16699 (alamb)
- ensure MemTable has at least one partition #16754 (waynexia)
- Restore custom SchemaAdapter functionality for Parquet #16791 (adriangb)
- Update upgrading.md for new unified config for sql string mapping to utf8view #16809 (zhuqi-lucas)
- docs: Remove reference to forthcoming example (#16817) #16818 (m09526)
- docs: Fix broken links #16839 (2010YOUY01)
- Add note to upgrade guide about MSRV update #16845 (alamb)
Other:
- chore(deps): bump sqllogictest from 0.28.2 to 0.28.3 #16286 (dependabot[bot])
- chore(deps-dev): bump webpack-dev-server from 4.15.1 to 5.2.1 in /datafusion/wasmtest/datafusion-wasm-app #16253 (dependabot[bot])
- Improve DataFusion subcrate readme files #16263 (alamb)
- Fix intermittent SQL logic test failure in limit.slt by adding ORDER BY clause #16257 (kosiew)
- Extend benchmark comparison script with more detailed statistics #16262 (pepijnve)
- chore(deps): bump flate2 from 1.1.1 to 1.1.2 #16338 (dependabot[bot])
- chore(deps): bump petgraph from 0.8.1 to 0.8.2 #16337 (dependabot[bot])
- chore(deps): bump substrait from 0.56.0 to 0.57.0 #16143 (dependabot[bot])
- Add test for ordering of predicate pushdown into parquet #16169 (adriangb)
- Fix distinct count for DictionaryArray to correctly account for nulls in values array #16258 (kosiew)
- Fix inconsistent schema projection in ListingTable even when schema is specified #16305 (kosiew)
- tpch: move reading of SQL queries out of timed span. #16357 (pepijnve)
- chore(deps): bump clap from 4.5.39 to 4.5.40 #16354 (dependabot[bot])
- chore(deps): bump syn from 2.0.101 to 2.0.102 #16355 (dependabot[bot])
- Encapsulate metadata for literals on to a FieldMetadata structure #16317 (alamb)
- Add support UInt64 and other integer data types for to_hex #16335 (tlm365)
- Document copy_array_data function with example #16361 (alamb)
- Fix array_agg memory over use #16346 (gabotechs)
- Update publish command #16377 (xudong963)
- Add more context to error message for datafusion-cli config failure #16379 (alamb)
- Fix: datafusion-sqllogictest 48.0.0 can't be published #16376 (xudong963)
- bug: remove busy-wait while sort is ongoing #16322 (pepijnve)
- chore: refactor Substrait consumer's "rename_field" and implement the rest of types #16345 (Blizzara)
- chore(deps): bump object_store from 0.12.1 to 0.12.2 #16368 (dependabot[bot])
- Disable datafusion-cli tests for hash_collision tests, fix extended CI #16382 (alamb)
- Fix array_concat with NULL arrays #16348 (alexanderbianchi)
- Minor: add testing case for add YieldStreamExec and polish docs #16369 (zhuqi-lucas)
- chore(deps): bump aws-config from 1.6.3 to 1.8.0 #16394 (dependabot[bot])
- fix typo in test file name #16403 (adriangb)
- Add topk_tpch benchmark #16410 (Dandandan)
- Reduce some cloning #16404 (simonvandel)
- chore(deps): bump syn from 2.0.102 to 2.0.103 #16393 (dependabot[bot])
- Simplify expressions passed to table functions #16388 (simonvandel)
- Minor: Clean-up bench.sh usage message #16416 (2010YOUY01)
- chore(deps): bump rust_decimal from 1.37.1 to 1.37.2 #16422 (dependabot[bot])
- Migrate core test to insta, part1 #16324 (Chen-Yuan-Lai)
- chore(deps): bump mimalloc from 0.1.46 to 0.1.47 #16426 (dependabot[bot])
- chore(deps): bump libc from 0.2.172 to 0.2.173 #16421 (dependabot[bot])
- Use dedicated NullEquality enum instead of null_equals_null boolean #16419 (tobixdev)
- chore: generate basic spark function tests #16409 (shehabgamin)
- Fix CI Failure: replace false with NullEqualsNothing #16437 (ding-young)
- chore(deps): bump bzip2 from 0.5.2 to 0.6.0 #16441 (dependabot[bot])
- chore(deps): bump libc from 0.2.173 to 0.2.174 #16440 (dependabot[bot])
- Remove redundant license-header-check CI job #16451 (alamb)
- Remove unused feature in physical-plan and fix compilation error in benchmark #16449 (AdamGS)
- Temporarily fix bug in dynamic top-k optimization #16465 (AdamGS)
- Ignore sort_query_fuzzer_runner #16462 (blaginin)
- Revert "Ignore sort_query_fuzzer_runner (#16462)" #16470 (2010YOUY01)
- Reapply "Ignore sort_query_fuzzer_runner (#16462)" (#16470) #16485 (alamb)
- Fix constant window for evaluate stateful #16430 (suibianwanwank)
- Use UDTF name in logical plan table scan #16468 (Jeadie)
- refactor reassign_predicate_columns to accept an &Schema instead of &Arc #16499 (adriangb)
- re-enable sort_query_fuzzer_runner #16491 (adriangb)
- Example for using a separate threadpool for CPU bound work (try 3) #16331 (alamb)
- chore(deps): bump syn from 2.0.103 to 2.0.104 #16507 (dependabot[bot])
- use 'lit' as the field name for literal values #16498 (adriangb)
- [datafusion-spark] Implement factorical function #16125 (tlm365)
- Add DESC alias for DESCRIBE command. #16514 (lucqui)
- Split clickbench query set into one file per query #16476 (pepijnve)
- Support query filter on all benchmarks #16477 (pepijnve)
- TableProvider to skip files in the folder which non relevant to selected reader #16487 (comphead)
- Reuse BaselineMetrics in UnnestMetrics #16497 (hendrikmakait)
- Fix array_has to return false for empty arrays instead of null #16529 (kosiew)
- Minor: Add documentation to AggregateWindowExpr::get_result_column #16479 (alamb)
- Fix WindowFrame::new with order_by #16537 (findepi)
- chore(deps): bump object_store from 0.12.1 to 0.12.2 #16548 (dependabot[bot])
- chore(deps): bump mimalloc from 0.1.46 to 0.1.47 #16547 (dependabot[bot])
- Add support for Arrow Duration type in Substrait #16503 (jkosh44)
- Allow unparser to override the alias name for the specific dialect #16540 (goldmedal)
- Avoid clones when calling find_window_exprs #16551 (findepi)
- Update spilled_bytes metric to reflect actual disk usage #16535 (ding-young)
- adapt filter expressions to file schema during parquet scan #16461 (adriangb)
- datafusion-cli: Use correct S3 region if it is not specified #16502 (liamzwbao)
- Add nested struct casting support and integrate into SchemaAdapter #16371 (kosiew)
- Improve err message grammar #16566 (findepi)
- refactor: move PruningPredicate into its own module #16587 (adriangb)
- chore(deps): bump indexmap from 2.9.0 to 2.10.0 #16582 (dependabot[bot])
- Skip re-pruning based on partition values and file level stats if there are no dynamic filters #16424 (adriangb)
- Support timestamp and date arguments for range and generate_series table functions #16552 (simonvandel)
- Fix normalization of columns in JOIN ... USING. #16560 (brunal)
- Revert Finalize support for RightMark join + Mark join #16597 (comphead)
- move min_batch/max_batch to functions-aggregate-common #16593 (adriangb)
- Allow usage of table functions in relations #16571 (osipovartem)
- Update to arrow/parquet 55.2.0 #16575 (alamb)
- Improve field naming in first_value, last_value implementation #16631 (findepi)
- Fix spurious failure in convert_batches test helper #16627 (findepi)
- Aggregate UDF cleanup #16628 (findepi)
- Avoid treating incomparable scalars as equal #16624 (findepi)
- restore topk pre-filtering of batches and make sort query fuzzer less sensitive to expected non determinism #16501 (alamb)
- Add support for Arrow Time types in Substrait #16558 (jkosh44)
- chore(deps): bump substrait from 0.57.0 to 0.58.0 #16640 (dependabot[bot])
- Support explain tree format debug for benchmark debug #16604 (zhuqi-lucas)
- Add microbenchmark for spilling with compression #16512 (ding-young)
- Fix parquet filter_pushdown: respect parquet filter pushdown config in scan #16646 (adriangb)
- chore(deps): bump aws-config from 1.8.0 to 1.8.1 #16651 (dependabot[bot])
- Migrate core test to insta, part 2 #16617 (Chen-Yuan-Lai)
- Update all spark SLT files #16637 (findepi)
- Add PhysicalExpr optimizer and cast unwrapping #16530 (adriangb)
- benchmark: Support sort_tpch10 for benchmark #16671 (zhuqi-lucas)
- chore(deps): bump tokio from 1.45.1 to 1.46.0 #16666 (dependabot[bot])
- Fix TopK Sort incorrectly pushed down past Join with anti join #16641 (zhuqi-lucas)
- Improve error message when ScalarValue fails to cast array #16670 (findepi)
- Add an example of embedding indexes inside a parquet file #16395 (zhuqi-lucas)
- datafusion-cli: Refactor statement execution logic #16634 (liamzwbao)
- Add SchemaAdapterFactory Support for ListingTable with Schema Evolution and Mapping #16583 (kosiew)
- Perf: fast CursorValues compare for StringViewArray using inlinekey… #16630 (zhuqi-lucas)
- Update to Rust 1.88 #16663 (melroy12)
- Refactor StreamJoinMetrics to reuse BaselineMetrics #16674 (Standing-Man)
- chore: refactor BuildProbeJoinMetrics to use BaselineMetrics #16500 (Samyak2)
- Use compression type in CSV file suffices #16609 (theirix)
- Clarify the generality of the embedded parquet index #16692 (alamb)
- Refactor SortMergeJoinMetrics to reuse BaselineMetrics #16675 (Standing-Man)
- Add support for Arrow Dictionary type in Substrait #16608 (jkosh44)
- Fix duplicate field name error in Join::try_new_with_project_input during physical planning #16454 (LiaCastaneda)
- chore(deps): bump tokio from 1.46.0 to 1.46.1 #16700 (dependabot[bot])
- Add reproducer for tpch Q16 deserialization bug #16662 (NGA-TRAN)
- Minor: Update release instructions #16701 (alamb)
- refactor filter pushdown APIs #16642 (adriangb)
- Add comments to ClickBench queries about setting binary_as_string #16605 (alamb)
- minor: improve display output for FFI execution plans #16713 (timsaucer)
- Revert "fix: create file for empty stream" #16682 (brunal)
- Add the missing equivalence info for filter pushdown #16686 (liamzwbao)
- Fix sqllogictests test running compatibility (ignore --test-threads) #16694 (mjgarton)
- Fix: Make CopyTo logical plan output schema consistent with physical schema #16705 (bert-beyondloops)
- chore(devcontainer): use debian's protobuf-compiler package #16687 (fvj)
- Add link to upgrade guide in changelog script #16680 (alamb)
- Improve display format of BoundedWindowAggExec #16645 (geetanshjuneja)
- Fix: optimize projections for unnest logical plan. #16632 (bert-beyondloops)
- Use the test-threads option in sqllogictests #16722 (mjgarton)
- chore(deps): bump clap from 4.5.40 to 4.5.41 #16735 (dependabot[bot])
- chore: make more clarity for internal errors #16741 (comphead)
- Remove parquet_filter and parquet sort benchmarks #16730 (alamb)
- Perform type coercion for corr aggregate function #15776 (kumarlokesh)
- Improve dictionary null handling in hashing and expand aggregate test coverage for nulls #16466 (kosiew)
- Improve Ci cache #16709 (blaginin)
- Fix in list round trip in df proto #16744 (XiangpengHao)
- chore: Make GroupValues and APIs on PhysicalGroupBy aggregation APIs public #16733 (haohuaijin)
- Extend binary coercion rules to support Decimal arithmetic operations with integer(signed and unsigned) types #16668 (jatin510)
- Support Type Coercion for NULL in Binary Arithmetic Expressions #16761 (kosiew)
- chore(deps): bump chrono-tz from 0.10.3 to 0.10.4 #16769 (dependabot[bot])
- limit intermediate batch size in nested_loop_join #16443 (UBarney)
- Add serialization/deserialization and round-trip tests for all tpc-h queries #16742 (NGA-TRAN)
- Auto start testcontainers for datafusion-cli #16644 (blaginin)
- Refactor BinaryTypeCoercer to Handle Null Coercion Early and Avoid Redundant Checks #16768 (kosiew)
- Remove fixed version from MSRV check #16786 (findepi)
- Add clickbench_pushdown benchmark #16731 (alamb)
- add filter to handle backtrace #16752 (geetanshjuneja)
- Support min/max aggregates for FixedSizeBinary type #16765 (theirix)
- fix tests in page_pruning when filter pushdown is enabled by default #16794 (XiangpengHao)
- Automatically split large single RecordBatches in MemorySource into smaller batches #16734 (kosiew)
- CI: Fix slow join test #16796 (2010YOUY01)
- Benchmark for char expression #16743 (ajita-asthana)
- Add example of custom file schema casting rules #16803 (adriangb)
- Fix discrepancy in Float64 to timestamp(9) casts for constants #16639 (findepi)
- Fix: Preserve sorting for the COPY TO plan #16785 (bert-beyondloops)
- chore(deps): bump object_store from 0.12.2 to 0.12.3 #16807 (dependabot[bot])
- Implement equals for stateful functions #16781 (findepi)
- benchmark: Add parquet h2o support #16804 (zhuqi-lucas)
- chore: use equals_datatype for BinaryExpr #16813 (comphead)
- chore: add tests for out of bounds for NullArray #16802 (comphead)
- Refactor binary.rs tests into modular submodules under binary/tests #16782 (kosiew)
- cache generation of dictionary keys and null arrays for ScalarValue #16789 (adriangb)
- refactor(examples): remove redundant call to create directory in parquet_embedded_index.rs #16825 (jcsherin)
- Add benchmark for ByteViewGroupValueBuilder #16826 (zhuqi-lucas)
- Simplify try cast expr evaluation #16834 (lewiszlw)
- Fix flaky test case in joins.slt #16849 (findepi)
- chore(deps): bump sysinfo from 0.35.2 to 0.36.1 #16850 (dependabot[bot])
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
33 Andrew Lamb
26 dependabot[bot]
19 Adrian Garcia Badaracco
14 kosiew
13 Piotr Findeisen
13 Qi Zhu
7 Jonathan Chen
6 Chen Chongchen
6 Marco Neumann
6 Oleks V
6 Pepijn Van Eeckhoudt
6 xudong.w
5 Yongting You
5 ding-young
4 Simon Vandel Sillesen
3 Adam Gutglick
3 Bert Vermeiren
3 Dmitrii Blaginin
3 Joseph Koshakow
3 Liam Bao
3 Tim Saucer
2 Alan Tang
2 Arttu
2 Bruno
2 Corwin Joy
2 Daniël Heres
2 Geetansh Juneja
2 Ian Lai
2 Jax Liu
2 Martin Garton
2 Nga Tran
2 Ruihang Xia
2 Tai Le Manh
2 ViggoC
2 Xiangpeng Hao
2 haiywu
2 theirix
1 Ajeeta Asthana
1 Artem Osipov
1 Dharan Aditya
1 Gabriel
1 Geoffrey Claude
1 Hendrik Makait
1 Huaijin
1 Ian Wagner
1 Jack Eadie
1 Jagdish Parihar
1 Jon Mease
1 Julius von Froreich
1 K
1 Leon Lin
1 Loakesh Indiran
1 Lokesh
1 Lucas Earl
1 Lía Adriana
1 Mehmet Ozan Kabak
1 Melroy dsilva
1 Nirnay Roy
1 Nuno Faria
1 R. Tyler Croy
1 Rishab Joshi
1 Sami Tabet
1 Samyak Sarnayak
1 Shehab Amin
1 Tobias Schwarzinger
1 UBarney
1 alexanderbianchi
1 epgif
1 feniljain
1 m09526
1 suibianwanwan
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.