Update cranelift requirement from 0.88.0 to 0.89.0#123
Closed
dependabot[bot] wants to merge 1 commit intomasterfrom
Closed
Update cranelift requirement from 0.88.0 to 0.89.0#123dependabot[bot] wants to merge 1 commit intomasterfrom
dependabot[bot] wants to merge 1 commit intomasterfrom
Conversation
Updates the requirements on [cranelift](https://github.com/bytecodealliance/wasmtime) to permit the latest version. - [Release notes](https://github.com/bytecodealliance/wasmtime/releases) - [Changelog](https://github.com/bytecodealliance/wasmtime/blob/main/docs/WASI-some-possible-changes.md) - [Commits](https://github.com/bytecodealliance/wasmtime/commits) --- updated-dependencies: - dependency-name: cranelift dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>
Author
|
The following labels could not be found: |
Author
|
Looks like cranelift is up-to-date now, so this is no longer needed. |
comphead
added a commit
that referenced
this pull request
Oct 16, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. Related to apache#18084 ## Rationale for this change Run extended suite on PRs for critical areas, to avoid post merge bugfixing <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
added a commit
that referenced
this pull request
Oct 16, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. Followup on apache#18063 (review) ## Rationale for this change Use cheaper `NullBuffer::union` to apply null mask instead of iterator approach <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
added a commit
that referenced
this pull request
Oct 17, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. Followup on apache#18063 (review) ## Rationale for this change Use cheaper `NullBuffer::union` to apply null mask instead of iterator approach <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> (cherry picked from commit 337378a)
comphead
added a commit
that referenced
this pull request
Oct 17, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. Followup on apache#18063 (review) ## Rationale for this change Use cheaper `NullBuffer::union` to apply null mask instead of iterator approach <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> (cherry picked from commit 337378a)
comphead
pushed a commit
that referenced
this pull request
Oct 17, 2025
…unctions in proto (apache#18024) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#17417. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> - Support `null_treatment`, `distinct`, and `filter` for window function in proto. - Support `null_treatment` for aggregate udf in proto. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> - [x] Add `null_treatment`, `distinct`, `filter` fields to `WindowExprNode` message and handle them in `to/from_proto.rs`. - [x] Add `null_treatment` field to `AggregateUDFExprNode` message and handle them in `to/from_proto.rs`. - [ ] Docs update: I'm not sure where to add docs as declared in the issue description. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> - Add tests to `roundtrip_window` for respectnulls, ignorenulls, distinct, filter. - Add tests to `roundtrip_aggregate_udf` for respectnulls, ignorenulls. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> N/A --------- Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
comphead
pushed a commit
that referenced
this pull request
Oct 17, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Doesn't close an issue. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Hi we are hiop, a Serverless Data Logistic Platform. We use DataFusion as a core part of our backend engine, and it plays a crucial role in our data infrastructure. Our team members are passionate about the project and actively try contribute to its development (@dariocurr). We’d love to have Hiop listed among the Known Users to show our support and help the DataFusion community continue to grow. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Just adding hiop as known user ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 17, 2025
…8117) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#3695 - Closes apache#3797 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Was looking at above issues and I don't believe we skip the failed rules for any tests anymore (default for the config is also `false`), apart from this cleanup, so filing this PR so we can close the issues. Seems we only do in this `window.slt` test after this fix: https://github.com/apache/datafusion/blob/621a24978a7a9c6d2b27973d1853dbc8776a56b5/datafusion/sqllogictest/test_files/window.slt#L2587-L2611 Which seems intentional. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Remove unnecessary `skip_failed_rules` config. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Existing tests. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> No. <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 17, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> `EXPLAIN ANALYZE` can be used for profiling and displays the results alongside the EXPLAIN plan. The issue is that it currently shows too many low-level details. It would provide a better user experience if only the most commonly used metrics were shown by default, with more detailed metrics available through specific configuration options. ### Example In `datafusion-cli`: ``` > CREATE EXTERNAL TABLE IF NOT EXISTS lineitem STORED AS parquet LOCATION '/Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem'; 0 row(s) fetched. Elapsed 0.000 seconds. explain analyze select * from lineitem where l_orderkey = 3000000; ``` The parquet reader includes a large number of low-level details: ``` metrics=[output_rows=19813, elapsed_compute=14ns, batches_split=0, bytes_scanned=2147308, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=18, num_predicate_creation_errors=0, page_index_rows_matched=19813, page_index_rows_pruned=729088, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=21.997µs, metadata_load_time=273.83µs, page_index_eval_time=29.915µs, row_pushdown_eval_time=42ns, statistics_eval_time=76.248µs, time_elapsed_opening=4.02146ms, time_elapsed_processing=24.787461ms, time_elapsed_scanning_total=24.17671ms, time_elapsed_scanning_until_data=23.103665ms] ``` I believe only a subset of it is commonly used, for example `output_rows`, `metadata_load_time`, and how many file/row-group/pages are pruned, and it would better to only display the most common ones by default. ### Existing `VERBOSE` keyword There is a existing verbose keyword in `EXPLAIN ANALYZE VERBOSE`, however it's turning on per-partition metrics instead of controlling detail level. I think it would be hard to mix this partition control and the detail level introduced in this PR, so they're separated: the following config will be used for detail level and the semantics of `EXPLAIN ANALYZE VERBOSE` keep unchanged. ### This PR: configurable explain analyze level 1. Introduced a new config option `datafusion.explain.analyze_level`. When set to `dev` (default value), all existing metrics will be shown. If set to `summary`, only `BaselineMetrics` will be displayed (i.e. `output_rows` and `elapsed_compute`). Note now we only include `BaselineMetrics` for simplicity, in the follow-up PRs we can figure out what's the commonly used metrics for each operator, and add them to `summary` analyze level, finally set the `summary` analyze level to default. 2. Add a `MetricType` field associated with `Metric` for detail level or potentially category in the future. For different configurations, a certain `MetricType` set will be shown accordingly. #### Demo ``` -- continuing the above example > set datafusion.explain.analyze_level = summary; 0 row(s) fetched. Elapsed 0.000 seconds. > explain analyze select * from lineitem where l_orderkey = 3000000; +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=5, elapsed_compute=25.339µs] | | | FilterExec: l_orderkey@0 = 3000000, metrics=[output_rows=5, elapsed_compute=81.221µs] | | | DataSourceExec: file_groups={14 groups: [[Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:0..11525426], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:11525426..20311205, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:0..2739647], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:2739647..14265073], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:14265073..20193593, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:0..5596906], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:5596906..17122332], ...]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], file_type=parquet, predicate=l_orderkey@0 = 3000000, pruning_predicate=l_orderkey_null_count@2 != row_count@3 AND l_orderkey_min@0 <= 3000000 AND 3000000 <= l_orderkey_max@1, required_guarantees=[l_orderkey in (3000000)], metrics=[output_rows=19813, elapsed_compute=14ns] | | | | +-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row(s) fetched. Elapsed 0.025 seconds. ``` Only `BaselineMetrics` are shown. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 4. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> UT ## Are there any user-facing changes? No <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Oct 17, 2025
…e#18091) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> N/A ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> There's a few functions in `datafusion/expr-common/src/type_coercion/aggregates.rs` that are unused elsewhere in the codebase, likely a remnant before the refactor to UDF, so removing them. Some are still used (`coerce_avg_type()` and `avg_return_type()`) so these are inlined into the Avg aggregate function (similar to Sum). Also refactor some window functions to use already available macros. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> - Remove some unused functions - Inline avg coerce & return type logic - Refactor Spark Avg a bit to remove unnecessary code - Refactor ntile & nth window functions to use available macros ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Existing tests. ## Are there any user-facing changes? Yes as these functions were publicly exported; however I'm not sure they were meant to be used by users anyway, given what they do. <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 17, 2025
apache#18099) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> Fixes comparison errors when using dictionary-encoded types with comparison functions like NULLIF. ## Rationale for this change When using dictionary-encoded columns (e.g., Dictionary(Int32, Utf8)) in comparison operations with literals or other types, DataFusion would throw an error stating the types are not comparable. This was particularly problematic for functions like NULLIF which rely on comparison coercion. The issue was that comparison_coercion_numeric didn't handle dictionary types, even though the general comparison_coercion function did have dictionary support. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? 1. Refactored dictionary comparison logic: Extracted common dictionary coercion logic into dictionary_comparison_coercion_generic to avoid code duplication. 2. Added numeric-specific dictionary coercion: Introduced dictionary_comparison_coercion_numeric that uses numeric-preferring comparison rules when dealing with dictionary value types. 3. Updated comparison_coercion_numeric: Added a call to dictionary_comparison_coercion_numeric in the coercion chain to properly handle dictionary types. 4. Added sqllogictest cases demonstrating the fix works for various dictionary comparison scenarios. <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? Yes, added tests in datafusion/sqllogictest/test_files/nullif.slt covering: - Dictionary type compared with string literal - String compared with dictionary type - Dictionary compared with dictionary All tests pass with the fix and would fail without it. <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? This is a bug fix that enables previously failing queries to work correctly. No breaking changes or API modifications. <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
alamb
added a commit
that referenced
this pull request
Oct 18, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18135 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
added a commit
that referenced
this pull request
Oct 24, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Oct 24, 2025
…pache#17478) (apache#18130) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Related to apache#17405 - Related to apache#18072 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> See apache#17478 ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> See apache#17478 ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Co-authored-by: Filippo Rossi <12383260+notfilippo@users.noreply.github.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Oct 24, 2025
…nics (apache#18013) (apache#18131) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Related to apache/datafusion-comet#2539 - Related to apache#18072 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Return errors rather than panicking. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Co-authored-by: Andy Grove <agrove@apache.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Oct 24, 2025
…amps (apache#17777) (apache#18129) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Related to apache#17776 - Related to apache#18072 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> If I've understood semantic equality correctly, any two timestamps should meet the bar for equality regardless of time units and timezones, but the current code doesn't reflect that. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Adds a branch to this method for timestamps. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> Yes <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Shiv Bhatia <shivbhatia10@gmail.com> Co-authored-by: Shiv Bhatia <sbhatia@palantir.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Oct 24, 2025
…e#18161) (apache#18179) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Related to apache#18070 - Part of apache#18072 ## Rationale for this change Fix performance regression in Datafusion 50 ## What changes are included in this PR? Backport apache#18161 to `branch-50` ## Are these changes tested? Yes ## Are there any user-facing changes? Fix performance regression Co-authored-by: Yongting You <2010youy01@gmail.com>
comphead
pushed a commit
that referenced
this pull request
Oct 27, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#16678. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> The issue has been fixed in apache#16639, this PR just adds a testcase for it. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Add a test case for `to_timestamp(double)` with vectorized input. Similar to the one presented in the issue. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> No
comphead
pushed a commit
that referenced
this pull request
Oct 27, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18070 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> See the above issue and its comment apache#18070 (comment) ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> In nested loop join, when the join column includes `List(Utf8View)`, use `take()` instead of `to_array_of_size()` to avoid deep copying the utf8 buffers inside `Utf8View` array. This is the quick fix, avoiding deep copy inside `to_array_of_size()` is a bit tricky. Here is `ListArray`'s physical layout: https://arrow.apache.org/rust/arrow/array/struct.GenericListArray.html If multiple elements is pointing to the same list range, the underlying payload can't be reused.So the potential fix in `to_array_of_size` can only avoids copying the inner-inner utf8view array buffers, but can't avoid copying the inner array (i.e. views are still copied), and deep copying for other primitive types also can't be avoided. Seems this can be better solved when `ListView` type is ready 🤔 ### Benchmark I tried query 1 in apache#18070, but only used 3 randomly sampled `places` parquet file. 49.0.0: 4s 50.0.0: stuck > 1 minute PR: 4s Now the performance are similar, I suspect the most time is spend evaluating the expensive `array_has` so the optimization in apache#16996 can't help much. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Existing tests ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> No <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Oct 27, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#17913 . ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> - Improve SQL code block rendering by upgrading `pydata-sphinx-theme` - fix sidebar layout ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 4. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> yes ## Are there any user-facing changes? documentation ui <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 27, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - part of apache#17427 ## Rationale for this change Adds regular joins (left, right, full, inner) for PWMJ as they behave differently in the code path. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? Adds classic join + physical planner <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? Yes SLT tests + unit tests ## Follow up work to this pull request - Handling partitioned queries and multiple record batches (fuzz testing will be handled with this) - Simplify physical planning - Add more unit tests for different types (another pr as the LOC in this pr is getting a little daunting) next would be to implement the existence joins --------- Co-authored-by: Yongting You <2010youy01@gmail.com>
comphead
pushed a commit
that referenced
this pull request
Oct 27, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#17854 . ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 27, 2025
…pache#18017) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#17993 ## Rationale for this change ``` DataFusion CLI v50.1.0 > SET TIME ZONE = '+08:00'; 0 row(s) fetched. Elapsed 0.011 seconds. > SELECT arrow_typeof(now()); +---------------------------------------+ | arrow_typeof(now()) | +---------------------------------------+ | Timestamp(Nanosecond, Some("+08:00")) | +---------------------------------------+ 1 row(s) fetched. Elapsed 0.015 seconds. > SELECT count(1) result FROM (SELECT now() as n) a WHERE n > '2000-01-01'::date; +--------+ | result | +--------+ | 1 | +--------+ 1 row(s) fetched. Elapsed 0.029 seconds. ``` <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? When the timezone changes, re-register `now()` function <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Oct 27, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change According to three-valued logic we should return `null` and that's also what happens when the argument is not a constant as seen in the test. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? Updated `ArrayHas::simplify` to explicitly handle `null` <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? Updated the `array_has` SQL test and added unit tests <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? Yes, a minor change in behaviour wrt `null` <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 27, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#16820 . ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 27, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Part of apache#16602 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Now we have to search in the code comment (or even implementation) to find the documentation of certain metrics, it would be better to open a page in the `user-guide` for metrics. The doc has to be manually updated, the metrics construction is scattered in the codebase, so it's hard to make it auto-generated. This PR only includes 2 common metrics, I plan to add more operator-specific metrics while working on apache#18116 ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 30, 2025
…he#18316) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Noticed this function when reviewing apache#18313. I think it’s a good opportunity to add more documentation. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 30, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18267. /cc @NGA-TRAN ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> The `information_schema.views` does not have display `WITH ORDER` for the definition of a table. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Added condition for writing `WITH ORDER` for CreateExternalTable. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Did not add tests for this functionality as not other display functionality has tests and seems like a separate PR would be appropriate if this is needed. This was tested manually with: In `datafusion-cli` ``` -- Not sorted CREATE EXTERNAL TABLE dimension_csv STORED AS CSV LOCATION '/path/to/the/attached/dimension_1.csv' OPTIONS ('format.has_header' 'true'); -- Sorted CREATE EXTERNAL TABLE dimension_csv_sorted STORED AS CSV WITH ORDER (env, service, host) LOCATION '/path/to/the/attached/dimension_1.csv' OPTIONS ('format.has_header' 'true'); ``` Then running: ``` select * from information_schema.views; ``` With link to data: [dimension_1.csv](https://github.com/user-attachments/files/23124138/dimension_1.csv) ## Are there any user-facing changes? Yes, improves the information_schema.views display to include `WITH ORDER` <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 30, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18323 . ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Adds more detailed metrics, so it is easier to identify which part of the aggregate streams are actually slow. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Added a metrics struct, and used it in the functions common to the aggregate streams. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes, added some tests to verify the metrics are actually updated and can be retrieved. I've also ran the groupby benchmarks to ensure we don't create timers in a way that could impact performance, and it seems ok, all the changes are within what I'd expect as std variation on a local machine. ``` Comparing main and agg-metrics -------------------- Benchmark h2o.json -------------------- ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Query ┃ main ┃ agg-metrics ┃ Change ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ QQuery 1 │ 1252.42 ms │ 1196.62 ms │ no change │ │ QQuery 2 │ 3976.62 ms │ 3392.89 ms │ +1.17x faster │ │ QQuery 3 │ 3448.29 ms │ 2918.47 ms │ +1.18x faster │ │ QQuery 4 │ 1909.15 ms │ 1632.98 ms │ +1.17x faster │ │ QQuery 5 │ 3056.36 ms │ 2831.82 ms │ +1.08x faster │ │ QQuery 6 │ 2663.13 ms │ 2594.64 ms │ no change │ │ QQuery 7 │ 2802.28 ms │ 2592.43 ms │ +1.08x faster │ │ QQuery 8 │ 4489.29 ms │ 4199.00 ms │ +1.07x faster │ │ QQuery 9 │ 7001.75 ms │ 6622.98 ms │ +1.06x faster │ │ QQuery 10 │ 4725.80 ms │ 4619.37 ms │ no change │ └──────────────┴────────────┴─────────────┴───────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓ ┃ Benchmark Summary ┃ ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩ │ Total Time (main) │ 35325.09ms │ │ Total Time (agg-metrics) │ 32601.19ms │ │ Average Time (main) │ 3532.51ms │ │ Average Time (agg-metrics) │ 3260.12ms │ │ Queries Faster │ 7 │ │ Queries Slower │ 0 │ │ Queries with No Change │ 3 │ │ Queries with Failure │ 0 │ └────────────────────────────┴────────────┘ ``` ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Nothing that is direct to the user, additional metrics will now be available, but no breaking changes. --------- Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com> Co-authored-by: Eshed Schacham <ashdnazg@gmail.com>
comphead
pushed a commit
that referenced
this pull request
Oct 30, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18026 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> This makes it possible to support temporary views in datafusion-python without code duplication. Ref: apache/datafusion-python#1267 ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> - Add new public function `DataFrame::into_temporary_view` - Update `DataFrameTableProvider` with a new member that determines the `table_type` - Add a test ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes, see added test `register_temporary_table` ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Yes, there is a new public function `DataFrame::into_temporary_view` --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Oct 30, 2025
…8321) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18299 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> See writeup in apache#18297 This PR is for the remaining metrics in `DataSourceExec` with parquet data source. ### Demo In datafusion-cli ``` CREATE EXTERNAL TABLE IF NOT EXISTS lineitem STORED AS parquet LOCATION '/Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem'; set datafusion.explain.analyze_level = summary; explain analyze select * from lineitem where l_orderkey = 3000000; +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=5, elapsed_compute=48.677µs, output_bytes=1092.0 B] | | | FilterExec: l_orderkey@0 = 3000000, metrics=[output_rows=5, elapsed_compute=1.65872ms, output_bytes=530.8 KB] | | | DataSourceExec: file_groups={14 groups: [[Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:0..11525426], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:11525426..20311205, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:0..2739647], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:2739647..14265073], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:14265073..20193593, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:0..5596906], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:5596906..17122332], ...]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], file_type=parquet, predicate=l_orderkey@0 = 3000000, pruning_predicate=l_orderkey_null_count@2 != row_count@3 AND l_orderkey_min@0 <= 3000000 AND 3000000 <= l_orderkey_max@1, required_guarantees=[l_orderkey in (3000000)], metrics=[output_rows=19813, elapsed_compute=14ns, output_bytes=5.7 MB, files_ranges_pruned_statistics=21 total → 3 matched, page_index_rows_pruned=748901 total → 19813 matched, row_groups_pruned_bloom_filter=1 total → 1 matched, row_groups_pruned_statistics=1 total → 1 matched, bytes_scanned=2147308, metadata_load_time=1.794289ms] | | | | +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row(s) fetched. Elapsed 0.081 seconds. ``` ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Update `row_groups_pruned_statistics`, `row_groups_pruned_bloom_filter`, `page_index_rows_pruned` with the new `PruningMetrics` metric type. The functional changes in the pr are in `datafusion/datasource-parquet/src/*`, it's only a few of lines, most changes are fixing tests. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> UTs are updated for the new metrics ## Are there any user-facing changes? No <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Oct 30, 2025
…#18374) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Temporary file is created during test, see reproducer in `datafusion-cli` ```sh yongting@Yongtings-MacBook-Pro-2 ~/C/datafusion (main=)> cargo test --package datafusion --test core_integration --all-features -- dataframe::test_copy_schema --exact --nocapture Compiling datafusion v50.3.0 (/Users/yongting/Code/datafusion/datafusion/core) Finished `test` profile [unoptimized + debuginfo] target(s) in 2.50s Running tests/core_integration.rs (target/debug/deps/core_integration-dee3896b38f536b2) running 1 test test dataframe::test_copy_schema ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 801 filtered out; finished in 0.02s yongting@Yongtings-MacBook-Pro-2 ~/C/datafusion (main=)> git status On branch main Your branch is up to date with 'upstream/main'. Untracked files: (use "git add <file>..." to include in what will be committed) "datafusion/core/\"/" nothing added to commit but untracked files present (use "git add" to track) ``` This PR fixes this test, and make CI stricter for similar temporary file creations. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes, I have run the CI without fix in my local repo, it's failing as expected: https://github.com/2010YOUY01/arrow-datafusion/actions/runs/18913128118/job/53989721867 After the fix, the CI should be able to pass. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 30, 2025
…ields (apache#18100) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#16997 - Part of apache#11725 - Supersedes apache#17085 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> When reviewing apache#17085 I was very confused by the fix suggested, and tried to understand why `AccumulatorArgs` didn't have easy access to `Field`s of its input expressions, as compared to scalar/window functions which do. Introducing this new field should make it easier for users to grab datatype, metadata, nullability of their input expressions for aggregate functions. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Add a slice of `FieldRef` to `AccumulatorArgs` so users don't need to compute the input expression fields themselves via using schema. This addresses apache#16997 as it was confusing to have only the schema available as there are valid (?) cases where the schema is empty (such as literal only input). This fix differs from apache#17085 in that it doesn't special case for when there is literal only input; it leaves the physical `schema` provided to `AccumulatorArgs` untouched but provides a more ergonomic (and less confusing) API for users to retrieve `Field`s of their input arguments. - I'm still not sure if the schema being empty for literal only inputs is correct or not, so this might be considered a side step. If we could remove `schema` entirely from `AccumulatorArgs` maybe we wouldn't need to worry about this, but see my comment for why that wasn't done in this PR ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Existing unit tests. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> Yes, new field to `AccumulatorArgs` which is publicly exposed (with all it's fields). <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 30, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18334. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> The array-based implementation of date_trunc can produce incorrect results for negative timestamps (i.e. dates before 1970-01-01). Check for any such incorrect values and compensate accordingly. Running the date_trunc benchmark suggests this fix introduces an ~9% performance cost. ``` date_trunc_minute_1000 time: [1.7424 µs 1.7495 µs 1.7583 µs] change: [+7.9289% +8.5950% +9.1955%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe ``` ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes, an SLT is added based on the issue. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Oct 30, 2025
…e#18317) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#15881 - See my notes below ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Trying to move away from user defined signatures where possible; mainly to ensure consistency of error checking/messages. The original issue is because the function has to do this checking itself leading to inconsistency of error used (ideally shouldn't be internal). By uplifting away from a user defined signature we can make use of existing code meant to handle this checking and error messages for us. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Defined range/generate_series signature via coercible API instead of being user defined. Some accompanying changes are needed in the signature code to make this possible. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Added SLT tests and fixed any existing ones. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> No (error messages do change though) <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> It's better to let pruning metrics in parquet displayed in an order that is the same as the actual pruning order: ``` metrics=...files_ranges_pruned_statistics=21 total → 3 matched, row_groups_pruned_statistics=1 total → 1 matched, row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_rows_pruned=748901 total → 19813 matched... ``` Now it's ordered alphabetically. See apache#18321 (comment) for reproducing. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Update the sort key API in `MetricValue`, to let the parquet pruning metrics display in the expected order. ## Are these changes tested? UT <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? No <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
added a commit
that referenced
this pull request
Nov 4, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change Found when was testing apache#18356 ``` > select date_trunc('YY', now()); Execution error: Unsupported date_trunc granularity: yy ``` Which is confusing, I would like to get a list of supported values <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
…or UDAFs (apache#18397) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18280 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> `AggregateUDFImpl::is_ordered_set_aggregate` is confusingly named as all it does currently is permit usage of `WITHIN GROUP` SQL syntax. I don't think it would have any functionality in the future beyond this. Also makes it easier if in future we decide to implement [hypothetical-set aggregate functions](https://www.postgresql.org/docs/9.4/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE) too, since we wouldn't need a `is_hypothetical_set_aggregate` variation either. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Rename `AggregateUDFImpl::is_ordered_set_aggregate` to `AggregateUDFImpl::supports_within_group_clause`. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Existing tests. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> Yes. Added section to upgrade guide. <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> Part of apache#18217 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> In `FilterExec`, selectivity is calculated as `output_rows/input_rows`. This PR supports such metric. I think this metrics provides important application-level insights, and would be commonly used, so it is displayed in the `summary` verbose level. ### Demo in `datafusion-cli` ``` > set datafusion.explain.analyze_level = summary; 0 row(s) fetched. Elapsed 0.000 seconds. > explain analyze select * from generate_series(100) as t1(v1) where v1 <10; +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Plan with Metrics | ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=10, elapsed_compute=1.763µs, output_bytes=64.0 KB] | | | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=10, elapsed_compute=25.833µs, output_bytes=64.0 KB] | | | FilterExec: value@0 < 10, metrics=[output_rows=10, elapsed_compute=34.888µs, output_bytes=128.0 B, selectivity=9.9% (10/101)] | | | RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=1, metrics=[] | | | LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100, batch_size=8192], metrics=[output_rows=101, elapsed_compute=33.167µs, output_bytes=64.0 KB] | | | | +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row(s) fetched. Elapsed 0.004 seconds. ``` ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> 1. Add a new `MetricValue` for ratio. 2. Tracking selectivity in `FilterExec` with `MetricValue::Ratio` ## Are these changes tested? UT <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 3. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? No <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
…apache#18430) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> Remove a unnecessary vec allocation in SortMergeJoinStream initialization ## Rationale for this change Remove a unnecessary vec allocation in SortMergeJoinStream initialization ## What changes are included in this PR? Remove a unnecessary vec allocation in SortMergeJoinStream initialization ## Are these changes tested? Covered by existing ## Are there any user-facing changes? No
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> N/A ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> When reviewing apache#18424 I noticed some refactoring that could be applied to existing array reverse implementation. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> See my comments for the refactors & justifications. Existing tests. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> No. <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#16272. ## Rationale for this change - Null with null type were throwing invalid casts in substrait round trips <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? - Added a null variation const that allows NULL with NULL types to be casted properly. - Made this a UserDefined type from the substrait side of things <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? - Yes added unit test in producer/types.rs - Previously failing tests pass <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes - apache#18204 - apache#18081 - fixes apache#18219 as a side effect ## Rationale for this change Default timezone was previously zulu however with the recent change to support default tz in now(), current_date(), etc which used to have no default tz the choice was made to unset the system wide timezone. ## What changes are included in this PR? Code, tests, upgrading doc. ## Are these changes tested? Yes, with existing tests. ## Are there any user-facing changes? Yes. Any query that used to use the default timezone would return a timestamp with a timezone of 'Z' will now return a timestamp without a timezone. This can be changed back to the previous behaviour with the sql ```sql SET TIMEZONE = '+00:00'; ``` --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
…8363) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#17670 - Closes apache#18419 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? Added a `flatten()` `List(LargeList)` test to the `sqllogictest` Added support for array `flatten()` on `List(LargeList(_))` types <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? `sqllogictest` passes, but I still need to implement a test where offsets could not be downcasted from i64 to i32 <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? Users will be able to use `flatten` on `List(LargeList)` types <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
…ALYZE (apache#18455) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18410 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> This PR adds the `reduction_factor` metric to the `AggregateExec` mode=Partial case. e.g from the issue ``` create table t1(a int, b int); insert into t1 values (1,10), (1, 20), (2,10), (2,30); explain analyze select a, sum(b) from t1 group by a; +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Plan with Metrics | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[sum(t1.b)], metrics=[output_rows=2, elapsed_compute=7.856539ms, output_bytes=544.0 B] | | | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=2, elapsed_compute=192.334µs, output_bytes=96.0 KB] | | | RepartitionExec: partitioning=Hash([a@0], 10), input_partitions=10, metrics=[] | | | RepartitionExec: partitioning=RoundRobinBatch(10), input_partitions=1, metrics=[] | | | AggregateExec: mode=Partial, gby=[a@0 as a], aggr=[sum(t1.b)], metrics=[output_rows=2, elapsed_compute=2.581625ms, output_bytes=544.0 B, reduction_factor=50% (2/4)] | | | DataSourceExec: partitions=1, partition_sizes=[1], metrics=[] | | | | +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` Note: For AggregateExec cases where this doesn't apply, the reduction_factor metric won't be shown. Here's an example of the explain analyze from the modified test in `explain_analyze.rs`. ``` running query: EXPLAIN ANALYZE SELECT count(*) as cnt FROM (SELECT count(*), c1 FROM aggregate_test_100 WHERE c13 != 'C2GT5KVyOPZpgKVl110TyZO0NcJ434' GROUP BY c1 ORDER BY c1 ) AS a UNION ALL SELECT 1 as cnt UNION ALL SELECT lead(c1, 1) OVER () as cnt FROM (select 1 as c1) AS b LIMIT 3 Query Output: +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Plan with Metrics | CoalescePartitionsExec: fetch=3, metrics=[output_rows=3, elapsed_compute=6.084µs, output_bytes=25.0 B] | | | UnionExec, metrics=[output_rows=3, elapsed_compute=117.208µs, output_bytes=25.0 B] | | | ProjectionExec: expr=[count(Int64(1))@0 as cnt], metrics=[output_rows=1, elapsed_compute=1.333µs, output_bytes=8.0 B] | | | AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))], metrics=[output_rows=1, elapsed_compute=70.542µs, output_bytes=8.0 B] | | | CoalescePartitionsExec, metrics=[output_rows=3, elapsed_compute=4.958µs, output_bytes=24.0 B] | | | AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))], metrics=[output_rows=3, elapsed_compute=51.835µs, output_bytes=24.0 B] | | | ProjectionExec: expr=[], metrics=[output_rows=5, elapsed_compute=2.251µs, output_bytes=0.0 B] | | | AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[], metrics=[output_rows=5, elapsed_compute=76.666µs, output_bytes=48.0 KB, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, peak_mem_used=50544, aggregate_arguments_time=3ns, aggregation_time=3ns, emitting_time=5.875µs, time_calculating_group_ids=9.459µs] | | | CoalesceBatchesExec: target_batch_size=4096, metrics=[output_rows=5, elapsed_compute=11.249µs, output_bytes=192.0 KB] | | | RepartitionExec: partitioning=Hash([c1@0], 3), input_partitions=3, metrics=[spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, fetch_time=15.064041ms, repartition_time=149.418µs, send_time=8.672µs] | | | AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[], metrics=[output_rows=5, elapsed_compute=248.667µs, output_bytes=16.0 KB, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, skipped_aggregation_rows=0, peak_mem_used=52168, aggregate_arguments_time=3ns, aggregation_time=3ns, emitting_time=7.377µs, time_calculating_group_ids=128.46µs, reduction_factor=5.1% (5/99)] | | | CoalesceBatchesExec: target_batch_size=4096, metrics=[output_rows=99, elapsed_compute=81.459µs, output_bytes=64.0 KB] | | | FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434, projection=[c1@0], metrics=[output_rows=99, elapsed_compute=503.793µs, output_bytes=1584.0 B, selectivity=99% (99/100)] | | | RepartitionExec: partitioning=RoundRobinBatch(3), input_partitions=1, metrics=[spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, fetch_time=4.160958ms, repartition_time=1ns, send_time=16.085µs] | | | DataSourceExec: file_groups={1 group: [[Users/peter/Documents/open-source/datafusion/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c13], file_type=csv, has_header=true, metrics=[output_rows=100, elapsed_compute=1ns, output_bytes=19.1 KB, batches_split=0, file_open_errors=0, file_scan_errors=0, time_elapsed_opening=313.458µs, time_elapsed_processing=3.974624ms, time_elapsed_scanning_total=3.771208ms, time_elapsed_scanning_until_data=3.714625ms] | | | ProjectionExec: expr=[1 as cnt], metrics=[output_rows=1, elapsed_compute=20.792µs, output_bytes=8.0 B] | | | PlaceholderRowExec, metrics=[] | | | ProjectionExec: expr=[lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@1 as cnt], metrics=[output_rows=1, elapsed_compute=1.333µs, output_bytes=9.0 B] | | | BoundedWindowAggExec: wdw=[lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": nullable Int64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted], metrics=[output_rows=1, elapsed_compute=560µs, output_bytes=17.0 B] | | | ProjectionExec: expr=[1 as c1], metrics=[output_rows=1, elapsed_compute=2.459µs, output_bytes=8.0 B] | | | PlaceholderRowExec, metrics=[] | | | | +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` The following cases don't include `reduction_factor` metric - `AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]` - `AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]` - `AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[]` While this case does: - `AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[]` -> `reduction_factor=5.1% (5/99)` ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> Yes, a new metric will be visible when running `EXPLAIN ANALYZE` <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Yongting You <2010youy01@gmail.com>
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18418 ## Rationale for this change Keep dependencies up to date. ## What changes are included in this PR? Benchmark updates. ## Are these changes tested? Yes, every single benchmark was run. ## Are there any user-facing changes? No.
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> There's no issue for this, just some simple text fixes. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> While "the the" *can* be grammatically correct, in these instances it was not. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> "the the" -> "the" ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Not at as such, no. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Yes, in both code docs and in the website.
comphead
pushed a commit
that referenced
this pull request
Nov 4, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18479. You can see an example run with the warning [here](https://github.com/apache/datafusion/actions/runs/19056329292/job/54427495514), just expand the `Check datafusion (no-default-features)` section. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Fixes a warning when building without the sql feature. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Makes a `use` declaration conditional based on the sql feature. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> I ran the same command and had no issues. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Nope!
comphead
pushed a commit
that referenced
this pull request
Nov 7, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - part of #apache#18142. ## Rationale for this change As discussed in apache#18289 this PR is for consolidating all the `flight` examples into a single example binary. Then we can make sure we are agreed on the pattern and then we can apply it to the remaining examples <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Sergey Zhukov <szhukov@aligntech.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
comphead
pushed a commit
that referenced
this pull request
Nov 7, 2025
…st (apache#18485) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - This PR came up as part of apache#17964. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> This PR is intended to fix return type mismatch of spark `array` when inner data type is `LargeList`, e.g. ``` query error SELECT array(arrow_cast(array(1), 'LargeList(Int64)')) ---- DataFusion error: Internal error: Function 'array' returned value of type 'LargeList(Field { name: "element", data_type: LargeList(Field { data_type: Int64, nullable: true }), nullable: true })' while the following type was promised at planning time and expected: 'List(Field { name: "element", data_type: LargeList(Field { data_type: Int64, nullable: true }), nullable: true })'. This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues ``` ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> - Return `List` regardless of whether inner data type is `LargeList` or not. This aligns with the behavior of datafusion `make_array` function. - Remove `return_field_from_args` as `return_type` is already defined and is invoked internally. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> No. --------- Co-authored-by: Michael Kleen <mkleen@gmail.com> Co-authored-by: Sergey Zhukov <62326549+cj-zhukov@users.noreply.github.com> Co-authored-by: Sergey Zhukov <szhukov@aligntech.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: bubulalabu <bubulalabububu@gmail.com> Co-authored-by: Vegard Stikbakke <vegard.stikbakke@gmail.com>
comphead
pushed a commit
that referenced
this pull request
Nov 7, 2025
apache#18481) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18407 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> This new metric will give the user better visibility to see what portion of the possibilities is actually being matched. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Add selectivity metric to NestedLoopJoinExec for EXPLAIN ANALYZE ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Added tests ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> Yes, new metric in explain analyze <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Yongting You <2010youy01@gmail.com>
comphead
pushed a commit
that referenced
this pull request
Nov 7, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - part of #apache#18142. ## Rationale for this change This PR is for consolidating all the `udf` examples into a single example binary. We are agreed on the pattern and we can apply it to the remaining examples <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Sergey Zhukov <szhukov@aligntech.com>
comphead
pushed a commit
that referenced
this pull request
Nov 10, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> An initial attempt towards apache#18467 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ### Rationale for the additional lint rule `clippy::needless_pass_by_value` There is a clippy lint rule that is not turned on by the current strictness level in CI: https://rust-lang.github.io/rust-clippy/master/index.html#needless_pass_by_value Note it has the `Clippy` category `pedantic`, and its description is `lints which are rather strict or have occasional false positives` from https://doc.rust-lang.org/nightly/clippy It seems we have been suffering from the excessive copying issue for quite some time, and @alamb is on the front line now apache#18413. I think this extra lint rule is able to help. ### Implementation plan This PR only enables this rule in `datafusion-common` package, and apply `#[allow(clippy::needless_pass_by_value)]` for all violations. If this PR makes sense, we can open a tracking issue and roll out this check to the remaining workspace packages. At least this can help prevent new inefficient patterns and identify existing issues that we can fix gradually. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Nov 10, 2025
) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#18505. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Enforce clippy `needless_pass_by_value`. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> No
comphead
pushed a commit
that referenced
this pull request
Nov 10, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Follow on to apache#18468 ## Rationale for this change We missed the fact that you couldn't yet add new linter rules to subcrates via Cargo.toml overrides. Thankfully @Jefffrey sorted is out. Let's try and avoid that again by leaving a comment ## What changes are included in this PR? Add comments to help our future selves remember to add new lints to lib.rs rather than Cargo.toml for subcrates ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
pushed a commit
that referenced
this pull request
Nov 10, 2025
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> In apache#18468, there is a inconsistent comment I forget to remove. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Updates the requirements on cranelift to permit the latest version.
Commits
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting
@dependabot rebase.Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
@dependabot rebasewill rebase this PR@dependabot recreatewill recreate this PR, overwriting any edits that have been made to it@dependabot mergewill merge this PR after your CI passes on it@dependabot squash and mergewill squash and merge this PR after your CI passes on it@dependabot cancel mergewill cancel a previously requested merge and block automerging@dependabot reopenwill reopen this PR if it is closed@dependabot closewill close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually@dependabot ignore this major versionwill close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this minor versionwill close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this dependencywill close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)