Commit 030c530

Merge branch 'main' into iceberg-rust
2 parents 224b3e7 + d47f196 commit 030c530

62 files changed: 1258 additions & 954 deletions


.github/actions/java-test/action.yaml

Lines changed: 26 additions & 1 deletion
@@ -62,7 +62,32 @@ runs:
     - name: Run Maven compile
       shell: bash
       run: |
-        ./mvnw -B compile test-compile scalafix:scalafix -Dscalafix.mode=CHECK -Psemanticdb ${{ inputs.maven_opts }}
+        ./mvnw -B package -DskipTests scalafix:scalafix -Dscalafix.mode=CHECK -Psemanticdb ${{ inputs.maven_opts }}
+
+    - name: Setup Node.js
+      uses: actions/setup-node@v6
+      with:
+        node-version: '24'
+
+    - name: Install prettier
+      shell: bash
+      run: |
+        npm install -g prettier
+
+    - name: Run prettier
+      shell: bash
+      run: |
+        npx prettier "**/*.md" --write
+
+    - name: Mark workspace as safe for git
+      shell: bash
+      run: |
+        git config --global --add safe.directory "$GITHUB_WORKSPACE"
+
+    - name: Check for any local git changes (such as generated docs)
+      shell: bash
+      run: |
+        ./dev/ci/check-working-tree-clean.sh

     - name: Run all tests
       shell: bash
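The contents of `dev/ci/check-working-tree-clean.sh` are not part of this diff. A minimal sketch of what such a check typically does (an assumption, not the repository's actual script) is:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a working-tree-clean check; the real
# dev/ci/check-working-tree-clean.sh may differ.
set -euo pipefail

check_working_tree_clean() {
  # `git status --porcelain` prints one line per modified or untracked
  # file, so empty output means the working tree is clean.
  local changes
  changes=$(git status --porcelain)
  if [ -n "$changes" ]; then
    echo "working tree has uncommitted changes:"
    echo "$changes"
    return 1
  fi
  echo "working tree clean"
}

# Demonstrate in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
check_working_tree_clean                 # clean: prints "working tree clean"
touch generated.md                       # simulate docs generated during CI
check_working_tree_clean || echo "dirty tree detected"
rm generated.md                          # restore the clean state
```

Run after the prettier step above, a check like this fails the job whenever formatting or doc generation produced changes that were never committed.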

.github/pull_request_template.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Closes #.

 <!--
 Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
-Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
+Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
 -->

 ## What changes are included in this PR?
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Check Markdown Formatting
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+on:
+  pull_request:
+    paths:
+      - '**.md'
+
+jobs:
+  prettier-check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: '24'
+
+      - name: Install prettier
+        run: npm install -g prettier
+
+      - name: Check markdown formatting
+        run: |
+          # if you encounter error, run prettier locally and commit changes using instructions at:
+          #
+          # https://datafusion.apache.org/comet/contributor-guide/development.html#how-to-format-md-document
+          #
+          prettier --check "**/*.md"

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -19,4 +19,4 @@ under the License.

 # Apache DataFusion Comet Changelog

-Comprehensive changelogs for each release are available [here](dev/changelog).
+Comprehensive changelogs for each release are available [here](dev/changelog).

README.md

Lines changed: 6 additions & 6 deletions
@@ -34,7 +34,7 @@ Apache DataFusion Comet is a high-performance accelerator for Apache Spark, buil
 performance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the
 Spark ecosystem without requiring any code changes.

-Comet also accelerates Apache Iceberg, when performing Parquet scans from Spark.
+Comet also accelerates Apache Iceberg, when performing Parquet scans from Spark.

 [Apache DataFusion]: https://datafusion.apache.org

@@ -44,7 +44,7 @@ Comet also accelerates Apache Iceberg, when performing Parquet scans from Spark.

 Comet delivers a performance speedup for many queries, enabling faster data processing and shorter time-to-insights.

-The following chart shows the time it takes to run the 22 TPC-H queries against 100 GB of data in Parquet format
+The following chart shows the time it takes to run the 22 TPC-H queries against 100 GB of data in Parquet format
 using a single executor with 8 cores. See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html)
 for details of the environment used for these benchmarks.

@@ -66,16 +66,16 @@ The following charts shows how much Comet currently accelerates each query from

 ![](docs/source/_static/images/benchmark-results/0.11.0/tpch_queries_speedup_abs.png)

-These benchmarks can be reproduced in any environment using the documentation in the
-[Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html). We encourage
+These benchmarks can be reproduced in any environment using the documentation in the
+[Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html). We encourage
 you to run your own benchmarks.

 Results for our benchmark derived from TPC-DS are available in the [benchmarking guide](https://datafusion.apache.org/comet/contributor-guide/benchmark-results/tpc-ds.html).

 ## Use Commodity Hardware

 Comet leverages commodity hardware, eliminating the need for costly hardware upgrades or
-specialized hardware accelerators, such as GPUs or FPGA. By maximizing the utilization of commodity hardware, Comet
+specialized hardware accelerators, such as GPUs or FPGA. By maximizing the utilization of commodity hardware, Comet
 ensures cost-effectiveness and scalability for your Spark deployments.

 ## Spark Compatibility
@@ -102,7 +102,7 @@ To get started with Apache DataFusion Comet, follow the
 [DataFusion Slack and Discord channels](https://datafusion.apache.org/contributor-guide/communication.html) to connect
 with other users, ask questions, and share your experiences with Comet.

-Follow [Apache DataFusion Comet Overview](https://datafusion.apache.org/comet/user-guide/overview.html) to get more detailed information
+Follow [Apache DataFusion Comet Overview](https://datafusion.apache.org/comet/user-guide/overview.html) to get more detailed information

 ## Contributing

benchmarks/README.md

Lines changed: 0 additions & 1 deletion
@@ -102,4 +102,3 @@ $SPARK_HOME/bin/spark-submit \
     --queries /opt/datafusion-benchmarks/tpcds/queries-spark \
     --iterations 1
 ```
-

common/src/main/scala/org/apache/comet/CometConf.scala

Lines changed: 5 additions & 4 deletions
@@ -676,11 +676,12 @@ object CometConf extends ShimCometConf {
       .booleanConf
       .createWithDefault(false)

-  val COMET_EXPR_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
-    conf("spark.comet.expression.allowIncompatible")
+  val COMET_EXEC_STRICT_FLOATING_POINT: ConfigEntry[Boolean] =
+    conf("spark.comet.exec.strictFloatingPoint")
       .category(CATEGORY_EXEC)
-      .doc("Comet is not currently fully compatible with Spark for all expressions. " +
-        s"Set this config to true to allow them anyway. $COMPAT_GUIDE.")
+      .doc(
+        "When enabled, fall back to Spark for floating-point operations that may differ from " +
+          s"Spark, such as when comparing or sorting -0.0 and 0.0. $COMPAT_GUIDE.")
       .booleanConf
       .createWithDefault(false)
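The new doc string cites comparing or sorting -0.0 and 0.0: under IEEE-754 the two values compare equal while carrying different sign bits, which is exactly the kind of operation where engines can legitimately diverge. A quick illustration (assumes a POSIX `awk` is available):

```shell
# Under IEEE-754, -0.0 == 0.0 evaluates true even though the two values
# have different sign bits, so a total ordering (as used when sorting)
# must make an engine-specific choice about which comes first.
awk 'BEGIN { if (-0.0 == 0.0) print "equal"; else print "not equal" }'
# prints "equal"
```

Setting `spark.comet.exec.strictFloatingPoint=true` makes Comet fall back to Spark for such operations, per the doc string in the diff above.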

dev/benchmarks/README.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ under the License.

 # Comet Benchmarking Scripts

-This directory contains scripts used for generating benchmark results that are published in this repository and in
+This directory contains scripts used for generating benchmark results that are published in this repository and in
 the Comet documentation.

 For full instructions on running these benchmarks on an EC2 instance, see the [Comet Benchmarking on EC2 Guide].

dev/benchmarks/comet-tpcds.sh

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ $SPARK_HOME/bin/spark-submit \
     --conf spark.executor.extraClassPath=$COMET_JAR \
     --conf spark.plugins=org.apache.spark.CometPlugin \
     --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
-    --conf spark.comet.expression.allowIncompatible=true \
+    --conf spark.comet.expression.Cast.allowIncompatible=true \
     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
     --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
     tpcbench.py \

dev/benchmarks/comet-tpch.sh

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ $SPARK_HOME/bin/spark-submit \
     --conf spark.plugins=org.apache.spark.CometPlugin \
     --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
     --conf spark.comet.exec.replaceSortMergeJoin=true \
-    --conf spark.comet.expression.allowIncompatible=true \
+    --conf spark.comet.expression.Cast.allowIncompatible=true \
     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
     --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
     tpcbench.py \
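Both benchmark scripts replace the global `spark.comet.expression.allowIncompatible` flag with a `Cast`-scoped variant. A trimmed invocation showing just the changed flag (hypothetical and abbreviated from `comet-tpch.sh`; whether other expressions accept the same `spark.comet.expression.<Name>.allowIncompatible` pattern is an assumption, since only the `Cast` variant appears in this commit):

```shell
# Abbreviated from comet-tpch.sh; only the expression-scoped flag on the
# second --conf line is the point here.
$SPARK_HOME/bin/spark-submit \
    --conf spark.plugins=org.apache.spark.CometPlugin \
    --conf spark.comet.expression.Cast.allowIncompatible=true \
    tpcbench.py
```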

0 commit comments