[SPARK-56018][PYTHON] Use ruff as formatter #54840

Closed
gaogaotiantian wants to merge 6 commits into apache:master from gaogaotiantian:use-ruff-as-formatter
@gaogaotiantian
Contributor

What changes were proposed in this pull request?

Replace black with ruff format.

Why are the changes needed?

There are a few reasons we should use ruff:

  1. We already use ruff as the linter; using it for formatting as well removes a dependency, which makes upgrades easier.
  2. ruff is significantly faster than black, which helps our pre-commit hooks.
  3. ruff is more customizable, should we need that.
  4. Personally, I think ruff's taste is slightly better than black's. For example:
    • ruff enforces blank spacing around imports, classes and functions better
    • ruff puts code back on a single line if it fits
    • ruff always uses double quotes when it can

There are some other details that you'll notice if you take a look at the diff. Overall, I think ruff generates slightly better code than black (and ruff is probably a bit stricter than black).
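
As a rough illustration of the first reason, one tool can cover both the lint step and the format check. The sketch below only builds the two command lines. `ruff check` and `ruff format --check` are real ruff subcommands, but the helper name `build_ruff_commands` and the `python` target path are assumptions made for illustration, not what Spark's scripts actually do.

```python
from typing import List


def build_ruff_commands(paths: List[str]) -> List[List[str]]:
    """Build a lint command and a format-check command, both using ruff.

    `ruff format --check` reports files that would be reformatted
    without rewriting them.
    """
    lint_cmd = ["ruff", "check", *paths]
    format_cmd = ["ruff", "format", "--check", *paths]
    return [lint_cmd, format_cmd]


if __name__ == "__main__":
    # "python" is a hypothetical target directory for this sketch.
    for cmd in build_ruff_commands(["python"]):
        print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run` in an environment where ruff is installed.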

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI needs to pass because we removed the black dependency.

Was this patch authored or co-authored using generative AI tooling?

No.

"*python/pyspark/sql/streaming/proto/*",
"*venv*/*",
]
line-length = 100
Contributor

Was this changed from 88?

Member

Yeah, we use a 100-character line length; see the "Code style guide" section at https://spark.apache.org/contributing.html

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM (Pending CIs). Thank you for the heads-up in the dev@spark mailing list, @gaogaotiantian.

BTW, this should be enforced only for Apache Spark 4.2+.

Contributor

@allisonwang-db allisonwang-db left a comment

Great!

@HyukjinKwon
Member

Merged to master.

dongjoon-hyun pushed a commit that referenced this pull request Mar 21, 2026
### What changes were proposed in this pull request?

This is a follow-up of #54840.

Disables black check by default.

### Why are the changes needed?

The Python formatter moved from `black` to `ruff` in #54840, but `lint-python` still runs the black check if no arguments are provided and `black` is installed, causing many failures.

```sh
% ./dev/lint-python
starting python compilation test...
python compilation succeeded.

starting black test...
black checks failed:
...
```
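
The change in default behavior can be sketched as a small decision function. This is only an illustration in Python, not the actual `lint-python` shell logic: the function name `should_run_black`, the `default_enabled` parameter, and the `--black` opt-in flag are assumptions made for this sketch.

```python
from typing import Sequence


def should_run_black(
    args: Sequence[str], black_installed: bool, default_enabled: bool
) -> bool:
    """Decide whether a lint driver should run the black check.

    `--black` is a hypothetical flag that requests the check explicitly.
    """
    if "--black" in args:
        # Explicit request always wins.
        return True
    if args:
        # Other checks were selected explicitly, so skip black.
        return False
    # No arguments: run only if enabled by default and black is available.
    return default_enabled and black_installed


if __name__ == "__main__":
    # Before the follow-up: no arguments plus an installed black runs the check.
    print(should_run_black([], black_installed=True, default_enabled=True))
    # After the follow-up: the check is off unless requested.
    print(should_run_black([], black_installed=True, default_enabled=False))
```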

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54928 from ueshin/issues/SPARK-56018/disable_black.

Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
terana pushed a commit to terana/spark that referenced this pull request Mar 23, 2026
Closes apache#54840 from gaogaotiantian/use-ruff-as-formatter.

Authored-by: Tian Gao <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
terana pushed a commit to terana/spark that referenced this pull request Mar 23, 2026
Closes apache#54928 from ueshin/issues/SPARK-56018/disable_black.

Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@gaogaotiantian gaogaotiantian deleted the use-ruff-as-formatter branch March 25, 2026 22:50