Skip to content

[ruff] Stabilize unicode-to-unicode check for ambiguous-unicode-xxx (RUF001, RUF002, RUF003)#14403

Closed
dylwil3 wants to merge 3 commits intoastral-sh:ruff-0.8from
dylwil3:stabilize-ruf00x
Closed

[ruff] Stabilize unicode-to-unicode check for ambiguous-unicode-xxx (RUF001, RUF002, RUF003)#14403
dylwil3 wants to merge 3 commits intoastral-sh:ruff-0.8from
dylwil3:stabilize-ruf00x

Conversation

@dylwil3
Copy link
Copy Markdown
Collaborator

@dylwil3 dylwil3 commented Nov 17, 2024

This PR stabilizes the preview behavior introduced in #8473, where ambiguous unicode checks are flagged not only for unicode characters that may be confused for ASCII characters, but also for unicode characters that may be confused for other unicode characters.

Documentation has been updated to reflect this change.

Note: While there are open issues related to these rules, none of them are affected by whether the behavior under review is in preview or not.

@github-actions
Copy link
Copy Markdown
Contributor

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+9 -0 violations, +0 -0 fixes in 3 projects; 1 project error; 50 projects unchanged)

bokeh/bokeh (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --no-preview --select ALL

+ examples/interaction/tools/subcoordinates_zoom.py:22:23: RUF001 String contains ambiguous `µ` (MICRO SIGN). Did you mean `μ` (GREEK SMALL LETTER MU)?

pandas-dev/pandas (+6 -0 violations, +0 -0 fixes)

+ pandas/_libs/tslibs/timedeltas.pyi:54:6: RUF001 String contains ambiguous `µ` (MICRO SIGN). Did you mean `μ` (GREEK SMALL LETTER MU)?
+ pandas/core/indexes/base.py:5223:26: RUF003 Comment contains ambiguous `µ` (MICRO SIGN). Did you mean `μ` (GREEK SMALL LETTER MU)?
+ pandas/core/indexes/base.py:5224:46: RUF003 Comment contains ambiguous `µ` (MICRO SIGN). Did you mean `μ` (GREEK SMALL LETTER MU)?
+ pandas/core/indexes/base.py:5225:69: RUF003 Comment contains ambiguous `µ` (MICRO SIGN). Did you mean `μ` (GREEK SMALL LETTER MU)?
+ pandas/tests/io/test_clipboard.py:57:34: RUF001 String contains ambiguous `µ` (MICRO SIGN). Did you mean `μ` (GREEK SMALL LETTER MU)?
+ pandas/tests/scalar/timedelta/test_constructors.py:304:25: RUF001 String contains ambiguous `µ` (MICRO SIGN). Did you mean `μ` (GREEK SMALL LETTER MU)?

zulip/zulip (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --no-preview --select ALL

+ zerver/tests/test_invite.py:1381:32: RUF001 String contains ambiguous `µ` (MICRO SIGN). Did you mean `μ` (GREEK SMALL LETTER MU)?
+ zerver/tests/test_signup.py:1140:29: RUF001 String contains ambiguous `µ` (MICRO SIGN). Did you mean `μ` (GREEK SMALL LETTER MU)?

pypa/setuptools (error)

ruff failed
  Cause: Failed to parse /home/runner/work/ruff/ruff/checkouts/pypa:setuptools/ruff.toml
  Cause: TOML parse error at line 8, column 1
  |
8 | [lint]
  | ^^^^^^
Unknown rule selector: `UP027`

Changes by rule (2 rules affected)

code total + violation - violation + fix - fix
RUF001 6 6 0 0 0
RUF003 3 3 0 0 0

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+8300 -8323 violations, +0 -66 fixes in 8 projects; 1 project error; 45 projects unchanged)

apache/airflow (+6151 -6154 violations, +0 -58 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- airflow/api/__init__.py:32:5: DOC201 `return` is not documented in docstring
- airflow/api/__init__.py:32:5: DOC501 Raised exception `AirflowException` missing from docstring
+ airflow/api/__init__.py:43:15: DOC501 Raised exception `AirflowException` missing from docstring
+ airflow/api/__init__.py:44:5: DOC201 `return` is not documented in docstring
- airflow/api/auth/backend/deny_all.py:38:5: DOC201 `return` is not documented in docstring
+ airflow/api/auth/backend/deny_all.py:44:5: DOC201 `return` is not documented in docstring
- airflow/api/client/__init__.py:27:5: DOC201 `return` is not documented in docstring
+ airflow/api/client/__init__.py:38:5: DOC201 `return` is not documented in docstring
- airflow/api/common/airflow_health.py:30:5: DOC201 `return` is not documented in docstring
... 9012 additional changes omitted for rule DOC201
- airflow/api/common/delete_dag.py:43:5: DOC501 Raised exception `AirflowException` missing from docstring
- airflow/api/common/delete_dag.py:43:5: DOC501 Raised exception `DagNotFound` missing from docstring
+ airflow/api/common/delete_dag.py:61:15: DOC501 Raised exception `AirflowException` missing from docstring
+ airflow/api/common/delete_dag.py:64:15: DOC501 Raised exception `DagNotFound` missing from docstring
+ airflow/api/common/mark_tasks.py:125:15: DOC501 Raised exception `ValueError` missing from docstring
... 2944 additional changes omitted for rule DOC501
- airflow/api/common/mark_tasks.py:186:5: DOC402 `yield` is not documented in docstring
+ airflow/api/common/mark_tasks.py:190:13: DOC402 `yield` is not documented in docstring
- airflow/api_fastapi/common/db/common.py:33:5: DOC402 `yield` is not documented in docstring
+ airflow/api_fastapi/common/db/common.py:47:9: DOC402 `yield` is not documented in docstring
- airflow/assets/__init__.py:238:9: DOC402 `yield` is not documented in docstring
+ airflow/assets/__init__.py:243:9: DOC402 `yield` is not documented in docstring
- airflow/assets/__init__.py:358:9: DOC402 `yield` is not documented in docstring
... 328 additional changes omitted for rule DOC402
+ airflow/decorators/__init__.pyi:117:25: PYI041 Use `float` instead of `int | float`
- airflow/decorators/__init__.pyi:117:25: PYI041 [*] Use `float` instead of `int | float`
+ airflow/decorators/__init__.pyi:256:25: PYI041 Use `float` instead of `int | float`
- airflow/decorators/__init__.pyi:256:25: PYI041 [*] Use `float` instead of `int | float`
+ airflow/jobs/job.py:308:39: PYI041 Use `float` instead of `int | float`
- airflow/jobs/job.py:308:39: PYI041 [*] Use `float` instead of `int | float`
+ airflow/metrics/base_stats_logger.py:39:15: PYI041 Use `float` instead of `int | float`
... 52 additional changes omitted for rule PYI041
- airflow/models/dag.py:1038:36: PYI061 `Literal[None, ...]` can be replaced with `Literal[...] | None`
- airflow/models/dagrun.py:1317:23: RUF038 `Literal[True, False]` can be replaced with `bool`
- airflow/models/dagrun.py:1439:23: RUF038 `Literal[True, False]` can be replaced with `bool`
... 12332 additional changes omitted for project

apache/superset (+1166 -1168 violations, +0 -8 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ RELEASING/changelog.py:104:9: DOC201 `return` is not documented in docstring
- RELEASING/changelog.py:107:9: DOC201 `return` is not documented in docstring
+ RELEASING/changelog.py:113:13: DOC201 `return` is not documented in docstring
- RELEASING/changelog.py:52:9: DOC201 `return` is not documented in docstring
+ RELEASING/changelog.py:54:13: DOC201 `return` is not documented in docstring
- RELEASING/changelog.py:87:9: DOC201 `return` is not documented in docstring
... 1824 additional changes omitted for rule DOC201
- scripts/benchmark_migration.py:43:5: DOC501 Raised exception `Exception` missing from docstring
+ scripts/benchmark_migration.py:51:11: DOC501 Raised exception `Exception` missing from docstring
- scripts/cancel_github_workflows.py:162:5: DOC501 Raised exception `ClickException` missing from docstring
+ scripts/cancel_github_workflows.py:164:15: DOC501 Raised exception `ClickException` missing from docstring
... 2332 additional changes omitted for project

bokeh/bokeh (+402 -402 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ examples/advanced/extensions/parallel_plot/parallel_plot.py:107:5: DOC201 `return` is not documented in docstring
- examples/advanced/extensions/parallel_plot/parallel_plot.py:15:5: DOC201 `return` is not documented in docstring
- examples/basic/data/server_sent_events_source.py:53:9: DOC402 `yield` is not documented in docstring
+ examples/basic/data/server_sent_events_source.py:60:13: DOC402 `yield` is not documented in docstring
- examples/interaction/js_callbacks/js_on_event.py:16:5: DOC201 `return` is not documented in docstring
+ examples/interaction/js_callbacks/js_on_event.py:21:5: DOC201 `return` is not documented in docstring
- examples/models/gauges.py:33:5: DOC201 `return` is not documented in docstring
+ examples/models/gauges.py:34:5: DOC201 `return` is not documented in docstring
... 395 additional changes omitted for rule DOC201
- src/bokeh/__init__.py:63:5: DOC202 Docstring should not have a returns section because the function doesn't return anything
+ src/bokeh/__init__.py:65:1: DOC202 Docstring should not have a returns section because the function doesn't return anything
... 794 additional changes omitted for project

latchbio/latch (+0 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- src/latch/types/metadata.py:500:45: PYI061 `Literal[None, ...]` can be replaced with `Literal[...] | None`

pandas-dev/pandas (+0 -4 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- pandas/core/groupby/groupby.py:4069:39: PYI061 `Literal[None, ...]` can be replaced with `Literal[...] | None`
- pandas/core/groupby/indexing.py:299:39: PYI061 `Literal[None, ...]` can be replaced with `Literal[...] | None`
- pandas/io/html.py:1027:28: PYI061 `Literal[None, ...]` can be replaced with `Literal[...] | None`
- pandas/io/html.py:223:32: PYI061 `Literal[None, ...]` can be replaced with `Literal[...] | None`

python/typeshed (+0 -4 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select E,F,FA,I,PYI,RUF,UP,W

- stdlib/ast.pyi:1480:16: RUF038 `Literal[True, False]` can be replaced with `bool`
- stdlib/ast.pyi:1481:35: RUF038 `Literal[True, False]` can be replaced with `bool`
- stdlib/ast.pyi:1484:45: RUF038 `Literal[True, False]` can be replaced with `bool`
- stubs/pyxdg/xdg/Menu.pyi:97:11: RUF038 `Literal[True, False, ...]` can be replaced with `Literal[...] | bool`

zulip/zulip (+581 -589 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- analytics/lib/fixtures.py:19:5: DOC201 `return` is not documented in docstring
- analytics/lib/fixtures.py:19:5: DOC501 Raised exception `AssertionError` missing from docstring
+ analytics/lib/fixtures.py:56:15: DOC501 Raised exception `AssertionError` missing from docstring
+ analytics/lib/fixtures.py:77:5: DOC201 `return` is not documented in docstring
+ confirmation/models.py:125:5: DOC201 `return` is not documented in docstring
- confirmation/models.py:279:5: DOC201 `return` is not documented in docstring
+ confirmation/models.py:283:5: DOC201 `return` is not documented in docstring
- confirmation/models.py:298:5: DOC201 `return` is not documented in docstring
... 857 additional changes omitted for rule DOC201
- confirmation/models.py:298:5: DOC501 Raised exception `InvalidError` missing from docstring
+ confirmation/models.py:304:15: DOC501 Raised exception `InvalidError` missing from docstring
... 1160 additional changes omitted for project

wntrblm/nox (+0 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- nox/command.py:33:16: RUF038 `Literal[True, False, ...]` can be replaced with `Literal[...] | bool`

pypa/setuptools (error)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

ruff failed
  Cause: Failed to parse /home/runner/work/ruff/ruff/checkouts/pypa:setuptools/ruff.toml
  Cause: TOML parse error at line 8, column 1
  |
8 | [lint]
  | ^^^^^^
Unknown rule selector: `UP027`

Changes by rule (10 rules affected)

code total + violation - violation + fix - fix
DOC201 12109 6054 6055 0 0
DOC501 3920 1960 1960 0 0
DOC402 408 204 204 0 0
DOC202 158 79 79 0 0
PYI041 66 0 0 0 66
PYI061 14 0 14 0 0
RUF038 7 0 7 0 0
DOC502 4 2 2 0 0
DOC403 2 1 1 0 0
RUF100 1 0 1 0 0

@dylwil3
Copy link
Copy Markdown
Collaborator Author

dylwil3 commented Nov 17, 2024

I'm not sure how I feel about these ecosystem changes (in stable - the preview changes seem to be messed up). Seems to me like the use of the "micro" mu vs the greek letter mu is pretty common and done accurately in the context of SI units (e.g. microseconds etc.) So I'm wondering if this behavior should not be stabilized. What do you think @AlexWaygood / @MichaReiser ?

@Skylion007
Copy link
Copy Markdown
Contributor

I'm not sure how I feel about these ecosystem changes (in stable - the preview changes seem to be messed up). Seems to me like the use of the "micro" mu vs the greek letter mu is pretty common and done accurately in the context of SI units (e.g. microseconds etc.) So I'm wondering if this behavior should not be stabilized. What do you think @AlexWaygood / @MichaReiser ?

Agreed, I feel like the micro use case would be way more common empirically than the greek letter.

@MichaReiser
Copy link
Copy Markdown
Member

I don't think we should stabilize this change. I'm not sure what's our source for the confusables but I suspect it's the unicode confusable specification and I understand that it's specific to identifiers.

Identifiers ("IDs") are strings used in application contexts to refer to specific entities of certain significance in the given application. In a given application, an identifier will map to at most one specific entity. Many applications have security requirements related to identifiers. A common example is URLs referring to pages or other resources on the Internet: when a user wishes to access a resource, it is important that the user can be certain what resource they are interacting with. For example, they need to know that they are interacting with a particular financial service and not some other entity that is spoofing the intended service for malicious purposes. This illustrates a general security concern for identifiers: potential ambiguity of strings. While a machine has no difficulty distinguishing between any two different character sequences, it could be very difficult for humans to recognize and distinguish identifiers if an application did not limit which Unicode characters could be in identifiers. The focus of this specification is mitigation of such issues related to the security of identifiers.

Deliberately restricting the characters that can be used in identifiers is an important security technique. The exclusion of characters from identifiers does not affect the general use of those characters for other purposes, such as for general text in documents. Unicode Standard Annex #31, "Unicode Identifier and Pattern Syntax" [UAX31] provides a recommended method of determining which strings should qualify as identifiers. The UAX #31 specification extends the common practice of defining identifiers in terms of letters and numbers to the Unicode repertoire.

I understand that applying this set to strings is not the recommended use case because the confusables are valid in text positions.

I also noticed that neither clippy nor rust flag the mu character, not even in identifier positions, which makes sense to me. Neither does VS Code. One reason for this could be that mu should only be flagged if the rest of the code is in greek, because it's then very likely that you wanted to use the greek mu over the mathematical mu.

Either way, I think we should revisit the classification and I suspect that this is related to #13330

@AlexWaygood
Copy link
Copy Markdown
Member

Let's leave this for Ruff 0.8 and open an issue capturing these concerns 👍

@dylwil3
Copy link
Copy Markdown
Collaborator Author

dylwil3 commented Nov 18, 2024

Opened issue #14433 to track this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants