Skip to content

Fix handling of version files that contain CRLFs#1789

Merged
edmorley merged 1 commit intomainfrom
crlf-version-files
May 6, 2025
Merged

Fix handling of version files that contain CRLFs#1789
edmorley merged 1 commit intomainfrom
crlf-version-files

Conversation

@edmorley
Copy link
Copy Markdown
Member

@edmorley edmorley commented May 6, 2025

From looking at metrics, I noticed that the cat --show-nonprinting approach added in #1783 to aid the visualisation of invisible characters (such as ASCII control characters) also escapes carriage return characters (to ^M), which we don't want.

Instead, sed is now used which replaces anything not matching the :print: and :space: groups with the Unicode substitution character () - which means the classic buildpack now also matches the CNB's behaviour:
https://github.com/heroku/buildpacks-python/blob/f2fdd00edf0f63b298cf88c82377885c88666440/src/python_version_file.rs#L16-L19

For a mapping of what :print: and :space: cover, see the chart at the bottom of this page:
https://en.cppreference.com/w/cpp/string/byte/isprint

GUS-W-18225347.

@edmorley edmorley self-assigned this May 6, 2025
@edmorley edmorley force-pushed the crlf-version-files branch from 4278986 to 868bbed Compare May 6, 2025 20:06
From looking at metrics, I noticed that the `cat --show-nonprinting`
approach added in #1783 to aid the visualisation of invisible characters
(such as ASCII control characters) also escapes carriage return
characters (to `^M`), which we don't want.

Instead, `sed` is now used which replaces anything not matching the
`:print:` and `:space:` groups with the Unicode substitution character
(`�`) - which means the classic buildpack now also matches the CNB's
behaviour:
https://github.com/heroku/buildpacks-python/blob/f2fdd00edf0f63b298cf88c82377885c88666440/src/python_version_file.rs#L16-L19

For a mapping of what `:print:` and `:space:` cover, see the chart at
the bottom of this page:
https://en.cppreference.com/w/cpp/string/byte/isprint

GUS-W-18225347.
@edmorley edmorley force-pushed the crlf-version-files branch from 868bbed to 11a3d91 Compare May 6, 2025 20:18
@edmorley edmorley marked this pull request as ready for review May 6, 2025 20:20
@edmorley edmorley requested a review from a team as a code owner May 6, 2025 20:20
@edmorley edmorley added the bug label May 6, 2025
@edmorley edmorley merged commit c1042d2 into main May 6, 2025
7 checks passed
@edmorley edmorley deleted the crlf-version-files branch May 6, 2025 20:42
@heroku-linguist heroku-linguist bot mentioned this pull request May 6, 2025
edmorley added a commit that referenced this pull request May 8, 2025
Since otherwise any field that contains a carriage return character
could span multiple lines in the metrics data store, which then
wouldn't be read back in its entirety by `bin/report`, which would
lead to eg a truncated fields in Honeycomb.

Such characters are much less likely now after #1789, however,
they can still be present in the user-provided input (that ends up
in fields like `failure_detail`) in some cases, plus from a general
correctness point of view, the key-value store should be escaping
all forms of newline characters.

I've also changed the escaping strategy to use literal `\n` and `\r`
characters so it's possible to distinguish between multi-line and
single line (but space delimited) values more easily.

GUS-W-18471014.
edmorley added a commit that referenced this pull request May 8, 2025
Since otherwise any field that contains a carriage return character
could span multiple lines in the internal metrics data store, which then
wouldn't be read back in its entirety by `bin/report`, which would lead
to a truncated field in Honeycomb.

Such characters are much less likely now after #1789, however, they can
still be present in the user-provided input in some cases (that ends up
in metrics fields like `failure_detail`), plus from a general correctness
point of view, the key-value store's attribute saving functions should
be escaping all forms of newline characters.

I've also changed the escaping strategy to use literal `\n` and `\r`
characters so it's possible to distinguish between multi-line and single
line (but space delimited) values more easily in Honeycomb.

GUS-W-18471014.
edmorley added a commit that referenced this pull request May 8, 2025
Since otherwise any field that contains a carriage return character
could span multiple lines in the internal metrics data store, which then
wouldn't be read back in its entirety by `bin/report`, which would lead
to a truncated field in Honeycomb.

Such characters are much less likely now after #1789, however, they can
still be present in the user-provided input in some cases (that ends up
in metrics fields like `failure_detail`), plus from a general correctness
point of view, the key-value store's attribute saving functions should
be escaping all forms of newline characters.

I've also changed the escaping strategy to use literal `\n` and `\r`
characters so it's possible to distinguish between multi-line and single
line (but space delimited) values more easily in Honeycomb.

GUS-W-18471014.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants