Improved dbcs file api handling #592

Open
1000TurquoisePogs wants to merge 2 commits into v3.x/staging from feature/v3/dbcs-fileapi

@1000TurquoisePogs

Proposed changes

Fix the default target encoding used when serving USS file content via respondWithUnixFile2() in httpserver.c. Previously, on z/OS the target was hardcoded to ISO-8859-1 (819) for all text-type files regardless of their tagged CCSID. This corrupted content from files tagged as UTF-8 (1208), UTF-16 (1200/1201/1202), or any EBCDIC MIX code page (930, 933, 935, 937, 939, 1364, 1388, 1390, 1399).

The fix introduces isMultiByteCCSID(int ccsid) in charsets.c/charsets.h and uses it at runtime to select the target:

  • Single-byte source (e.g. IBM-1047, ISO-8859-1) → target remains ISO-8859-1 (819), no behaviour change.
  • Multi-byte source (UTF-8, UTF-16, EBCDIC MIX) → target is now UTF-8 (1208).

The OS-based #ifdef TBD comment is replaced with this runtime selection on z/OS. Non-z/OS platforms (Linux, AIX, Windows) continue to use UTF-8 as before.
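
As a rough illustration of the selection described above, here is a minimal sketch. It is not the code from this PR: the `bool` return type, the enum names, and the `selectTargetCCSID()` helper are assumptions made for the example, and only the CCSID values are taken from the description.

```c
#include <stdbool.h>

/* Illustrative names only; the real constants in charsets.h may differ. */
enum {
  CCSID_ISO_8859_1 = 819,
  CCSID_UTF8       = 1208
};

/* Sketch of the predicate: true for the Unicode encodings and the
 * EBCDIC MIX code pages listed in this PR, false for everything else. */
bool isMultiByteCCSID(int ccsid) {
  switch (ccsid) {
    case 1200: case 1201: case 1202:                    /* UTF-16 variants */
    case 1208:                                          /* UTF-8 */
    case 930: case 933: case 935: case 937: case 939:   /* EBCDIC MIX */
    case 1364: case 1388: case 1390: case 1399:         /* EBCDIC MIX */
      return true;
    default:
      /* Single-byte pages, untagged (0), and binary (-1 / 65535). */
      return false;
  }
}

/* Hypothetical helper: choose the default target encoding on z/OS
 * from the CCSID the source file is tagged with. */
int selectTargetCCSID(int sourceCCSID) {
  return isMultiByteCCSID(sourceCCSID) ? CCSID_UTF8 : CCSID_ISO_8859_1;
}
```

With this shape, a file tagged 1047 still maps to 819, while a file tagged 1208 or 930 maps to 1208.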

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

PR Checklist

  • If the changes in this PR are meant for the next release / mainline, this PR targets the "staging" branch.
  • My code follows the style guidelines of this project (see: Contributing guideline)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • New and existing unit tests pass locally with my changes
  • Relevant update to CHANGELOG.md
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works, or describe a test method below

Testing

Manual test — UTF-8 tagged file:

  1. On z/OS, create a USS file containing UTF-8 encoded text (e.g. a file with Japanese or Chinese characters).
  2. Tag it: chtag -tc 1208 <file>
  3. GET /unixFileContents/<path> with no query parameters.
  4. Verify the response bytes are valid UTF-8 and the non-Latin-1 characters are present and correct.
  5. Before this fix, characters outside Latin-1 would be corrupted or replaced.
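
For step 4, one way to check a captured response body is a small well-formedness test such as the sketch below. The `isValidUtf8()` helper is hypothetical and not part of this PR. It only confirms that the bytes form well-formed UTF-8; the text still has to be compared against the expected characters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if buf[0..len) is well-formed UTF-8.
 * Rejects stray continuation bytes, truncated sequences, overlong forms,
 * UTF-16 surrogates, and code points above U+10FFFF. */
bool isValidUtf8(const unsigned char *buf, size_t len) {
  size_t i = 0;
  while (i < len) {
    unsigned char b = buf[i];
    size_t extra;          /* number of continuation bytes expected */
    unsigned long cp;      /* code point being assembled */
    unsigned long min;     /* smallest code point legal for this length */

    if (b < 0x80) { i++; continue; }                       /* ASCII */
    else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
    else return false;     /* continuation byte or invalid lead byte */

    if (len - i <= extra) return false;                    /* truncated */

    for (size_t k = 1; k <= extra; k++) {
      unsigned char c = buf[i + k];
      if ((c & 0xC0) != 0x80) return false;                /* not 10xxxxxx */
      cp = (cp << 6) | (c & 0x3F);
    }

    if (cp < min) return false;                            /* overlong */
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;        /* surrogate */
    if (cp > 0x10FFFF) return false;                       /* out of range */

    i += extra + 1;
  }
  return true;
}
```

A body that was wrongly converted to ISO-8859-1 and contains bytes at or above 0x80 will usually fail this check, which makes the regression easy to spot.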

Regression — IBM-1047 tagged file:

  1. Create or use an existing USS file tagged CCSID 1047.
  2. GET /unixFileContents/<path>.
  3. Verify the response is the same ISO-8859-1 content as before this change.

Regression — untagged file:

  1. Create or use an untagged USS file (chtag -b <file> to remove tag, or leave untagged).
  2. GET /unixFileContents/<path>.
  3. Verify the response is NATIVE_CODEPAGE (1047) → ISO-8859-1 converted output, same as before.

Unit test — isMultiByteCCSID():

| Input | Expected |
| --- | --- |
| 1208 (UTF-8) | TRUE |
| 1200 (UTF-16) | TRUE |
| 1201 (UTF-16BE) | TRUE |
| 1202 (UTF-16LE) | TRUE |
| 930 (EBCDIC MIX Japanese) | TRUE |
| 933, 935, 937, 939 | TRUE |
| 1364, 1388, 1390, 1399 | TRUE |
| 1047 (IBM-1047) | FALSE |
| 819 (ISO-8859-1) | FALSE |
| 37 (IBM-037) | FALSE |
| 0 (untagged) | FALSE |
| -1 / 65535 (binary) | FALSE |
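
The table above could be exercised with a table-driven check along these lines. This is a sketch, not the test shipped with the PR: the `CcsidCase` type, `countCcsidMismatches()`, and both predicates are hypothetical names. The stand-in predicate exists only so the example links on its own; in the real tree the function pointer would be the `isMultiByteCCSID()` from charsets.c, assuming a compatible signature.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
  int  ccsid;
  bool expected;
} CcsidCase;

/* One row per input listed in the table. */
static const CcsidCase CCSID_CASES[] = {
  {1208, true}, {1200, true}, {1201, true}, {1202, true},
  {930, true},  {933, true},  {935, true},  {937, true}, {939, true},
  {1364, true}, {1388, true}, {1390, true}, {1399, true},
  {1047, false}, {819, false}, {37, false},
  {0, false}, {-1, false}, {65535, false}
};

/* Returns how many table rows the given predicate gets wrong. */
size_t countCcsidMismatches(bool (*predicate)(int)) {
  size_t n = sizeof CCSID_CASES / sizeof CCSID_CASES[0];
  size_t bad = 0;
  for (size_t i = 0; i < n; i++) {
    if (predicate(CCSID_CASES[i].ccsid) != CCSID_CASES[i].expected) {
      bad++;
    }
  }
  return bad;
}

/* Stand-in for the function under test (assumption, not the PR's code). */
bool standInIsMultiByteCCSID(int ccsid) {
  static const int multi[] = {1200, 1201, 1202, 1208, 930, 933, 935,
                              937, 939, 1364, 1388, 1390, 1399};
  for (size_t i = 0; i < sizeof multi / sizeof multi[0]; i++) {
    if (multi[i] == ccsid) return true;
  }
  return false;
}

/* Deliberately wrong predicate, useful to confirm the check can fail. */
bool alwaysSingleByte(int ccsid) {
  (void)ccsid;
  return false;
}
```

A passing run reports zero mismatches for the function under test, and `alwaysSingleByte` should miss every TRUE row.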

Further comments

The original code contained an explicit TBD comment acknowledging that the OS-based selection "isn't really an OS dependency". This PR resolves that TBD by selecting the target encoding at runtime based on isMultiByteCCSID().

The isMultiByteCCSID() function is intentionally conservative: it covers the Unicode encodings and the EBCDIC MIX (SBCS+DBCS) code pages that are realistic as USS file tags. Pure DBCS-only pages (300, 834, 835, 837) are excluded: they normally serve as the double-byte component of a MIX code page, where SO/SI bytes switch between the single-byte and double-byte sets, and are unlikely to appear as file-level CCSID tags in practice. The set can be extended in a follow-up if needed.

Signed-off-by: 1000TurquoisePogs <sgrady@rocketsoftware.com>
Signed-off-by: 1000TurquoisePogs <sgrady@rocketsoftware.com>