Numeric escape sequence with surrogate pairs (Turtle) by afs · Pull Request #323 · w3c/rdf-tests

afs · 2026-04-16T08:51:48Z

This PR is part of the discussion w3c/rdf-turtle#131.

The tests cover allowing a pair of numeric escapes \uHHHH\uHHHH to form a well-formed surrogate pair, which is interpreted as the supplementary character (code point) that the pair represents.

The surrogates are not part of the lexical form in the RDF data model; the supplementary character represented by the surrogate pair is, and the RDF graph is the same as if the character had been written using \U.
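To make the mapping concrete, here is a minimal sketch of the standard UTF-16 arithmetic that turns a high/low surrogate pair into the code point it represents. The function name `combine_surrogates` is hypothetical and not taken from the test suite or any parser.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Return the code point encoded by a UTF-16 high/low surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: U+{high:04X}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: U+{low:04X}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# \uD83D\uDE00 and \U0001F600 denote the same character.
assert combine_surrogates(0xD83D, 0xDE00) == 0x1F600
```

Under the proposal, only the resulting code point (here U+1F600) would appear in the lexical form; the two surrogate values themselves would not.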

There are positive syntax tests for valid surrogate pairs written as \uHHHH\uHHHH (high-low surrogate), and negative tests for malformed surrogate pairs (low-high, low-low, high-high) and for lone surrogates. Lone surrogates are already covered by the RDF 1.1 Turtle test suite, but the coverage is repeated here for completeness.
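The accept/reject cases listed above can be sketched as a small unescaping routine. This is an illustration of the behaviour the tests describe, not code from any Turtle parser: `unescape_u` is a hypothetical name, and the sketch handles only \uHHHH escapes (it ignores \U, `\\`, and the other Turtle string escapes).

```python
import re

# Matches one \uHHHH escape; simplified, does not account for escaped backslashes.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_u(text: str) -> str:
    """Replace \\uHHHH escapes, combining well-formed surrogate pairs."""
    out = []
    pos = 0
    while pos < len(text):
        m = _U_ESCAPE.match(text, pos)
        if m is None:
            out.append(text[pos])
            pos += 1
            continue
        cp = int(m.group(1), 16)
        pos = m.end()
        if 0xDC00 <= cp <= 0xDFFF:
            # Covers low-high, low-low, and lone low surrogates.
            raise ValueError("low surrogate without a preceding high surrogate")
        if 0xD800 <= cp <= 0xDBFF:
            m2 = _U_ESCAPE.match(text, pos)
            low = int(m2.group(1), 16) if m2 else None
            if low is None or not 0xDC00 <= low <= 0xDFFF:
                # Covers high-high and lone high surrogates.
                raise ValueError("high surrogate not followed by a low surrogate")
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
            pos = m2.end()
        out.append(chr(cp))
    return "".join(out)


assert unescape_u(r"\uD83D\uDE00") == "\U0001F600"
```

In this sketch every malformed combination raises `ValueError`, which corresponds to the negative syntax tests; only the high-low ordering is accepted.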

There are evaluation tests for a valid surrogate pair, with the expected graph given in two forms: the supplementary character as UTF-8, and the same character written using \U (both parse to the same graph).

This PR is marked draft because the WG has not yet agreed a resolution of the i18n issue.

kasei · 2026-04-16T17:57:26Z

My initial reaction is that adding support for surrogates is bad. I'm not clear on what use cases it serves, but it adds complexity.

I do think adding the negative tests (and more test coverage in this area, depending on the decisions in w3c/rdf-turtle#131) is a good idea, though.

afs · 2026-04-16T18:08:48Z

> I'm not clear on what use cases it serves, but it adds complexity.

The i18n request is at:

w3c/rdf-turtle#131 (comment)

Feel free to ask for more background.

kasei · 2026-04-16T18:25:52Z

Yeah, I've been following the discussion in that issue. I just don't see any convincing use-cases. I'm not convinced by "some programming languages do this" because those seem like bad upstream choices (possibly influenced by implementation details like the use of UTF-16).

afs · 2026-04-16T20:00:43Z

If the WG wants to define the correct (and only) outcome, some systems have to change.

Putting in "don't output surrogates" would be a start.
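One possible reading of "don't output surrogates" on the writer side is sketched below: a serializer that escapes non-ASCII characters emits \U for anything above U+FFFF instead of a \u surrogate pair. The function name `escape_non_ascii` is hypothetical, and the sketch deliberately leaves out the escaping of quotes, backslashes, and control characters that a real Turtle writer also needs.

```python
def escape_non_ascii(text: str) -> str:
    """Escape non-ASCII characters without ever emitting surrogate escapes.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # A Python str can hold a lone surrogate; refuse to serialize it.
            raise ValueError(f"lone surrogate U+{cp:04X} cannot be written")
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            # Supplementary characters use the 8-digit form, not a \u pair.
            out.append(f"\\U{cp:08X}")
    return "".join(out)


assert escape_non_ascii("a\U0001F600") == "a\\U0001F600"
```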

Numeric escape sequence with surrogate pairs (Turtle)

be0696e

afs marked this pull request as draft April 16, 2026 16:49

afs mentioned this pull request Apr 26, 2026

Allowing \u escaped surrogate pairs w3c/rdf-turtle#138

Closed

afs wants to merge 1 commit into main from surrogates
