CommonMark CJK-friendly Amendments Specification

CommonMark issue: commonmark/commonmark-spec#650

The following chapters are written as an amendment to the original CommonMark specification. Missing chapters, sections, and definitions are the same as in the original specification.

2. Preliminaries

2.1 Characters and lines

A CJK character is a character (Unicode code point) that meets at least one of the following criteria:

- Meets both of the following criteria:
  - Its UAX #11 East Asian Width category is W, F, or H
  - It is not a default emoji presentation character (substantially equivalent to: is not in the RGI emoji set or is not a fully-qualified emoji) as defined in UTS #51 Unicode Emoji
- Its UAX #24 Unicode Script Property is Hangul
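
The criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the standard library exposes East Asian Width but neither the Script property nor emoji data, so `EMOJI_PRESENTATION_SAMPLE` is a hypothetical, deliberately tiny stand-in for the set derived from emoji-data.txt, and `is_hangul_script` approximates the Script property by character name.

```python
import unicodedata

# Hypothetical, deliberately tiny stand-in for the default emoji presentation
# set; a real implementation would load it from emoji-data.txt.
EMOJI_PRESENTATION_SAMPLE = frozenset({0x231A, 0x231B, 0x1F600})


def is_hangul_script(ch: str) -> bool:
    # Approximation (assumption): the stdlib has no Script property, so this
    # relies on character names instead of Scripts.txt.
    return unicodedata.name(ch, "").startswith("HANGUL")


def is_cjk_character(ch: str, emoji_presentation=EMOJI_PRESENTATION_SAMPLE) -> bool:
    # Criterion 1: East Asian Width is W, F, or H, and the character is not
    # a default emoji presentation character.
    wide = unicodedata.east_asian_width(ch) in ("W", "F", "H")
    if wide and ord(ch) not in emoji_presentation:
        return True
    # Criterion 2: the Script property is Hangul.
    return is_hangul_script(ch)
```

For example, `is_cjk_character("漢")` and `is_cjk_character("。")` hold through the width criterion, while a Hangul jamo such as U+1161 qualifies only through the script criterion.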

An Ideographic Variation Selector is a character in the Variation Selectors Supplement Block (U+E0100–U+E01EF).

A Non-emoji General-use Variation Selector is a character in the Variation Selectors Block (U+FE00–U+FE0F) other than Emoji Presentation Selector U+FE0F.
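
As a small sketch, the two variation selector definitions reduce to range checks. The function names are hypothetical and chosen only for this illustration.

```python
def is_ideographic_variation_selector(ch: str) -> bool:
    # Variation Selectors Supplement block: U+E0100 to U+E01EF.
    return 0xE0100 <= ord(ch) <= 0xE01EF


def is_non_emoji_general_use_variation_selector(ch: str) -> bool:
    # Variation Selectors block (U+FE00 to U+FE0F) minus the
    # Emoji Presentation Selector U+FE0F.
    return 0xFE00 <= ord(ch) <= 0xFE0E
```

Note that U+FE0E (the text presentation selector) falls inside the second range, whereas U+FE0F does not.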

A CJK sequence is a CJK character or a sequence of 2 characters where the first one is a CJK character and the second one is a Non-emoji General-use Variation Selector.

A CJK punctuation character is a Unicode punctuation character that is also a CJK character.

A non-CJK punctuation character is a Unicode punctuation character that is not a CJK punctuation character.

A Unicode punctuation sequence is a Unicode punctuation character or a sequence of 2 characters where the first one is a Unicode punctuation character and the second one is a Non-emoji General-use Variation Selector.

A CJK ambiguous punctuation sequence is a Standardized Variation Sequence that meets all of the following: its description in StandardizedVariants.txt (the latest version) contains the words "fullwidth form", its first character is a Unicode punctuation character, and the UAX #11 East Asian Width category of its first character is A.
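
One way to derive these sequences from data is sketched below. Several points are assumptions of this sketch rather than statements of the specification: `SAMPLE` holds illustrative lines written in the StandardizedVariants.txt format (not an authoritative excerpt), "fullwidth form" is matched as a standalone word so that a description such as "non-fullwidth form" does not count, and a Unicode punctuation character is taken to be one whose general category is P or S, as in recent CommonMark versions.

```python
import re
import unicodedata

# Illustrative lines in the StandardizedVariants.txt format (assumption:
# not an authoritative excerpt of the real file).
SAMPLE = """\
2018 FE00; non-fullwidth form; # LEFT SINGLE QUOTATION MARK
2018 FE01; right-justified fullwidth form; # LEFT SINGLE QUOTATION MARK
3001 FE00; corner-justified form; # IDEOGRAPHIC COMMA
"""

# Assumption: "fullwidth form" must be a standalone word, so the match is
# rejected when it is glued to a preceding word character or hyphen.
FULLWIDTH_FORM = re.compile(r"(?<![\w-])fullwidth form\b")


def is_unicode_punctuation(ch: str) -> bool:
    # Assumption: general category P* or S*.
    return unicodedata.category(ch)[0] in ("P", "S")


def cjk_ambiguous_punctuation_sequences(text: str) -> set:
    result = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        code_points = fields[0].split()
        if len(fields) < 2 or len(code_points) != 2:
            continue
        base, selector = (chr(int(cp, 16)) for cp in code_points)
        if (FULLWIDTH_FORM.search(fields[1])
                and is_unicode_punctuation(base)
                and unicodedata.east_asian_width(base) == "A"):
            result.add(base + selector)
    return result
```

With the sample above, only the U+2018 U+FE01 line passes all three conditions.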

Note

The Sibe forms of quotation marks added in Unicode 17 are vertical-only and come with a space, so they are not CJK ambiguous punctuation sequences.

A CJK punctuation sequence is a CJK punctuation character, a CJK ambiguous punctuation sequence, or a sequence of 2 characters where the first one is a CJK punctuation character and the second one is a Non-emoji General-use Variation Selector.

A non-CJK punctuation sequence is a non-CJK punctuation character or a sequence of 2 characters where the first one is a non-CJK punctuation character and the second one is a Non-emoji General-use Variation Selector.

Note

For the concrete ranges of each definition, see ranges.md.

6. Inlines

6.2 Emphasis and strong emphasis

Note

Bold italic text marks the parts modified from the original specification.

A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a non-CJK punctuation character or (2b) followed by a non-CJK punctuation character and preceded by (2bα) Unicode whitespace, (2bβ) a non-CJK punctuation sequence, (2bγ) a CJK sequence, or (2bδ) an Ideographic Variation Selector. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a non-CJK punctuation sequence, or (2b) preceded by a non-CJK punctuation sequence and followed by (2bα) Unicode whitespace, (2bβ) a non-CJK punctuation character, or (2bγ) a CJK character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
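
The two definitions above can be expressed as boolean logic over a pre-classified context. The sketch below assumes the caller has already classified what precedes and follows the delimiter run (including looking back over a variation selector where a sequence is required); the `Context` type and its field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical classification of the text on one side of a delimiter run."""
    whitespace: bool = False     # Unicode whitespace, or line start/end
    non_cjk_punct: bool = False  # non-CJK punctuation character or sequence
    cjk: bool = False            # CJK character or sequence
    ivs: bool = False            # Ideographic Variation Selector


def is_left_flanking(before: Context, after: Context) -> bool:
    if after.whitespace:                 # (1)
        return False
    if not after.non_cjk_punct:          # (2a)
        return True
    # (2b): followed by non-CJK punctuation, so look at what precedes.
    return before.whitespace or before.non_cjk_punct or before.cjk or before.ivs


def is_right_flanking(before: Context, after: Context) -> bool:
    if before.whitespace:                # (1)
        return False
    if not before.non_cjk_punct:         # (2a)
        return True
    # (2b): preceded by non-CJK punctuation, so look at what follows.
    return after.whitespace or after.non_cjk_punct or after.cjk
```

Take the closing `**` in `これは**「強調」**です`. It is preceded by 」, a CJK punctuation character, and followed by で, so `is_right_flanking(Context(cjk=True), Context(cjk=True))` holds through (2a) and the run can close strong emphasis.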

Note

If the delimiter run (1) adjoins a Code Unit that is not part of an Encoded Character/Assigned Character (including Ill-Formed Code Unit Subsequences, e.g. isolated Surrogate Code Points/Units) or (2) is preceded by a Standard Variation Selector that is preceded by (2a) Unicode whitespace or (2b) an Ideographic Variation Selector, then whether the delimiter run is left-flanking and whether it is right-flanking are both Unspecified.

2. A single _ character can open emphasis iff it is part of a left-flanking delimiter run and either (a) not part of a right-flanking delimiter run or (b) part of a right-flanking delimiter run preceded by a Unicode punctuation sequence.

6. A double __ can open strong emphasis iff it is part of a left-flanking delimiter run and either (a) not part of a right-flanking delimiter run or (b) part of a right-flanking delimiter run preceded by a Unicode punctuation sequence.
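
Rules 2 and 6 share the same condition, which can be sketched as a single predicate. The function and parameter names are hypothetical; the flanking flags and the punctuation-sequence check are assumed to be computed elsewhere.

```python
def underscore_can_open(
    left_flanking: bool,
    right_flanking: bool,
    preceded_by_punctuation_sequence: bool,
) -> bool:
    # Applies to both a single _ (rule 2) and a double __ (rule 6):
    # the run must be left-flanking, and if it is also right-flanking it
    # must be preceded by a Unicode punctuation sequence.
    return left_flanking and (
        not right_flanking or preceded_by_punctuation_sequence
    )
```

A run that is both left- and right-flanking, as in intraword `snake_case`, therefore cannot open emphasis unless a Unicode punctuation sequence precedes it.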

Tips for Implementers

See implementers-tips.md.

Unicode data list

Data name	Latest	Unicode 17
East Asian Width	https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt	https://www.unicode.org/Public/17.0.0/ucd/EastAsianWidth.txt
Script	https://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt	https://www.unicode.org/Public/17.0.0/ucd/Scripts.txt
Block	https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt	https://www.unicode.org/Public/17.0.0/ucd/Blocks.txt
Characters followed by Non-emoji General-use Variation Selector	https://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt	https://www.unicode.org/Public/17.0.0/ucd/StandardizedVariants.txt
Default emoji presentation characters	https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-data.txt	https://www.unicode.org/Public/17.0.0/ucd/emoji/emoji-data.txt
Characters followed by U+FE0E/U+FE0F	https://unicode.org/Public/UCD/latest/ucd/emoji/emoji-variation-sequences.txt	https://unicode.org/Public/17.0.0/ucd/emoji/emoji-variation-sequences.txt
Fully-qualified Emojis (without ZWJ)	https://unicode.org/Public/emoji/latest/emoji-sequences.txt	https://unicode.org/Public/17.0.0/emoji/emoji-sequences.txt
Emoji qualification test	https://unicode.org/Public/emoji/latest/emoji-test.txt	https://unicode.org/Public/17.0.0/emoji/emoji-test.txt
