CommonMark issue: commonmark/commonmark-spec#650
The following chapters are written as an amendment to the original CommonMark specification. Any chapter, section, or definition not given here is the same as in the original specification.
A CJK character is a character (Unicode code point) that meets at least one of the following criteria:
- Meets both of the following criteria:
  - UAX #11 East Asian Width category is either W, F, or H
  - Not a default emoji presentation character (substantially equivalent to: is not in the RGI emoji set or is not a fully-qualified emoji) as defined in UTS #51 Unicode Emoji
- UAX #24 Unicode Script Property is Hangul
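The two criteria above can be sketched in Python as follows. This is a minimal illustration, not a conforming implementation: the standard `unicodedata` module exposes East Asian Width but neither the UTS #51 `Emoji_Presentation` property nor the UAX #24 Script property, so the hypothetical `EMOJI_PRESENTATION_SAMPLE` set is a tiny hand-picked subset and `is_hangul_script_approx` is a name-based heuristic. Results also depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# ASSUMPTION: illustrative subset only. A real implementation would load every
# code point with Emoji_Presentation=Yes from the UTS #51 data files.
EMOJI_PRESENTATION_SAMPLE = frozenset({0x231A, 0x231B, 0x1F600})


def is_hangul_script_approx(ch: str) -> bool:
    """Heuristic stand-in for "Script property is Hangul" (checks the character name)."""
    return "HANGUL" in unicodedata.name(ch, "")


def is_cjk_character(ch: str, emoji_presentation=EMOJI_PRESENTATION_SAMPLE) -> bool:
    """Return True if `ch` meets at least one of the two criteria for a CJK character."""
    # Criterion 1: East Asian Width is W, F, or H, and the character is not a
    # default emoji presentation character.
    wide = unicodedata.east_asian_width(ch) in ("W", "F", "H")
    if wide and ord(ch) not in emoji_presentation:
        return True
    # Criterion 2: the character belongs to the Hangul script (approximated).
    return is_hangul_script_approx(ch)
```

For example, `is_cjk_character("漢")` and `is_cjk_character("\u1161")` (a Hangul jamo whose East Asian Width is N) both hold, while `is_cjk_character("\u231a")` (WATCH, a wide default emoji presentation character) does not.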
An Ideographic Variation Selector is a character in the Variation Selectors Supplement Block (U+E0100–U+E01EF).
A Non-emoji General-use Variation Selector is a character in the Variation Selectors Block (U+FE00–U+FE0F) other than Emoji Presentation Selector U+FE0F.
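Both variation selector definitions are plain code point range checks. A minimal Python sketch, using hypothetical function names, might look like this:

```python
def is_ideographic_variation_selector(ch: str) -> bool:
    """Variation Selectors Supplement block, U+E0100 to U+E01EF."""
    return 0xE0100 <= ord(ch) <= 0xE01EF


def is_non_emoji_general_use_variation_selector(ch: str) -> bool:
    """Variation Selectors block, U+FE00 to U+FE0F, excluding U+FE0F itself."""
    return 0xFE00 <= ord(ch) <= 0xFE0E
```

Note that the Text Presentation Selector U+FE0E stays inside the second range; only the Emoji Presentation Selector U+FE0F is excluded.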
A CJK sequence is a CJK character or a sequence of 2 characters where the first is a CJK character and the second is a Non-emoji General-use Variation Selector.
A CJK punctuation character is a Unicode punctuation character that is also a CJK character.
A non-CJK punctuation character is a Unicode punctuation character that is not a CJK punctuation character.
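The split between CJK and non-CJK punctuation characters can be sketched as a small classifier. This sketch assumes the CommonMark 0.31 definition of a Unicode punctuation character (general category P or S) and, for brevity, approximates "CJK character" by East Asian Width alone; the emoji presentation exclusion and the Hangul script criterion are deliberately left out. The function names are hypothetical.

```python
import unicodedata
from typing import Optional


def is_unicode_punctuation(ch: str) -> bool:
    """CommonMark 0.31: a character in the Unicode P or S general categories."""
    return unicodedata.category(ch)[0] in "PS"


def is_cjk_character_approx(ch: str) -> bool:
    """ASSUMPTION: East Asian Width only; omits the emoji and Hangul rules."""
    return unicodedata.east_asian_width(ch) in ("W", "F", "H")


def punctuation_kind(ch: str) -> Optional[str]:
    """Return "cjk", "non-cjk", or None if `ch` is not punctuation at all."""
    if not is_unicode_punctuation(ch):
        return None
    return "cjk" if is_cjk_character_approx(ch) else "non-cjk"
```

Under these assumptions the ideographic full stop `。` and the fullwidth exclamation mark `！` are classified as CJK punctuation, while the ASCII `!` is non-CJK punctuation.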
A Unicode punctuation sequence is a Unicode punctuation character or a sequence of 2 characters where the first is a Unicode punctuation character and the second is a Non-emoji General-use Variation Selector.
A CJK ambiguous punctuation sequence is a Standardized Variation Sequence whose description in StandardizedVariants.txt (the latest version) contains the phrase "fullwidth form" and whose first character is a Unicode punctuation character with UAX #11 East Asian Width category A.
Note
- The Sibe forms of quotation marks added in Unicode 17 are vertical only and come with a space, so they are not CJK ambiguous punctuation sequences.
A CJK punctuation sequence is a CJK punctuation character, a CJK ambiguous punctuation sequence, or a sequence of 2 characters where the first is a CJK punctuation character and the second is a Non-emoji General-use Variation Selector.
A non-CJK punctuation sequence is a non-CJK punctuation character or a sequence of 2 characters where the first is a non-CJK punctuation character and the second is a Non-emoji General-use Variation Selector.
Note
For the concrete ranges covered by each definition, see ranges.md.
Note
Bold italic text marks the modified parts.
A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a non-CJK punctuation character or (2b) followed by a non-CJK punctuation character and preceded by (2bα) Unicode whitespace, (2bβ) a non-CJK punctuation sequence, (2bγ) a CJK sequence, or (2bδ) an Ideographic Variation Selector. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a non-CJK punctuation sequence, or (2b) preceded by a non-CJK punctuation sequence and followed by (2bα) Unicode whitespace, (2bβ) a non-CJK punctuation character, or (2bγ) a CJK character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
Note
If the delimiter run (1) adjoins a Code Unit that is not part of an Encoded Character/Assigned Character (including Ill-Formed Code Unit Subsequences, e.g. isolated Surrogate Code Points/Units) or (2) is preceded by a Standard Variation Selector that is itself preceded by (2a) Unicode whitespace or (2b) an Ideographic Variation Selector, then whether the delimiter run is left-flanking and whether it is right-flanking are both Unspecified.
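The two flanking definitions can be sketched in Python as below. This is a simplified illustration under several assumptions, all of them hypothetical choices rather than part of this amendment: `before` and `after` are the text on the same line before and after the delimiter run, "CJK character" is approximated by East Asian Width plus a name-based Hangul check (the emoji presentation exclusion is omitted), CJK ambiguous punctuation sequences are ignored, and the Unspecified cases from the note above are not handled.

```python
import unicodedata


def is_whitespace(ch: str) -> bool:
    # "" stands for the beginning or end of the line, which counts as whitespace.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_cjk(ch: str) -> bool:
    # ASSUMPTION: approximation; omits the default emoji presentation exclusion.
    return ch != "" and (
        unicodedata.east_asian_width(ch) in ("W", "F", "H")
        or "HANGUL" in unicodedata.name(ch, "")
    )


def is_non_cjk_punct(ch: str) -> bool:
    return ch != "" and unicodedata.category(ch)[0] in "PS" and not is_cjk(ch)


def is_non_emoji_vs(ch: str) -> bool:
    return ch != "" and 0xFE00 <= ord(ch) <= 0xFE0E


def is_ivs(ch: str) -> bool:
    return ch != "" and 0xE0100 <= ord(ch) <= 0xE01EF


def ends_with_sequence(before: str, pred) -> bool:
    """True if `before` ends with a `pred` character, optionally followed by a
    Non-emoji General-use Variation Selector."""
    if before and pred(before[-1]):
        return True
    return len(before) >= 2 and is_non_emoji_vs(before[-1]) and pred(before[-2])


def is_left_flanking(before: str, after: str) -> bool:
    prev, nxt = before[-1:], after[:1]
    if is_whitespace(nxt):                       # (1)
        return False
    if not is_non_cjk_punct(nxt):                # (2a)
        return True
    return (                                     # (2b)
        is_whitespace(prev)
        or ends_with_sequence(before, is_non_cjk_punct)
        or ends_with_sequence(before, is_cjk)
        or is_ivs(prev)
    )


def is_right_flanking(before: str, after: str) -> bool:
    prev, nxt = before[-1:], after[:1]
    if is_whitespace(prev):                      # (1)
        return False
    if not ends_with_sequence(before, is_non_cjk_punct):  # (2a)
        return True
    return is_whitespace(nxt) or is_non_cjk_punct(nxt) or is_cjk(nxt)  # (2b)
```

With this sketch, the closing `**` in `**テスト。**テスト` is right-flanking because `。` is CJK punctuation rather than non-CJK punctuation, whereas the opening `**` in `a**"foo"**` is still not left-flanking, as in the original specification.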
2. A single _ character can open emphasis iff it is part of a left-flanking delimiter run and either (a) not part of a right-flanking delimiter run or (b) part of a right-flanking delimiter run preceded by a Unicode punctuation sequence.
6. A double __ can open strong emphasis iff it is part of a left-flanking delimiter run and either (a) not part of a right-flanking delimiter run or (b) part of a right-flanking delimiter run preceded by a Unicode punctuation sequence.
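Rules 2 and 6 share the same condition, which can be sketched as a single predicate. In this hedged sketch the flanking results are taken as already computed booleans, `before` is the text preceding the delimiter run on the same line, and the function names are hypothetical. Unicode punctuation follows the CommonMark 0.31 definition (general category P or S).

```python
import unicodedata


def is_unicode_punctuation(ch: str) -> bool:
    return unicodedata.category(ch)[0] in "PS"


def is_non_emoji_vs(ch: str) -> bool:
    return 0xFE00 <= ord(ch) <= 0xFE0E


def preceded_by_punctuation_sequence(before: str) -> bool:
    """True if `before` ends with a Unicode punctuation sequence."""
    if before and is_unicode_punctuation(before[-1]):
        return True
    return (
        len(before) >= 2
        and is_non_emoji_vs(before[-1])
        and is_unicode_punctuation(before[-2])
    )


def underscore_can_open(left_flanking: bool, right_flanking: bool, before: str) -> bool:
    """Condition shared by rule 2 (single _) and rule 6 (double __)."""
    if not left_flanking:
        return False
    # (a) not right-flanking, or (b) right-flanking and preceded by a
    # Unicode punctuation sequence.
    return not right_flanking or preceded_by_punctuation_sequence(before)
```

The only change from a check on the single preceding character is that a punctuation character followed by a Non-emoji General-use Variation Selector also satisfies case (b).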
See implementers-tips.md.