- CJK characters include the following characters:
  - 〰 (U+3030)
  - 〽 (U+303D)
  - 🈂 (U+1F202)
  - 🈷 (U+1F237)
  - ㊗ (U+3297)
  - ㊙ (U+3299)
- Do not treat every character in `emoji-data.txt` (in the data list below) as an emoji. It includes ASCII digits, the ASCII asterisk, the ASCII hash sign, the copyright symbol, the trademark symbol, and so on. They should not be treated as emoji unless followed by U+FE0F. Extract only the characters with the `Emoji_Presentation` label.
- You can use `/^\p{Emoji_Presentation}/u`, `/^\p{Basic_Emoji}/v`, or `/^\p{RGI_Emoji}/v` in JavaScript to check whether a code point is an emoji (as a default emoji presentation character or in the RGI emoji set). `RGI_Emoji` characters other than `Basic_Emoji` (the basic emoji set) consist of multiple code points and are not CJK as of Unicode 17. Never use `/^\p{Emoji}/u` instead, because `/^\p{Emoji}/u.test("1")` is `true` (who on earth would insist that `1` is an emoji?). The `v` flag has been available since ES2024 and is supported by Node >= 20, Chrome (Edge) >= 112, Firefox >= 116, and Safari >= 17. `"ES2024"` as `"target"` and `"lib"` in `tsconfig.json` is supported by TypeScript >= 5.7, Vite >= 6, and Vitest >= 3. Use `"ESNext"` instead of `"ES2024"` for older ecosystems.
- There are no emojis whose East Asian Width is `F` or `H` as of Unicode 17.
- The East Asian Width of Ideographic Variation Selectors and Standard Variation Selectors is `A`.
- The East Asian Width of characters whose Script is Hangul can be `N` (U+1160–U+11FF). However, there are no characters whose Script is Hangul and whose East Asian Width is `A` or `Na` as of Unicode 17.
- You can use `/^\p{sc=Hangul}/u` in JavaScript to check whether the Script of a character is Hangul.
- The East Asian Width of unassigned characters (e.g. U+3097) is undefined. You should follow the guideline by Unicode. Note that U+2FFFE–U+2FFFF and U+3FFFE–U+3FFFF are Noncharacters, not Reserved (Unassigned). The East Asian Width of Noncharacters does not seem to be mentioned in the specification of the East Asian Width property. Therefore, you can treat them as `W` to join the two product terms for U+20000–U+2FFFD and U+30000–U+3FFFD.
- The Unicode category of Ideographic Variation Selectors and Standard Variation Selectors is `Mn`, not `P` or `S`. This means that no Unicode punctuation character or non-CJK punctuation character is also a Standard Variation Selector or an Ideographic Variation Selector.
- You do not have to handle consecutive Standard Variation Selectors or Ideographic Variation Selectors, or an Ideographic Variation Selector preceded by `*`. It is up to implementers to decide how to treat them.
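
The property checks mentioned in the notes above can be tried out with a small sketch like the following. The helper names (`isDefaultEmoji`, `hasEmojiProperty`, `isHangul`) are hypothetical and exist only for illustration. The sketch deliberately uses only the `u` flag so that it also runs on engines without `v` flag support.

```javascript
// Hypothetical helper: true if the first code point has default emoji presentation.
const isDefaultEmoji = (ch) => /^\p{Emoji_Presentation}/u.test(ch);

// Hypothetical helper: the broader Emoji property, shown only to
// illustrate why it is unsuitable for this purpose.
const hasEmojiProperty = (ch) => /^\p{Emoji}/u.test(ch);

// Hypothetical helper: true if the Script of the first code point is Hangul.
const isHangul = (ch) => /^\p{sc=Hangul}/u.test(ch);

console.log(isDefaultEmoji("😀"));      // true
console.log(isDefaultEmoji("1"));       // false
console.log(hasEmojiProperty("1"));     // true (the pitfall described above)
console.log(isDefaultEmoji("\u3030"));  // false: 〰 stays a CJK character
console.log(isHangul("한"));            // true
console.log(isHangul("\u1160"));        // true: East Asian Width N, Script Hangul
console.log(isHangul("字"));            // false
```

Anchoring each pattern with `^` means only the first code point of the input is tested, which matches how the patterns are written in the notes.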
| CM | Out | M | In | End | Start |
|---|---|---|---|---|---|
| ✅️ | ␣ |
* |
␣ |
❌️ | ❌️ |
| ✅️ | ␣ |
* |
" |
❌️ | ✅️ |
| ✅️ | ␣ |
* |
a |
❌️ | ✅️ |
| ❌️ | ␣ |
* |
~ |
❌️ | ✅️ |
| ❌️ | ␣ |
* |
字 |
❌️ | ✅️ |
| ✅️ | ␣ |
_ |
␣ |
❌️ | ❌️ |
| ✅️ | ␣ |
_ |
" |
❌️ | ✅️ |
| ✅️ | ␣ |
_ |
a |
❌️ | ✅️ |
| ❌️ | ␣ |
_ |
~ |
❌️ | ✅️ |
| ❌️ | ␣ |
_ |
字 |
❌️ | ✅️ |
| ✅️ | " |
* |
" |
✅️ | ✅️ |
| ✅️ | " |
* |
a |
❌️ | ✅️ |
| ❌️ | " |
* |
字 |
✅️ | ✅️ |
| ❌️ | ~ |
* |
a |
✅️ | ✅️ |
| ✅️ | " |
_ |
" |
✅️ | ✅️ |
| ✅️ | " |
_ |
a |
❌️ | ✅️ |
| ❌️ | " |
_ |
字 |
❌️ | ✅️ |
| ❌️ | ~ |
_ |
a |
❌️ | ✅️ |
| ✅️ | a |
* |
a |
✅️ | ✅️ |
| ✅️ | a |
_ |
a |
❌️ | ❌️ |
Legend:
| Symbol/Term | Meaning |
|---|---|
| CM | ✅️ = CommonMark, ❌️ = CJK Friendly Emphasis only |
| Out | Character outside (before the marker) |
| M | Marker symbol |
| In | Character inside (after the marker) |
| End | Whether the marker can end a range |
| Start | Whether the marker can start a range |

| Character | Meaning | Alt Punctuation |
|---|---|---|
| `␣` | Unicode whitespace character | N/A |
| `"` | Non-CJK punctuation (in CommonMark, includes CJK characters) | (, ) |
| `a` | Non-CJK character (in CommonMark, includes CJK characters) | N/A |
| `~` | CJK punctuation | 「, 」 |
| `字` | CJK character | |

The above table implies:
- If a marker is adjacent to Unicode whitespace characters, whether the marker can start/end a range does not depend on whether the adjacent characters are punctuation or CJK characters.
- If a marker is `_`, whether the marker can start/end a range does not depend on whether the adjacent characters are CJK characters.
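
One way to read the table is as CommonMark's flanking rules with two adjustments. The following is a minimal sketch of that reading. It is inferred from the rows of the table only and is not the normative wording of the specification. The function `canStartEnd`, the `CLASS_OF` map, and the class names are hypothetical, and the five placeholder symbols stand in for real Unicode classification. The two assumed adjustments are that CJK punctuation does not count as punctuation when testing flanking, and that an adjacent CJK character satisfies the same condition as adjacent whitespace or punctuation.

```javascript
// Hypothetical mapping from the table's placeholder symbols to character classes.
const CLASS_OF = { "␣": "ws", '"': "punct", "a": "other", "~": "cjkPunct", "字": "cjk" };

// Returns whether `marker` ("*" or "_") can start/end a range when it sits
// between `outSym` (character before it) and `inSym` (character after it).
function canStartEnd(marker, outSym, inSym) {
  const before = CLASS_OF[outSym];
  const after = CLASS_OF[inSym];
  const isWs = (c) => c === "ws";
  const isNonCjkPunct = (c) => c === "punct";
  const isCjk = (c) => c === "cjk" || c === "cjkPunct";
  const isAnyPunct = (c) => c === "punct" || c === "cjkPunct";

  // Assumed left-flanking test: CommonMark's rule, except that only non-CJK
  // punctuation counts as punctuation and a preceding CJK character is accepted.
  const left = !isWs(after) &&
    (!isNonCjkPunct(after) || isWs(before) || isNonCjkPunct(before) || isCjk(before));
  // Assumed right-flanking test: the mirror image of the above.
  const right = !isWs(before) &&
    (!isNonCjkPunct(before) || isWs(after) || isNonCjkPunct(after) || isCjk(after));

  if (marker === "*") return { start: left, end: right };
  // For "_", the extra CommonMark conditions apply, with CJK punctuation
  // counted as punctuation (an assumption that matches the `~ _ a` row).
  return {
    start: left && (!right || isAnyPunct(before)),
    end: right && (!left || isAnyPunct(after)),
  };
}

console.log(canStartEnd("*", '"', "字")); // { start: true, end: true }
console.log(canStartEnd("_", '"', "字")); // { start: true, end: false }
console.log(canStartEnd("_", "a", "a"));  // { start: false, end: false }
```

Under these assumptions the sketch reproduces every row of the table, including both implications listed above. Inputs that the table does not list, such as variation selectors, are outside its scope.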