fix: add support for multibyte characters by Haroenv · Pull Request #2 · algolia/chunk-text

Haroenv · 2017-06-26T07:42:18Z

If the length of a character is more than one (tested here with lengths 2 and 7), it should be counted as that length, but it must never be cut in the middle of that character.
This does not work as expected yet, so the test is skipped.


rayrutjes · 2017-06-26T08:00:36Z

Nice @Haroenv , feel free to also submit the solution so that we don't skip the test ;)

Haroenv · 2017-06-26T08:15:37Z

I couldn't find a simple rule for identifying the actual characters so that they don't get split. I thought I'd add this test so we know that it should work.

Haroenv · 2017-06-26T09:34:15Z

The difference now is that the character count is used, and not the string length. This means that 💩 counts as one character, while before it would count as two.
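To illustrate the difference between string length and character count described above, here is a minimal sketch. The variable names are illustrative and not taken from the chunk-text source.

```javascript
// '\u{1F4A9}' is the 💩 emoji, written as an escape so the code point is explicit.
const poo = '\u{1F4A9}';

// String.prototype.length counts UTF-16 code units; this emoji needs two.
const codeUnits = poo.length;

// Spreading a string iterates by code point, so the emoji is a single element.
const codePoints = [...poo].length;

console.log(codeUnits, codePoints); // 2 1
```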

rayrutjes · 2017-06-26T09:45:51Z

> The difference now is that the character count is used, and not the string length. This means that 💩 counts as one character, while before it would count as two.

This is not a big issue.

Haroenv · 2017-06-26T09:48:15Z

I think this can be merged as-is: it no longer breaks 💩 or 🧀, but it may still break 🙌🏿 or 🏃🏿‍♀️
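A short sketch of why the two remaining emoji can still break: each is a sequence of several code points, so splitting by code point can separate the base emoji from its modifiers. The escapes below spell out the sequences; the variable names are illustrative.

```javascript
// 🙌🏿 = raising hands (U+1F64C) + dark skin tone modifier (U+1F3FF)
const hands = '\u{1F64C}\u{1F3FF}';

// 🏃🏿‍♀️ = runner (U+1F3C3) + dark skin tone (U+1F3FF) + ZWJ (U+200D)
//        + female sign (U+2640) + variation selector-16 (U+FE0F)
const runner = '\u{1F3C3}\u{1F3FF}\u200D\u2640\uFE0F';

// Spreading splits by code point, so each emoji becomes several array elements.
const handsParts = [...hands];
const runnerParts = [...runner];

console.log(hands.length, handsParts.length);   // 4 code units, 2 code points
console.log(runner.length, runnerParts.length); // 7 code units, 5 code points
```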

Haroenv · 2017-06-26T10:05:58Z

(By the way, ZWJ is also relevant for Arabic, and multibyte strings matter for some Chinese characters.)

Haroenv · 2017-06-26T10:09:12Z

  const chunks = [];
-  while (text.length > chunkSize) {
-    const splitAt = text.lastIndexOf(' ', chunkSize);
+  let characters = [...text];


I just realised that it matters where this syntax is supported: the spread operator needs node 6+.

If we used Array.from() instead, it would work on node 4+.

Fine for me.
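For reference, a minimal sketch of the two spellings discussed in this review comment. Both iterate the string by code point and give the same array; the sample string is made up for illustration.

```javascript
// Sample input: 'a', the 💩 emoji (U+1F4A9), and 'b'.
const text = 'a\u{1F4A9}b';

// Spread syntax, as used in the diff above.
const viaSpread = [...text];

// Array.from() uses the same string iterator and yields the same elements.
const viaArrayFrom = Array.from(text);

console.log(viaSpread.length, viaArrayFrom.length); // 3 3
```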

rayrutjes · 2017-06-26T10:12:56Z

We are using the conventional commit format to generate the changelog.
Could you rebase so that the commit message is the correct one, "fix: add support for multibyte characters"?

Haroenv · 2017-06-26T10:13:49Z

I usually squash on merge, but if you prefer to do it this way, it's also fine for me

Haroenv · 2017-06-27T09:08:51Z

Runes is amazing, it works exactly how I want it!

https://github.com/dotcypress/runes
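As a rough sketch of the idea (not the code merged in this PR), chunking can operate on an array of user-perceived characters instead of on the raw string. The example below uses the built-in `Intl.Segmenter` as a stand-in for runes so that it runs without dependencies, assuming a runtime that ships it. `toGraphemes` and `chunkByGraphemes` are hypothetical names.

```javascript
// Hypothetical helper: split text into user-perceived characters (graphemes).
// Intl.Segmenter is a stand-in here; the PR itself uses the runes package.
function toGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// Hypothetical chunker: at most chunkSize graphemes per chunk,
// preferring to break on a space, as the original lastIndexOf(' ') logic did.
function chunkByGraphemes(text, chunkSize) {
  const chunks = [];
  let characters = toGraphemes(text);
  while (characters.length > chunkSize) {
    // Look for the last space at or before index chunkSize.
    let splitAt = characters.lastIndexOf(' ', chunkSize);
    // No usable space: fall back to a hard cut after chunkSize graphemes.
    if (splitAt <= 0) splitAt = chunkSize;
    chunks.push(characters.slice(0, splitAt).join('').trim());
    characters = characters.slice(splitAt);
    // Drop the space we split on so the next chunk does not start with it.
    if (characters[0] === ' ') characters = characters.slice(1);
  }
  if (characters.length > 0) chunks.push(characters.join(''));
  return chunks;
}

// Three copies of 🏃🏿‍♀️ (7 UTF-16 code units each) with a chunk size of 2.
const runner = '\u{1F3C3}\u{1F3FF}\u200D\u2640\uFE0F';
const emojiChunks = chunkByGraphemes(runner + runner + runner, 2);
const wordChunks = chunkByGraphemes('ab cd ef', 5);
console.log(emojiChunks.length, wordChunks);
```

Because the split index is computed on the grapheme array, a multi-code-point emoji is either entirely inside a chunk or entirely outside it.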

rayrutjes

Impressive man! Go for it 🎉
Make sure you give it a "conventional" commit name.

Haroenv · 2017-06-27T09:35:59Z

There are some more advanced bugs, but I reported them to runes itself and will see what I can do to fix them. For now it's already going well.

test(unicode): add tests for complicated characters

76bbde4

if the length of a character is more than one (here tested with 2 and 7), it should be counted as that length, but definitely shouldn't cut in the middle of that character. This does not work as expected yet, so the test is skipped

use an array to have single characters

d48a950

add another failing case

45db5e9


comp++

a4180c6

Haroenv changed the title ~~test(unicode): add tests for complicated characters~~ fix: add support for multibyte characters Jun 26, 2017

fix tests by using runes

a6d8d84

rayrutjes approved these changes Jun 27, 2017


Haroenv merged commit 1398956 into algolia:master Jun 27, 2017

Haroenv deleted the test/complicated-characters branch June 27, 2017 09:36
