
[Helpers\repl.nim]: UTF-8 input is broken in linenoise (multi-byte characters corrupted during editing) #2211

@scifx


Summary

Arturo uses a fork of linenoise in its REPL, and I ran into a serious issue with UTF-8 input.

Current Behavior

When entering multi-byte characters (e.g. Chinese or Korean), the input gets corrupted during editing.

Example:

Input:

a你b

Output:

a`b

The middle character (你) is not just displayed incorrectly: it is actually lost and replaced, which suggests the UTF-8 sequence is being broken.

Analysis

This appears to happen because input is processed byte-by-byte instead of as UTF-8 codepoints.

For example, the character 你 (U+4F60) is encoded in UTF-8 as:

E4 BD A0

But the implementation seems to treat each byte as an individual character, causing:

  • partial reads of multi-byte sequences
  • invalid character reconstruction
  • fallback to incorrect ASCII characters (like '`')

So this is not just a rendering issue — it is data corruption during input handling.
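To make the byte values concrete, here is a minimal C sketch that encodes a codepoint as UTF-8 and reproduces the three bytes quoted above. It also shows one detail worth noting: the backtick is 0x60, which is exactly the low byte of U+4F60. That would be consistent with a codepoint being narrowed to a single byte somewhere in the input path, but this is only an inference from the values, not something confirmed against the linenoise source. The function names `utf8_encode`, `narrow_to_byte`, and `check_issue_example` are hypothetical and do not come from linenoise or Arturo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not a scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical failure mode: keep only the low 8 bits of a codepoint,
 * as a lossy cast to a one-byte char would do. */
unsigned char narrow_to_byte(uint32_t cp)
{
    return (unsigned char)(cp & 0xFF);
}

/* Self-check using the character from this report: U+4F60. */
void check_issue_example(void)
{
    unsigned char buf[4];
    assert(utf8_encode(0x4F60, buf) == 3);
    assert(buf[0] == 0xE4 && buf[1] == 0xBD && buf[2] == 0xA0);
    assert(narrow_to_byte(0x4F60) == 0x60); /* 0x60 is '`' in ASCII */
}
```

If the narrowing guess is right, the fix would be in how a decoded character is stored or passed along, not only in how bytes are read from the terminal.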

Expected Behavior

Proper UTF-8 handling should:

  • read full codepoints (not individual bytes)
  • handle cursor movement based on characters, not bytes
  • avoid breaking multi-byte sequences
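The three points above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not linenoise's actual code: the helpers `utf8_seq_len`, `utf8_prev`, `utf8_next`, `utf8_backspace`, and `check_cursor_example` are hypothetical names, the edit buffer is assumed to be a NUL-terminated byte array, and the cursor is assumed to be a byte offset that always sits on a character boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes in the UTF-8 sequence introduced by lead byte b.
 * Returns 0 for a continuation byte or an invalid lead byte. */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Continuation bytes have the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Move the cursor one character left: step back over continuation
 * bytes until a lead byte (or the start of the buffer) is reached. */
size_t utf8_prev(const char *buf, size_t pos)
{
    if (pos == 0) return 0;
    pos--;
    while (pos > 0 && is_continuation((unsigned char)buf[pos]))
        pos--;
    return pos;
}

/* Move the cursor one character right, never past len. */
size_t utf8_next(const char *buf, size_t len, size_t pos)
{
    size_t n;
    if (pos >= len) return len;
    n = utf8_seq_len((unsigned char)buf[pos]);
    if (n == 0) n = 1;            /* invalid byte: step over it alone */
    return (pos + n > len) ? len : pos + n;
}

/* Delete the whole character before the cursor, so a multi-byte
 * sequence is never split. Updates *pos and returns the new length. */
size_t utf8_backspace(char *buf, size_t len, size_t *pos)
{
    size_t start = utf8_prev(buf, *pos);
    memmove(buf + start, buf + *pos, len - *pos);
    len -= *pos - start;
    buf[len] = '\0';
    *pos = start;
    return len;
}

/* Self-check with the input from this report. The literal is split so
 * that 'b' is not read as part of the \xA0 escape. */
void check_cursor_example(void)
{
    char line[16] = "a\xE4\xBD\xA0" "b";
    size_t len = 5, pos = 4;      /* cursor between the CJK char and 'b' */
    assert(utf8_prev(line, pos) == 1);
    assert(utf8_next(line, len, 1) == 4);
    len = utf8_backspace(line, len, &pos);
    assert(len == 2 && pos == 1 && strcmp(line, "ab") == 0);
}
```

This sketch moves by codepoint only. Characters such as 你 usually occupy two terminal columns, so redrawing the line correctly would also need a display-width lookup (for example POSIX `wcwidth`), which is a separate concern from keeping the bytes intact.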

Steps To Reproduce

Start Arturo in REPL mode.
Type "a你b".
The line comes back as "a`b".

OS

all

Version

all

Anything else?

At the moment, this makes the library unusable in UTF-8 environments (which are standard today).

It might be worth:

  • clearly documenting that UTF-8 input is not fully supported
  • considering switching to / referencing a more complete implementation

I discussed this with an AI assistant, which concluded that the problem most likely lies in the linenoise library splitting characters into individual bytes. It recommended replacing linenoise with an alternative C library.

Thanks!

Is there an existing issue for this?

  • I have searched the existing issues

Metadata

Labels

bug (Something isn't working)
