(Proposal) Parse option to make sourcepos column char-based #777
Description
As discussed in #762, comrak uses the number of UTF-8 bytes as the value of LineColumn::column.
But a user who wants comrak to pinpoint an exact location in the source file would expect the column to point at the readable character, not at the position of a UTF-8 byte.
For a Markdown file that contains only the 好 character (U+597D, encoded as the UTF-8 bytes 0xE5 0xA5 0xBD), comrak reports 1:1-1:3 as the sourcepos.
By contrast, if you open the same Markdown file in a text editor (e.g. Notepad++), the end column is 2 and the position is 4. comrak treats the range as end-inclusive, so it reports 1:1-1:3 rather than 1:1-1:4.
My proposal is to add a parse option (disabled by default) which, when enabled, would make the column value in LineColumn character-based rather than UTF-8 byte-based.
That would give the Markdown example above the sourcepos 1:1-1:1 (end-inclusivity would be maintained).
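To illustrate the mapping I have in mind, here is a minimal sketch that converts a 1-based byte column into a 1-based character column for a single line. The function name `byte_col_to_char_col` is hypothetical and not part of comrak's API, and "character" is assumed here to mean a Unicode scalar value (a Rust `char`), not a grapheme cluster.

```rust
/// Hypothetical helper (not part of comrak): converts a 1-based, byte-based
/// column into a 1-based, character-based column within `line`.
///
/// Assumption: a "character" is a Unicode scalar value (Rust `char`).
/// Any byte inside a multi-byte character maps to that character's column,
/// so both the first and the last byte of 好 yield the same result.
fn byte_col_to_char_col(line: &str, byte_col: usize) -> usize {
    // Count the characters whose first byte lies at a 0-based offset
    // strictly below the 1-based byte column, i.e. every character that
    // starts at or before the byte being pointed at.
    line.char_indices()
        .take_while(|&(start, _)| start < byte_col)
        .count()
}
```

For the 好 example, byte columns 1 and 3 both map to character column 1, so an end-inclusive byte range of 1:1-1:3 would become 1:1-1:1.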