(Proposal) Parse option to make sourcepos column char-based #777

@Martin005

As discussed in #762, comrak counts UTF-8 bytes for the value in LineColumn::column.

But a user who wants to use comrak to pinpoint an exact location in the source file would expect the column to point at the readable character, not at the position of a UTF-8 byte.

When you have a Markdown file that contains only the character 好 (U+597D, encoded as 0xE5 0xA5 0xBD in UTF-8), comrak reports 1:1-1:3 as the sourcepos.

By contrast, if you open the same Markdown file in a text editor (e.g. Notepad++), the end column is 2 and the position is 4. Because comrak treats the range as end-inclusive, it reports 1:1-1:3 rather than 1:1-1:4 as the sourcepos.
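To make the arithmetic concrete, here is a minimal sketch in plain Rust (it does not call comrak). `end_columns` is a hypothetical helper that returns the end-inclusive end column of a single-line span starting at column 1, once counted in UTF-8 bytes and once counted in characters. "Character" is assumed here to mean a Unicode scalar value (a Rust `char`); editors may count differently, for example in grapheme clusters.

```rust
/// Hypothetical helper for illustration only, not part of comrak.
/// For a single-line span that starts at column 1, returns the
/// end-inclusive end column as (byte-based, char-based).
fn end_columns(text: &str) -> (usize, usize) {
    // With 1-based, end-inclusive columns, the end column of a span
    // starting at column 1 equals the span's length in the chosen unit.
    let byte_end = text.len(); // length in UTF-8 bytes
    let char_end = text.chars().count(); // length in Unicode scalar values
    (byte_end, char_end)
}
```

For the single character U+597D this gives `(3, 1)`: the byte-based end column 3 matches the 1:1-1:3 that comrak reports today, while the character-based end column is 1.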

My proposal is to create a parse option (disabled by default) which, when enabled, would make the column value in LineColumn character-based, not based on UTF-8 bytes.

With the option enabled, the Markdown example above would have sourcepos 1:1-1:1 (as the end-inclusivity would be maintained).
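Until such an option exists, the same mapping could be done outside comrak. The sketch below is only an illustration of the idea, not comrak's implementation: `byte_col_to_char_col` is a hypothetical helper that takes the text of one source line and a 1-based byte column, and returns the 1-based column of the character containing that byte. It assumes "character" means a Unicode scalar value and that the caller has already extracted the line.

```rust
/// Hypothetical helper for illustration only, not part of comrak.
/// Converts a 1-based UTF-8 byte column within `line` into the 1-based
/// column of the character (Unicode scalar value) that contains that byte.
fn byte_col_to_char_col(line: &str, byte_col: usize) -> usize {
    // The byte at 1-based column `byte_col` sits at 0-based offset
    // `byte_col - 1`. Every character whose start offset is strictly less
    // than `byte_col` begins at or before that byte, so counting those
    // characters yields the 1-based character column.
    line.char_indices()
        .take_while(|&(start, _)| start < byte_col)
        .count()
}
```

Applied to both ends of the byte-based range 1:1-1:3 for a line containing only U+597D, the start column 1 maps to 1 and the end column 3 maps to 1, which is the proposed 1:1-1:1. Because the conversion targets the character that contains the byte, end-inclusivity is preserved.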
