Preserve synthetic paragraph markers in CommonMark output #788
sandmor wants to merge 1 commit into kivikakk:main
|
Did not read the contributing guidelines. |
|
But I read them? What is the issue? Note that for contribution guidelines I only found the contributing section in the README, as pointed to by the contributing file. |
|
The PR body is LLM-generated. |
|
The tests are indeed LLM-generated, as I usually don't find writing tests to lock in behavior very gratifying. The fix itself, though, is a heavily modified version of an initial LLM draft. In that draft the LLM was trying to reuse the logic of the ordered list case, where it would check for something like a fence using begin_content. The fix itself is quite simple: it just adds an escape when a fence appears at the start of a line, with a check for up to three leading spaces, since Markdown usually treats anything indented beyond that as not being a block starter. If even tests, or using an LLM to sound off problems, is off, I can send a smaller PR containing only minimal changes if preferred. If you prefer not to count the leading spaces in the block starter check, that can be changed too. |
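The line-start check described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: the function name `opens_fence` is hypothetical, tabs are ignored for simplicity, and the backtick info-string rule is taken from the CommonMark spec rather than from the comment.

```rust
/// Hypothetical helper (not Comrak's API): returns true if `line` would open
/// a fenced code block, i.e. it has at most three leading spaces followed by
/// a run of at least three backticks or at least three tildes.
fn opens_fence(line: &str) -> bool {
    // Count leading spaces; with four or more the line is not a block starter.
    let indent = line.chars().take_while(|&c| c == ' ').count();
    if indent > 3 {
        return false;
    }
    // Spaces are one byte each, so the count is also a valid byte offset.
    let rest = &line[indent..];
    let fence_char = match rest.chars().next() {
        Some(c @ ('`' | '~')) => c,
        _ => return false,
    };
    // Length of the run of identical fence characters.
    let run = rest.chars().take_while(|&c| c == fence_char).count();
    if run < 3 {
        return false;
    }
    // Per CommonMark, a backtick fence cannot have a backtick in its info string.
    let info = &rest[run..];
    !(fence_char == '`' && info.contains('`'))
}

fn main() {
    for line in ["~~~", "   ~~~", "    ~~~", "~~", "1~~~", "```rust", "``` a`b"] {
        println!("{:?} -> {}", line, opens_fence(line));
    }
}
```

Keeping the indentation check separate from the fence-run check makes it easy to drop the leading-spaces condition if that part of the behavior is not wanted.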
|
Thanks for your response. To be super clear, the contributing part of the README has this subsection:
I certainly don't mind you using LLMs to sound off problems; it's just that all the actual code, tests and documentation submitted to Comrak itself needs to be human-authored. I ask that (per above) the pull requests/issues/comments themselves be so too. This is why I was quite terse in my above comments.
It's worth noting this is a bug in CommonMark roundtripping even with non-synthetic paragraphs:
$ printf '\~~~' | comrak
<p>~~~</p>
$ printf '\~~~' | comrak -t commonmark
~~~
$ printf '\~~~' | comrak -t commonmark | comrak
<pre style="background-color:#2b303b;"><code></code></pre>
$
It's not a problem with ```:
$ printf '\```' | comrak
<p>```</p>
$ printf '\```' | comrak -t commonmark
\`\`\`
$ printf '\```' | comrak -t commonmark | comrak
<p>```</p>
$
(This approach and its limitations are inherited from
A simpler solution might be simply to escape
At any rate, I am happy to accept a completely human-authored PR that improves upon this situation to any degree 🤍
|
Summary
Preserve block-like text as plain text when formatting synthetic Paragraph nodes to CommonMark.
Before this change, a synthetic paragraph containing text like ~~~, ```, :::, `: details`, or `~ details` could be emitted as real Markdown block markers, so reparsing the formatter output changed the document structure.
Why
The formatter should preserve AST semantics.
If Markdown is parsed from source text, fences and other block markers should still be recognized normally. But if the formatter is given a synthetic Paragraph node with that same text, it should render it as plain text, not as a new Markdown block.
Example
Before:
synthetic Paragraph("~~~") -> ~~~
Reparsing that output turns the paragraph into a fenced block.
After:
synthetic Paragraph("~~~") -> \~~~
The text stays a paragraph after round-trip.
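As a rough illustration of the before/after behavior, the sketch below escapes paragraph text line by line. It is an assumption-laden toy, not the formatter change itself: `escape_tilde_fences` is a hypothetical name, only tilde fences are handled, and Comrak's real formatter works on AST nodes rather than on plain strings.

```rust
/// Hypothetical helper (not part of Comrak): backslash-escape the first tilde
/// of every line that would otherwise open a tilde fence when reparsed.
fn escape_tilde_fences(text: &str) -> String {
    text.lines()
        .map(|line| {
            // Number of leading spaces, which is also a byte offset.
            let indent = line.len() - line.trim_start_matches(' ').len();
            if indent <= 3 && line[indent..].starts_with("~~~") {
                // Keep the indentation and put a backslash before the tilde run.
                format!("{}\\{}", &line[..indent], &line[indent..])
            } else {
                // Anything else, including text such as "1~~~", is left alone.
                line.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    println!("{}", escape_tilde_fences("~~~"));
    println!("{}", escape_tilde_fences("1~~~"));
}
```

Because `\~` is a valid CommonMark backslash escape, the escaped line reparses as literal text. Note that `str::lines` drops a trailing newline, which is acceptable for a sketch but would need care in real formatter code.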
What Changed
1~~~ is not over-escaped
Testing
cargo fmt --check
cargo test commonmark_ -- --nocapture