Commit b94b1e3

fix: preserve line breaks in pre tags
Line breaks were lost when processing <pre> tags because the text was extracted with textContent, which does not turn <br> tags into newlines. This fix implements custom logic that preserves line breaks by replacing each <br> tag with a newline character before extracting the text.
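The following is a minimal sketch of the failure and the fix described above. It assumes a toy node model in place of the DOM: `MiniNode`, `domLikeTextContent`, `textWithLineBreaks`, `textNode` and `elementNode` are hypothetical names used only for illustration. A `<br>` element has no text children, so a textContent-style traversal joins the surrounding lines. Emitting "\n" for each `<br>` keeps those lines separate.

```typescript
// Hypothetical toy node model standing in for the DOM; not the real DOM API.
type MiniNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: string; children: MiniNode[] };

// Mimics DOM `textContent`: concatenates descendant text. A <br> element has
// no text children, so it contributes nothing and the line break disappears.
function domLikeTextContent(node: MiniNode): string {
  if (node.kind === "text") return node.value;
  return node.children.map(domLikeTextContent).join("");
}

// Same traversal, but each <br> element is emitted as "\n".
function textWithLineBreaks(node: MiniNode): string {
  if (node.kind === "text") return node.value;
  if (node.tag === "br") return "\n";
  return node.children.map(textWithLineBreaks).join("");
}

// Small helpers for building a tree by hand.
const textNode = (value: string): MiniNode => ({ kind: "text", value });
const elementNode = (tag: string, ...children: MiniNode[]): MiniNode => ({
  kind: "element",
  tag,
  children,
});

// Roughly <pre>const a = 1;<br>const b = 2;</pre>
const pre = elementNode("pre", textNode("const a = 1;"), elementNode("br"), textNode("const b = 2;"));

console.log(JSON.stringify(domLikeTextContent(pre))); // "const a = 1;const b = 2;"
console.log(JSON.stringify(textWithLineBreaks(pre))); // "const a = 1;\nconst b = 2;"
```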
1 parent 3b6afde commit b94b1e3

1 file changed (+18 -2 lines changed)


src/scraper/processor/HtmlProcessor.ts

Lines changed: 18 additions & 2 deletions
```diff
@@ -112,8 +112,24 @@ export class HtmlProcessor implements ContentProcessor {
           }
         }
 
-        // use `node.textContent` to avoid escaping
-        return `\n\`\`\`${language}\n${node.textContent}\n\`\`\`\n`;
+        // We cannot use `content` here as it will escape the content.
+        // We also cannot use `element.textContent` as it will not preserve the line breaks.
+        // Instead, we implement a custom logic to extract the text content.
+        const text = (() => {
+          // Clone the node to avoid modifying the original
+          const clone = element.cloneNode(true) as HTMLElement;
+
+          // Replace <br> tags with newline characters
+          const brElements = Array.from(clone.querySelectorAll("br"));
+          for (const br of brElements) {
+            br.replaceWith("\n");
+          }
+
+          // Get the text content after replacing <br> tags
+          return clone.textContent;
+        })();
+
+        return `\n\`\`\`${language}\n${text}\n\`\`\`\n`;
       },
     });
 
```
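The clone-then-replace flow in the diff can be mirrored without a browser. The sketch below assumes a hypothetical `PreNode` tree in place of the DOM. `deepClone` stands in for `cloneNode(true)`, `replaceBrWithNewline` stands in for `br.replaceWith("\n")`, and `preToMarkdown` wraps the result in the same fenced-block template that the replacement function returns. It illustrates the pattern and is not the project's code.

```typescript
// Hypothetical mutable node model standing in for the DOM; not the real DOM API.
type PreNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: string; children: PreNode[] };

// Deep copy, playing the role of `element.cloneNode(true)`.
function deepClone(node: PreNode): PreNode {
  return node.kind === "text"
    ? { kind: "text", value: node.value }
    : { kind: "element", tag: node.tag, children: node.children.map(deepClone) };
}

// Plays the role of `br.replaceWith("\n")`: swaps every <br> (at any depth)
// for a "\n" text node, mutating the tree it is given.
function replaceBrWithNewline(node: PreNode): void {
  if (node.kind !== "element") return;
  node.children = node.children.map(
    (child): PreNode =>
      child.kind === "element" && child.tag === "br" ? { kind: "text", value: "\n" } : child,
  );
  node.children.forEach(replaceBrWithNewline);
}

// Concatenates descendant text, like `textContent`.
function collectText(node: PreNode): string {
  return node.kind === "text" ? node.value : node.children.map(collectText).join("");
}

// Mirrors the committed flow: clone, rewrite <br>, read the text, wrap it in a fence.
function preToMarkdown(pre: PreNode, language: string): string {
  const clone = deepClone(pre);
  replaceBrWithNewline(clone);
  return `\n\`\`\`${language}\n${collectText(clone)}\n\`\`\`\n`;
}

const original: PreNode = {
  kind: "element",
  tag: "pre",
  children: [
    { kind: "text", value: "line 1" },
    { kind: "element", tag: "br", children: [] },
    { kind: "text", value: "line 2" },
  ],
};

// Prints a fenced block with "line 1" and "line 2" on separate lines.
console.log(preToMarkdown(original, "ts"));

// The caller's tree is untouched: its second child is still the <br> element.
if (original.kind === "element") {
  const second = original.children[1];
  console.log(second.kind === "element" ? second.tag : "text"); // br
}
```

Because the <br> rewrite runs on the clone, the caller's tree still contains its <br> element afterwards. This matches the commit's reason for cloning before it calls `replaceWith`.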
