feat(site): render math formulas in lesson body text with KaTeX #25
Merged
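Upstream lessons write formulas as inline code (e.g. `h_t = f(h_{t-1}, x_t)`).
In code styling, subscripts and Greek letters are unreadable (the upstream English site has the same problem).
Fix at the rendering layer, with zero changes to source docs and zero drift from upstream:
- looksLikeMath(): a conservative heuristic for detecting math spans. A span is
converted only if it hits a strong signal (_{ or ^{, a math Unicode symbol, a ^ superscript)
and shows no code features (Chinese text, quoted strings, * multiplication,
long snake_case subscripts, operators such as ->). Prime notation such as V(s') is
told apart from string literals by the position of the opening quote
- texPreprocess(): tidies pseudo-LaTeX (²³ᵀ→^, combining characters α̅/V̂→\bar/\hat,
√→\sqrt, braces added around multi-character super/subscripts, function names such as log/softmax wrapped in \operatorname)
- renderMathSpans(): loads KaTeX 0.16.21 on demand (the CDN is hit only when the page contains formulas),
renders each span in its own try/catch, and falls back to the original code styling on failure, so a span never ends up worse than before
Verification: of 431 distinct math samples across the repo, 376 are matched, with a 100% render success rate; 23 typical code
spans (file names / f-strings / snake_case / spans containing Chinese) give 0 false positives; local testing of
07-why-transformers (4 formulas) and 09-policy-gradients (34 formulas) renders everything
with zero fallbacks; in dark mode, formula colors follow the theme correctly.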
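The per-span fallback described for renderMathSpans() can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `renderSpans`, its `isMath` and `render` parameters, and the `status` field are hypothetical names. In the browser, `render` would presumably wrap a call like `katex.renderToString(tex, { throwOnError: true })`; here it is injected so the sketch runs under plain Node without KaTeX.

```javascript
// Sketch only: names and return shape are assumptions, not taken from the PR.
// `isMath` plays the role of looksLikeMath(); `render` stands in for a KaTeX
// call such as (tex) => katex.renderToString(tex, { throwOnError: true }).
function renderSpans(spans, isMath, render) {
  return spans.map((src) => {
    // Non-math spans are left exactly as they were.
    if (!isMath(src)) return { src, html: null, status: "code" };
    try {
      // Each span gets its own try/catch so one bad formula
      // cannot break the others.
      return { src, html: render(src), status: "math" };
    } catch (err) {
      // Rendering failed: keep the original code styling.
      return { src, html: null, status: "fallback" };
    }
  });
}

// Usage with a stand-in renderer that rejects unbalanced braces.
const fakeRender = (tex) => {
  const opens = (tex.match(/\{/g) || []).length;
  const closes = (tex.match(/\}/g) || []).length;
  if (opens !== closes) throw new Error("unbalanced braces");
  return `<span class="katex">${tex}</span>`;
};

const demo = renderSpans(
  ["h_{t-1}", "x_{t", "main.py"],
  (s) => /[_^]/.test(s), // toy detector, far cruder than looksLikeMath()
  fakeRender
);
console.log(demo.map((r) => r.status).join(","));
```

Returning a status per span, rather than throwing, is one way to make the "zero fallbacks" count in the verification paragraph easy to compute.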
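Review (a full-corpus simulation with Node + katex) found 4 issues, all fixed:
- looksLikeMath now rejects spans containing a literal <: the existing behavior of the em
regex pairing asterisks across code spans injects <em> into the code content, which defeats the * guard (7 spans ended up net worse)
- The ^ superscript signal now requires a base character before it: this rejects anchors such as ^Bearer in redaction regexes;
the character class keeps | so that the double-bar norm ||q||^2 is not wrongly rejected
- texPreprocess brace completion, quantifier {2,}→+: fixes (Σ_r·Σ_g)^0.5 rendering as
"superscript 0 followed by baseline .5"
- ~→\sim: fixes distribution notation such as ε ~ N(0,I), where the ~ was invisible in KaTeX
Regression: 369 matched / 369 rendered successfully; adversarial samples (^Bearer, ^1.2.3 semver,
regex anchors, em-polluted spans) give 0 false positives.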
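Three of these fixes can be illustrated with a small sketch. The regexes below are assumed shapes chosen to reproduce the described symptoms; the PR text does not show the real patterns, and `SUPERSCRIPT`, `BRACE_OLD`, `BRACE_NEW`, `addBraces` and `fixTilde` are hypothetical names. The underlying TeX facts are standard: an unbraced `^0.5` takes only the single token `0` as the superscript, and a bare `~` is a non-breaking space.

```javascript
// Assumed regex shapes; the PR text does not show the real patterns.

// Superscript signal: `^` counts only when a base character precedes it.
// `|` stays in the class so a norm like ||q||^2 still qualifies.
const SUPERSCRIPT = /[\p{L}\p{N}\)\]\}|]\^/u;

// Brace completion before and after the quantifier change.
const BRACE_OLD = /([_^])(\d{2,}(?:\.\d+)?)/g; // needs 2+ leading digits, so "0.5" is skipped
const BRACE_NEW = /([_^])(\d+(?:\.\d+)?)/g;    // 1+ leading digits, so "0.5" is wrapped

function addBraces(tex, pattern) {
  // Wrap the matched super/subscript body in braces: ^0.5 becomes ^{0.5}.
  return tex.replace(pattern, "$1{$2}");
}

// A bare ~ is a non-breaking space in TeX, so it has to become \sim.
function fixTilde(tex) {
  return tex.replace(/\s*~\s*/g, " \\sim ");
}

console.log(SUPERSCRIPT.test("^Bearer "));
console.log(SUPERSCRIPT.test("||q||^2"));
console.log(addBraces("(Σ_r·Σ_g)^0.5", BRACE_OLD));
console.log(addBraces("(Σ_r·Σ_g)^0.5", BRACE_NEW));
console.log(fixTilde("ε ~ N(0,I)"));
```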
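Problem
Upstream lessons write math formulas as inline code (e.g. `h_t = f(h_{t-1}, x_t)`). In code styling, subscripts and Greek letters are unreadable (the upstream English site has the same problem). Across the repo, the prose sections of 503 lessons contain about 600 pseudo-LaTeX formulas of this kind.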
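Approach (rendering layer, zero changes to source docs)
When lesson.html renders inline code, it applies a conservative heuristic: spans that match the math pattern go through KaTeX, and everything else stays as code:
- Detection: a span qualifies only if it hits a strong signal (`_{`/`^{`, a math Unicode symbol, or a `^` superscript); anything with code features (Chinese text, quoted strings, `*` as code multiplication, long snake_case subscripts, `->` and similar) is excluded. Prime notation such as `V(s')` is told apart from string literals by the position of the opening quote.
- Preprocessing: `²³ᵀ` → `^`, combining characters `α̅`/`V̂` → `\bar`/`\hat`, `√` → `\sqrt`, braces added around multi-character super/subscripts, function names such as `log`/`softmax` converted to `\operatorname`.
- Zero drift from upstream: no docs are changed, and new upstream content benefits automatically. This change becomes a new zh-specific customization (a class C protected item).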
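The detection rule above might look roughly like the following sketch. It is an illustration under stated assumptions, not the actual `looksLikeMath()`: the function name `looksLikeMathSketch`, the exact Unicode ranges, the 4-letter threshold for a "long" subscript, and the quote-position test are all guesses at how such a heuristic could be written.

```javascript
// Sketch only: the patterns and thresholds below are assumptions.

// A span needs at least one strong math signal...
const STRONG_SIGNALS = [
  /[_^]\{/,                               // _{ or ^{
  /[\u0370-\u03FF\u2200-\u22FF²³ᵀ]/,      // Greek, math operators, Unicode superscripts
  /[\p{L}\p{N}\)\]\}|]\^/u,               // ^ with a base character before it
];

// ...and must show none of these code features.
const CODE_FEATURES = [
  /[\u4e00-\u9fff]/,                      // Chinese text
  /\*/,                                   // * as code multiplication
  /->/,                                   // arrow operator
  /_[A-Za-z]{4,}/,                        // long snake_case-style subscript (threshold assumed)
  // A quote opens a string literal unless it follows an identifier character,
  // a closing bracket, or another prime, as in V(s') or s''.
  /(^|[^\p{L}\p{N}_\)\]\}'])['"]/u,
];

function looksLikeMathSketch(src) {
  if (CODE_FEATURES.some((re) => re.test(src))) return false;
  return STRONG_SIGNALS.some((re) => re.test(src));
}

for (const s of ["h_t = f(h_{t-1}, x_t)", "γ·V(s')", "print('α')",
                 "main.py", "y = w^T * x + b", "ε_uncond"]) {
  console.log(JSON.stringify(s), looksLikeMathSketch(s));
}
```

Checking code features first keeps the heuristic conservative: a span with any code feature is rejected even when it also carries a strong math signal, which is why `y = w^T * x + b` and `ε_uncond` stay as code.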
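Verification
- Typical code spans (`main.py`, f-strings, `client_id`, `lr_max`, expressions containing Chinese, etc.): 0 false positives
- `07-why-transformers` (4 formulas) and `09-policy-gradients-reinforce` (34 formulas): all rendered, 0 fallbacks
- `node site/build.js --check` passes
Not verified (fail-loud)
- Formulas that the heuristic rejects (e.g. `y = w^T * x + b`, which contains `*`, and `ε_uncond`, which has a long subscript) stay as code: no worse, but no better either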