Skip to content

fix: allow JavaScript in archived pages to enable KaTeX/MathJax rendering#2541

Open
idiottrader wants to merge 1 commit intokarakeep-app:mainfrom
idiottrader:main
Open

fix: allow JavaScript in archived pages to enable KaTeX/MathJax rendering#2541
idiottrader wants to merge 1 commit intokarakeep-app:mainfrom
idiottrader:main

Conversation

@idiottrader
Copy link
Copy Markdown

Fixes #1243

Problem

Mathematical expressions weren't rendering in reader view because the monolith archiving tool was stripping JavaScript from archived pages.

Root Cause

In crawlerWorker.ts, the monolith command was called with -Ije flags where -j means "No JavaScript". Since MathJax and KaTeX require JavaScript to render math expressions, the math was not being rendered.

Solution

Changed the monolith arguments from ["-", "-Ije", ...] to ["-", "-Ie", ...] - removing the -j flag while keeping -I (isolate) and -e (no embeds) for security.

Testing

This fix should be tested with: https://transformer-circuits.pub/2025/attribution-graphs/methods.html

/claim #1243

…ring

Remove the -j flag from monolith arguments to preserve JavaScript
in archived pages. This fixes issue karakeep-app#1243 where mathematical
expressions weren't rendering in reader view.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43e9ea0 and b6e75c7.

📒 Files selected for processing (1)
  • apps/workers/workers/crawlerWorker.ts

Walkthrough

Modified the archiveWebpage subprocess invocation in crawlerWorker.ts by replacing the "-Ije" command-line flag with "-Ie" while preserving all other arguments unchanged.

Changes

Cohort / File(s) Summary
Archive subprocess flags
apps/workers/workers/crawlerWorker.ts
Changed archiveWebpage subprocess command flag from "-Ije" to "-Ie" while keeping all other arguments ("-", "-t 5", "-b", url, "-o", assetPath) intact.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: allowing JavaScript in archived pages to enable KaTeX/MathJax rendering, which directly matches the code change of removing the -j flag from monolith arguments.
Description check ✅ Passed The description is well-written, clearly explaining the problem (math expressions not rendering), root cause (-j flag disables JavaScript), and the solution (removing -j while keeping -I and -e flags), all directly related to the code changes.
Linked Issues check ✅ Passed The code change directly addresses issue #1243 by removing the -j flag that was preventing JavaScript execution needed for KaTeX/MathJax rendering in archived pages.
Out of Scope Changes check ✅ Passed The single line change to replace -Ije with -Ie in crawlerWorker.ts is narrowly scoped and directly addresses the linked issue objective of enabling math rendering in archived pages.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 1, 2026

Greptile Summary

This PR removes the -j (no JavaScript) flag from the monolith archiving command to enable mathematical expression rendering via KaTeX/MathJax in archived pages.

Key Changes:

  • Modified monolith flags from -Ije to -Ie, preserving JavaScript while keeping isolation (-I) and no embeds (-e) flags

Critical Security Concern:
The archived HTML content is rendered using dangerouslySetInnerHTML in BookmarkHTMLHighlighter.tsx without sanitization. Allowing JavaScript in archived pages creates a stored XSS vulnerability where malicious scripts from archived websites could execute in users' browsers. While the -I isolation flag provides some protection against external resource loading, it doesn't prevent inline JavaScript execution.

Recommendations:

  1. Implement DOMPurify sanitization on archived HTML before rendering
  2. Consider rendering archived content in a sandboxed iframe with strict CSP
  3. Alternatively, use a whitelist approach that only allows specific math rendering libraries

Confidence Score: 2/5

  • This PR introduces a stored XSS security risk by enabling JavaScript in archived pages without sanitization
  • While the change correctly addresses the math rendering issue by preserving JavaScript, it creates a security vulnerability. The archived HTML is rendered via dangerouslySetInnerHTML without sanitization, allowing potentially malicious scripts to execute. The implementation needs additional security measures (DOMPurify, sandboxed iframe, or CSP) before being safe to merge.
  • Pay close attention to apps/workers/workers/crawlerWorker.ts and consider reviewing BookmarkHTMLHighlighter.tsx for potential sanitization implementation

Important Files Changed

Filename Overview
apps/workers/workers/crawlerWorker.ts Removed -j flag from monolith to enable JavaScript in archived pages for math rendering. Security concern: archived HTML with JS is rendered via dangerouslySetInnerHTML without sanitization.

Last reviewed commit: b6e75c7

Copy link
Copy Markdown

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

no_proxy: serverConfig.proxy.noProxy?.join(","),
},
})("monolith", ["-", "-Ije", "-t", "5", "-b", url, "-o", assetPath]);
})("monolith", ["-", "-Ie", "-t", "5", "-b", url, "-o", assetPath]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing JavaScript in archived pages creates a stored XSS risk. The archived HTML is later rendered using dangerouslySetInnerHTML in BookmarkHTMLHighlighter.tsx (line 411) without sanitization. While the -I flag provides some isolation, malicious JS from archived pages could still execute in users' browsers and potentially access localStorage, cookies, or make API calls. Consider either:

  1. Using DOMPurify to sanitize the archived HTML before storage/display
  2. Rendering archived content in a sandboxed iframe with strict CSP
  3. Only allowing specific script sources needed for math rendering (e.g., KaTeX/MathJax CDN URLs)
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/workers/workers/crawlerWorker.ts
Line: 1428

Comment:
Allowing JavaScript in archived pages creates a stored XSS risk. The archived HTML is later rendered using `dangerouslySetInnerHTML` in `BookmarkHTMLHighlighter.tsx` (line 411) without sanitization. While the `-I` flag provides some isolation, malicious JS from archived pages could still execute in users' browsers and potentially access localStorage, cookies, or make API calls. Consider either:
1. Using DOMPurify to sanitize the archived HTML before storage/display
2. Rendering archived content in a sandboxed iframe with strict CSP
3. Only allowing specific script sources needed for math rendering (e.g., KaTeX/MathJax CDN URLs)

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[$20 bounty] Bug: reader view and katex / math rendering

1 participant