feat: add fastCRW tool#1967
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more โ On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
๐ WalkthroughWalkthroughAdds a new ChangesfastCRW Built-in Tool
Estimated code review effort๐ฏ 2 (Simple) | โฑ๏ธ ~10 minutes Suggested labels
Poem
๐ฅ Pre-merge checks | โ 4 | โ 1โ Failed checks (1 warning)
โ Passed checks (4 passed)
โ๏ธ Tip: You can configure your own custom pre-merge checks in the settings. โจ Finishing Touches๐งช Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first โ incorporate their findings. Review areas:
|
There was a problem hiding this comment.
Code Review
This pull request introduces the fastCRW tool (providing crwScrape and crwCrawl) for web scraping and crawling, leveraging the @mendable/firecrawl-js library. The tool is integrated into the built-in tools registry. Feedback on the implementation suggests using optional chaining when accessing properties on the results of scrapeUrl and crawlUrl to prevent potential runtime TypeErrors if the API calls fail or return undefined responses.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| const result = await app.scrapeUrl(input.url, config); | ||
| return { | ||
| content: result.data?.markdown || '', | ||
| markdown: result.data?.markdown, | ||
| html: result.data?.html, | ||
| links: result.data?.links, | ||
| metadata: result.data?.metadata, | ||
| }; |
There was a problem hiding this comment.
If the scrapeUrl call fails or returns an empty/undefined response, accessing result.data directly will throw a TypeError. Using optional chaining on result ensures defensive programming and prevents potential runtime crashes.
| const result = await app.scrapeUrl(input.url, config); | |
| return { | |
| content: result.data?.markdown || '', | |
| markdown: result.data?.markdown, | |
| html: result.data?.html, | |
| links: result.data?.links, | |
| metadata: result.data?.metadata, | |
| }; | |
| const result = await app.scrapeUrl(input.url, config); | |
| return { | |
| content: result?.data?.markdown || '', | |
| markdown: result?.data?.markdown, | |
| html: result?.data?.html, | |
| links: result?.data?.links, | |
| metadata: result?.data?.metadata, | |
| }; |
| const result = await app.crawlUrl(input.url, config); | ||
| return { | ||
| pages: (result.data || []).map(page => ({ | ||
| url: page.url, | ||
| content: page.markdown || '', | ||
| markdown: page.markdown, | ||
| })), | ||
| }; |
There was a problem hiding this comment.
If the crawlUrl call fails or returns an empty/undefined response, accessing result.data directly will throw a TypeError. Using optional chaining on result ensures defensive programming and prevents potential runtime crashes.
| const result = await app.crawlUrl(input.url, config); | |
| return { | |
| pages: (result.data || []).map(page => ({ | |
| url: page.url, | |
| content: page.markdown || '', | |
| markdown: page.markdown, | |
| })), | |
| }; | |
| const result = await app.crawlUrl(input.url, config); | |
| return { | |
| pages: (result?.data || []).map(page => ({ | |
| url: page.url, | |
| content: page.markdown || '', | |
| markdown: page.markdown, | |
| })), | |
| }; |
Greptile SummaryThis PR adds
Confidence Score: 5/5Additive integration that touches no existing code paths โ safe to merge. All four changed files are purely additive. The implementation is a faithful structural clone of the existing Firecrawl tool, with the only substantive difference being the apiUrl constructor argument that redirects requests to fastCRW. No existing behaviour is modified, and the new code follows every convention already established in the codebase. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant Agent
participant tools.ts
participant crw.ts
participant loadCrwPackage
participant firecrawl_js as @mendable/firecrawl-js
participant fastCRW as fastCRW API
Agent->>tools.ts: tools.crwScrape(config) / tools.crwCrawl(config)
tools.ts->>crw.ts: crwScrape(config) / crwCrawl(config)
crw.ts-->>Agent: PraisonTool (execute closure)
Agent->>crw.ts: "tool.execute({ url })"
crw.ts->>loadCrwPackage: loadCrwPackage()
loadCrwPackage->>loadCrwPackage: check CRW_API_KEY env var
alt CRW_API_KEY missing
loadCrwPackage-->>crw.ts: throw MissingEnvVarError
end
loadCrwPackage->>firecrawl_js: "dynamic import('@mendable/firecrawl-js')"
alt package missing
loadCrwPackage-->>crw.ts: throw MissingDependencyError
end
firecrawl_js-->>loadCrwPackage: pkg
loadCrwPackage-->>crw.ts: pkg
crw.ts->>crw.ts: resolve FirecrawlApp constructor
crw.ts->>firecrawl_js: "new FirecrawlApp({ apiKey, apiUrl })"
firecrawl_js-->>crw.ts: app
alt crwScrape
crw.ts->>fastCRW: app.scrapeUrl(url, config)
fastCRW-->>crw.ts: "{ data: { markdown, html, links, metadata } }"
crw.ts-->>Agent: CrwScrapeResult
else crwCrawl
crw.ts->>fastCRW: app.crawlUrl(url, config)
fastCRW-->>crw.ts: "{ data: [{ url, markdown }] }"
crw.ts-->>Agent: CrwCrawlResult
end
Reviews (2): Last reviewed commit: "docs: add JSDoc to crwScrape and crwCraw..." | Re-trigger Greptile |
| // fastCRW | ||
| crw: (config?: CrwScrapeConfig) => crwScrape(config), | ||
| crwScrape: (config?: CrwScrapeConfig) => crwScrape(config), | ||
| crwCrawl: (config?: CrwCrawlConfig) => crwCrawl(config), |
There was a problem hiding this comment.
The
crw shorthand alias resolves to crwScrape, not crwCrawl. Since "CRW" strongly implies "crawl", a user who calls tools.crw() will receive a scrape tool and get empty results for multi-page crawl use-cases without any warning. Consider removing the ambiguous alias (keeping the explicit crwScrape and crwCrawl entries) or renaming it to crwScrape to match what it actually does.
| // fastCRW | |
| crw: (config?: CrwScrapeConfig) => crwScrape(config), | |
| crwScrape: (config?: CrwScrapeConfig) => crwScrape(config), | |
| crwCrawl: (config?: CrwCrawlConfig) => crwCrawl(config), | |
| // fastCRW | |
| crwScrape: (config?: CrwScrapeConfig) => crwScrape(config), | |
| crwCrawl: (config?: CrwCrawlConfig) => crwCrawl(config), |
| const FirecrawlApp = (pkg as Record<string, unknown>).default || pkg; | ||
| if (typeof FirecrawlApp === 'function') { | ||
| const app = new (FirecrawlApp as new (options: { apiKey: string; apiUrl: string }) => { | ||
| scrapeUrl: (url: string, options?: CrwScrapeConfig) => Promise<{ data?: { markdown?: string; html?: string; links?: string[]; metadata?: Record<string, unknown> } }>; | ||
| })({ apiKey: process.env.CRW_API_KEY!, apiUrl: process.env.CRW_API_URL || DEFAULT_CRW_API_URL }); | ||
|
|
||
| const result = await app.scrapeUrl(input.url, config); | ||
| return { | ||
| content: result.data?.markdown || '', | ||
| markdown: result.data?.markdown, | ||
| html: result.data?.html, | ||
| links: result.data?.links, | ||
| metadata: result.data?.metadata, | ||
| }; | ||
| } | ||
|
|
||
| return { content: '' }; |
There was a problem hiding this comment.
Silent empty-result on non-constructor export
If @mendable/firecrawl-js is loaded but its export is not a constructor (e.g., due to a future ESM/CJS interop change or version bump), both crwScrape and crwCrawl fall through to return { content: '' } / return { pages: [] } with no error or log. This mirrors the same pattern in firecrawl.ts, but the silent return means callers have no way to distinguish a real empty result from a broken initialisation.
There was a problem hiding this comment.
Actionable comments posted: 2
๐งน Nitpick comments (2)
src/praisonai-ts/src/tools/builtins/crw.ts (2)
116-137: โ๏ธ Poor tradeoffFragile type casting and silent fallback could hide runtime errors.
The code uses unsafe type assertions (lines 120-124) to cast the dynamically imported Firecrawl client, then silently returns
{ content: '' }(line 136) if the export is not a function. This pattern could hide integration issues, API mismatches, or breaking changes in@mendable/firecrawl-js.Consider: (1) add runtime validation of the loaded module's shape; (2) log a warning when falling back to empty results; (3) document the expected Firecrawl API contract in a comment or type guard.
๐ก๏ธ Example: Add logging for silent fallback
} + // Unexpected module shape - log warning and return empty result + console.warn('[crwScrape] `@mendable/firecrawl-js` export is not a constructor function'); return { content: '' }; },๐ค Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai-ts/src/tools/builtins/crw.ts` around lines 116 - 137, The execute function performs unsafe type casting on the dynamically loaded CRW package without validating its actual shape, and silently returns empty content if FirecrawlApp is not a function, which could hide integration issues. Add runtime validation to check that the loaded package (from loadCrwPackage) is indeed a function before attempting to instantiate it, add a warning log message when the fallback condition is triggered (when typeof FirecrawlApp !== 'function'), and document the expected Firecrawl API contract through comments or a type guard to make the integration contract explicit and catch mismatches early.
158-178: โ๏ธ Poor tradeoffFragile type casting and silent fallback could hide runtime errors.
Same issue as in
crwScrape: unsafe type assertions (lines 161-165) followed by a silent fallback to{ pages: [] }(line 177) when the export is not a function. This could mask API compatibility issues or breaking changes.Consider: (1) add runtime validation; (2) log a warning on fallback; (3) document the expected API contract.
๐ก๏ธ Example: Add logging for silent fallback
} + // Unexpected module shape - log warning and return empty result + console.warn('[crwCrawl] `@mendable/firecrawl-js` export is not a constructor function'); return { pages: [] }; },๐ค Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/praisonai-ts/src/tools/builtins/crw.ts` around lines 158 - 178, The execute function in the CRW tool contains unsafe type assertions when loading the Firecrawl package and silently returns empty pages when the loaded export is not a function, which can mask runtime errors or API compatibility issues. Add runtime validation checks to verify that the loaded package has the expected structure (specifically that it exports a function constructor), and when the validation fails or the type check in the typeof FirecrawlApp === 'function' condition returns false, log a warning message that includes details about what went wrong and what was expected. Additionally, consider adding a comment documenting the expected API contract for the Firecrawl package to make the assumptions explicit.
๐ค Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/praisonai-ts/src/tools/builtins/crw.ts`:
- Around line 99-139: The crwScrape function lacks comprehensive JSDoc
documentation. Replace the existing one-line comment with a complete JSDoc block
that includes `@param` documentation for the optional config parameter of type
CrwScrapeConfig, `@returns` documentation describing that it returns a PraisonTool
object with the CrwScrapeInput and CrwScrapeResult types, and an `@example`
section demonstrating how to use the function (importing it, creating a scraper
with config options, executing it with a URL, and accessing the content result).
- Around line 141-179: The crwCrawl function is missing comprehensive JSDoc
documentation. Add a JSDoc comment block above the function definition that
includes a description of what the function does, a `@param` tag documenting the
optional config parameter (describing it as configuration options for the crawl
behavior), a `@returns` tag documenting that it returns a PraisonTool for crawling
websites, and an `@example` tag showing how to use the function with a sample
configuration object and execute call demonstrating the expected usage pattern.
---
Nitpick comments:
In `@src/praisonai-ts/src/tools/builtins/crw.ts`:
- Around line 116-137: The execute function performs unsafe type casting on the
dynamically loaded CRW package without validating its actual shape, and silently
returns empty content if FirecrawlApp is not a function, which could hide
integration issues. Add runtime validation to check that the loaded package
(from loadCrwPackage) is indeed a function before attempting to instantiate it,
add a warning log message when the fallback condition is triggered (when typeof
FirecrawlApp !== 'function'), and document the expected Firecrawl API contract
through comments or a type guard to make the integration contract explicit and
catch mismatches early.
- Around line 158-178: The execute function in the CRW tool contains unsafe type
assertions when loading the Firecrawl package and silently returns empty pages
when the loaded export is not a function, which can mask runtime errors or API
compatibility issues. Add runtime validation checks to verify that the loaded
package has the expected structure (specifically that it exports a function
constructor), and when the validation fails or the type check in the typeof
FirecrawlApp === 'function' condition returns false, log a warning message that
includes details about what went wrong and what was expected. Additionally,
consider adding a comment documenting the expected API contract for the
Firecrawl package to make the assumptions explicit.
๐ช Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
โน๏ธ Review info
โ๏ธ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 854435c4-1c22-4357-aea4-00e00c7db599
๐ Files selected for processing (4)
src/praisonai-ts/src/index.tssrc/praisonai-ts/src/tools/builtins/crw.tssrc/praisonai-ts/src/tools/builtins/index.tssrc/praisonai-ts/src/tools/tools.ts
|
@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding. Phase 1: Review per AGENTS.md
Phase 2: FIX Valid Issues Phase 3: Final Verdict |
|
Claude finished @MervinPraison's task โโ View job Claude Code Architecture ReviewFinal Architecture Review for PR #1967: feat: add fastCRW tool Phase 1: Review Scope Analysis
Phase 2: Issue Analysis & Fixes
Phase 3: Final Verdict
๐ซ Critical Architectural Scope IssueSCOPE MISMATCH IDENTIFIED: I was explicitly instructed to focus ONLY on Python packages ( This PR contains ZERO Python changes - all 4 changed files are TypeScript:
๐ Summary of Previous Reviewer FindingsMultiple reviewers identified valid architectural issues in the TypeScript code: ๐ด Priority Issues (Gemini, CodeRabbit):
๐ก Medium Issues (Greptile): ๐๏ธ Architectural AssessmentThe TypeScript implementation does follow good architectural patterns:
๐ฏ Final VerdictCANNOT IMPLEMENT FIXES - TypeScript changes are outside my designated scope. RECOMMENDATION FOR TYPESCRIPT MAINTAINER:
ARCHITECTURAL APPROVAL: โ The design follows PraisonAI patterns and is safe to merge after addressing the TypeScript-specific issues above. |
What
Adds fastCRW as a web scrape/search provider โ additive only, no existing code touched.
Why
fastCRW is a fully open-source (AGPL) web engine that outperforms Firecrawl on Firecrawl's own benchmark dataset and runs entirely locally without a cloud dependency.
It runs 100% locally, with no cloud gating. Anti-bot/stealth handling, BYO-proxy + rotation, and full JS rendering (Cloudflare challenge bypass, SPA rendering, HTTPโheadlessโproxy fallback ladder) all ship in the open core โ single ~8MB Rust binary, ~6MB RAM. Firecrawl's OSS self-host gates its stealth engine (
fire-engine) behind a cloud-only flag, so it falls back to plain fetch/Playwright and cannot reach bot-protected or JS-heavy sites without a paid cloud subscription.Faster and higher accuracy on Firecrawl's own benchmark. Truth-recall 63.74% vs 56.04%, and faster median latency (p50 ~1.9s vs ~2.3s).
Search is an optimized SearXNG. SearXNG-backed with added reranking and multi-round retrieval.
Because fastCRW is Firecrawl-API-compatible, the integration is a tiny additive diff โ the same wiring pattern used by the existing Firecrawl provider.
Key via
CRW_API_KEY(free tier at https://fastcrw.com/dashboard); self-host base URL supported. I maintain the integration and can provide free credits to evaluate.