Commit d058b48
feat(scraper): Implement local file scraping and refactor strategy pattern

This commit adds local file system scraping capabilities and improves the
overall scraper architecture:

Core Changes:
- Add LocalFileStrategy for handling file:// URLs with directory traversal
- Rename DefaultScraperStrategy to WebScraperStrategy
- Introduce ContentFetcher & ContentProcessor abstractions
- Add HtmlProcessor and MarkdownProcessor implementations
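
The strategy split listed above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `ScraperStrategy` interface, the `canHandle` method, and the `pickStrategy` helper are hypothetical names, and only the class names `WebScraperStrategy` and `LocalFileStrategy` come from the commit message.

```typescript
// Hypothetical interface: the commit does not show the real strategy signature.
interface ScraperStrategy {
  canHandle(url: string): boolean;
}

// Handles ordinary web URLs (the renamed DefaultScraperStrategy).
class WebScraperStrategy implements ScraperStrategy {
  canHandle(url: string): boolean {
    return url.startsWith("http://") || url.startsWith("https://");
  }
}

// Handles file:// URLs; directory traversal is omitted in this sketch.
class LocalFileStrategy implements ScraperStrategy {
  canHandle(url: string): boolean {
    return url.startsWith("file://");
  }
}

// Hypothetical helper: returns the first strategy that accepts the URL.
function pickStrategy(url: string, strategies: ScraperStrategy[]): ScraperStrategy {
  const match = strategies.find((s) => s.canHandle(url));
  if (!match) {
    throw new Error(`No strategy can handle: ${url}`);
  }
  return match;
}
```

Selecting by URL scheme keeps each strategy unaware of the others, so a new source type would only need a new class in the list.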

Architecture Improvements:
- Separate content fetching from processing logic
- Add scraping strategy tests with proper mocking
- Update ARCHITECTURE.md with new component documentation

The changes make the scraper more modular and extensible while maintaining
a clean separation of concerns. Local file system scraping now works with
both HTML and Markdown files, using the same content processing pipeline
as web content.

1 parent 59b4a33 commit d058b48
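
To illustrate how separating fetching from processing lets local files and web content share one pipeline, here is a minimal sketch. The names `ContentFetcher`, `ContentProcessor`, `HtmlProcessor`, and `MarkdownProcessor` come from the commit message; their method signatures, the `RawContent` type, the `InMemoryFetcher` stand-in, and the `scrape` function are assumptions made for illustration, and the tag stripping is deliberately naive.

```typescript
// Assumed shape of fetched content; not taken from the commit.
interface RawContent {
  content: string;
  mimeType: string;
  source: string;
}

// Assumed signatures for the two abstractions named in the commit.
interface ContentFetcher {
  fetch(source: string): Promise<RawContent>;
}

interface ContentProcessor {
  canProcess(raw: RawContent): boolean;
  process(raw: RawContent): string;
}

class HtmlProcessor implements ContentProcessor {
  canProcess(raw: RawContent): boolean {
    return raw.mimeType === "text/html";
  }
  process(raw: RawContent): string {
    // Naive tag removal, for illustration only.
    return raw.content.replace(/<[^>]*>/g, "").trim();
  }
}

class MarkdownProcessor implements ContentProcessor {
  canProcess(raw: RawContent): boolean {
    return raw.mimeType === "text/markdown";
  }
  process(raw: RawContent): string {
    return raw.content.trim();
  }
}

// Hypothetical fetcher backed by a map, standing in for file or HTTP access.
class InMemoryFetcher implements ContentFetcher {
  private files: Map<string, RawContent>;
  constructor(files: Map<string, RawContent>) {
    this.files = files;
  }
  async fetch(source: string): Promise<RawContent> {
    const hit = this.files.get(source);
    if (!hit) {
      throw new Error(`Not found: ${source}`);
    }
    return hit;
  }
}

// The pipeline is the same regardless of where the content came from.
async function scrape(
  source: string,
  fetcher: ContentFetcher,
  processors: ContentProcessor[],
): Promise<string> {
  const raw = await fetcher.fetch(source);
  const processor = processors.find((p) => p.canProcess(raw));
  if (!processor) {
    throw new Error(`No processor for MIME type: ${raw.mimeType}`);
  }
  return processor.process(raw);
}
```

Because `scrape` depends only on the two interfaces, a test can pass a fake fetcher like `InMemoryFetcher` instead of touching the network or the file system.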
File tree
32 files changed (+1893, -1524 lines)
- src/scraper
- fetcher
- processor
- strategies