Version: 1.0
Author: Linus Stoltz
A simple web scraper that searches a given URL for keywords. Crawl depth, request timeout, keyword source, and output path are configurable. Exports structured JSON for each scrape event.
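
As a rough illustration of the idea (not the script's actual implementation), the sketch below counts keyword occurrences in a page's visible text and bundles the result into a JSON record. It assumes case-insensitive substring matching, uses the standard library's `html.parser` in place of beautifulsoup4 so it runs without extra installs, and works on an inline HTML string instead of a live request. The names `TextExtractor`, `count_keywords`, `build_record`, and the record fields are hypothetical.

```python
import json
import re
from datetime import datetime
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_keywords(html, keywords):
    """Return {keyword: count} using case-insensitive substring matching."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so multi-word phrases match across line breaks.
    text = re.sub(r"\s+", " ", " ".join(parser.chunks)).lower()
    return {kw: text.count(kw.lower()) for kw in keywords}


def build_record(url, html, keywords):
    """Bundle one scrape event into a JSON-serializable dict (assumed schema)."""
    return {
        "url": url,
        "scraped_at": datetime.now().isoformat(timespec="seconds"),
        "matches": count_keywords(html, keywords),
    }


# Inline sample page so the sketch needs no network access.
sample_html = (
    "<html><body><h1>Open Data</h1>"
    "<p>Open data portals publish\nopen   data.</p>"
    "<script>var x = 'open data';</script></body></html>"
)
record = build_record("https://example.com", sample_html, ["open data", "privacy"])
print(json.dumps(record, indent=2))
```

Here the phrase "open data" is counted three times (the heading and two mentions in the paragraph), while the copy inside the `<script>` tag is ignored.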
- Python 3.7+
- requests
- beautifulsoup4
- lxml

You can install all dependencies with:

    pip install -r requirements.txt

Run the scraper with:

    python scrape_webpage.py https://example.com [OPTIONS]

| Options | Description |
|---|---|
| `-k, --keywords` | One or more keywords or phrases to search for, separated by spaces. Put multi-word phrases in quotes. |
| `-f, --keywords-file` | Path to a text file with one keyword or phrase per line. Mutually exclusive with `-k`. |
| `-d, --depth` | Maximum crawl depth (0 = only the start page). Defaults to 1. |
| `-t, --timeout` | HTTP request timeout in seconds. Defaults to 10. |
| `--json` | Path to the output JSON file. Defaults to `<timestamp>_matches.json`. |
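
For readers who want to see how options like these fit together, here is a minimal `argparse` sketch that mirrors the table above. It is an illustration under stated assumptions, not the script's real source: the helper names (`build_parser`, `resolve_keywords`, `resolve_output_path`), the `json_path` attribute, and the timestamp format are all hypothetical, and the sketch does not assume that a keyword source is required.

```python
import argparse
from datetime import datetime
from pathlib import Path


def build_parser():
    """Build a parser mirroring the documented options (assumed layout)."""
    parser = argparse.ArgumentParser(
        prog="scrape_webpage.py",
        description="Search a web page for keywords and export matches as JSON.",
    )
    parser.add_argument("url", help="Start URL to scrape.")

    # -k and -f are documented as mutually exclusive.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-k", "--keywords", nargs="+", metavar="KEYWORD",
                        help="Keywords or quoted phrases, separated by spaces.")
    source.add_argument("-f", "--keywords-file", type=Path,
                        help="Text file with one keyword or phrase per line.")

    parser.add_argument("-d", "--depth", type=int, default=1,
                        help="Maximum crawl depth (0 = only the start page).")
    parser.add_argument("-t", "--timeout", type=float, default=10,
                        help="HTTP request timeout in seconds.")
    parser.add_argument("--json", dest="json_path", type=Path, default=None,
                        help="Output JSON path.")
    return parser


def resolve_keywords(args):
    """Return the keyword list from -k or from the file given with -f."""
    if args.keywords_file is not None:
        lines = args.keywords_file.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return list(args.keywords or [])


def resolve_output_path(args):
    """Use --json if given, else <timestamp>_matches.json."""
    if args.json_path is not None:
        return args.json_path
    # The exact timestamp format is an assumption; the README only says <timestamp>.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(f"{stamp}_matches.json")


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["https://example.com", "-k", "open data", "privacy", "-d", "0"]
)
print(resolve_keywords(args), args.depth, args.timeout, resolve_output_path(args))
```

Because `-k` accepts a variable number of values, the URL is placed before the options, as in the usage line above; a URL written directly after `-k` would be read as another keyword.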