WB Product Scraper

A small Python project for scraping product cards from a Wildberries catalog page and saving results to CSV.

Current behavior:

opens a catalog page in Playwright;
scrolls the page a specified number of times;
extracts id, brand (name), price, and image link;
removes duplicates by id;
saves output to data/products.csv.

Tech Stack

Python 3.10+
Playwright (browser automation)
BeautifulSoup4 (HTML parsing)

Project Structure

scraper/
	main.py           # entry point
	run_browser.py    # browser launch, scrolling, HTML retrieval
	parser.py         # data extraction from HTML
	models.py         # Product model
	preservation.py   # deduplication and CSV writing
data/
	products.csv      # scraping result

Installation

Go to the project folder.
Create and activate a virtual environment.
Install dependencies.
Install a browser for Playwright.

Example for macOS/Linux:

python3 -m venv venv
source venv/bin/activate
pip install playwright beautifulsoup4
playwright install chromium

Run

Run from the project root as a module:

python -m scraper.main

After execution, the output file will be available at data/products.csv.

Parsed Fields

Each product card includes the following fields:

id
name (brand)
price
img (image URL)
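
The four fields above map naturally onto a small record type. The real definition in models.py is not shown here, so the following is only a sketch: the dataclass form, the field types, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Product:
    """One product card; field names follow the list above."""
    id: str      # product identifier kept as text (the type is an assumption)
    name: str    # brand name shown on the card
    price: str   # price kept as raw text; numeric parsing is left out
    img: str     # image URL


# Sample values are made up for illustration only.
sample = Product(
    id="12345",
    name="ExampleBrand",
    price="99.90",
    img="https://example.com/1.webp",
)
row = asdict(sample)  # plain dict keyed by field name, convenient for CSV output
```

Keeping price as text avoids guessing at the currency and number format used on the page.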

URL and Scroll Configuration

By default, the URL and the number of scrolls are set in scraper/main.py:

url = "https://www.wildberries.ge/catalog/obuv/muzhskaya/kedy-i-krossovki"
products = run(url=url, scrolls=1)

You can change url and scrolls as needed.
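
To make the effect of scrolls more concrete, here is a rough sketch of how a scroll loop might look with the Playwright sync API. It is not the code from run_browser.py: the names scroll_page and fetch_html, the 3000 px scroll distance, and the 1000 ms pause are all assumptions.

```python
def scroll_page(page, scrolls: int, pause_ms: int = 1000) -> None:
    """Scroll down `scrolls` times, pausing so lazy-loaded cards can render."""
    for _ in range(scrolls):
        page.mouse.wheel(0, 3000)        # scroll down by 3000 px (assumed distance)
        page.wait_for_timeout(pause_ms)  # give the page time to load more cards


def fetch_html(url: str, scrolls: int = 1, headless: bool = True) -> str:
    """Open `url` in Chromium, scroll, and return the rendered HTML."""
    # Imported here so scroll_page can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        page.goto(url)
        scroll_page(page, scrolls)
        html = page.content()  # HTML after scrolling, ready for parsing
        browser.close()
    return html
```

A larger scrolls value usually means more cards in the returned HTML, at the cost of a longer run.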

Notes and Limitations

Website markup can change, so selectors in parser.py may need updates.
With headless=True, the browser runs without UI. For debugging, you can temporarily set headless=False in run_browser.py.
Deduplication is done by id; if a product has no id, it may be treated as a separate record.
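
The deduplication rule and the CSV output can be sketched with the standard library alone. This is one possible reading of the behavior described above, not the actual contents of preservation.py: the function names deduplicate and save_csv are hypothetical, products are represented as plain dicts, and records without an id are all kept.

```python
import csv
from pathlib import Path

FIELDS = ["id", "name", "price", "img"]


def deduplicate(products: list[dict]) -> list[dict]:
    """Keep the first record for each id; records without an id are all kept."""
    seen = set()
    unique = []
    for product in products:
        pid = product.get("id")
        if pid:
            if pid in seen:
                continue  # a record with this id was already kept
            seen.add(pid)
        unique.append(product)
    return unique


def save_csv(products: list[dict], path: str = "data/products.csv") -> None:
    """Write products to a CSV file with a header row, creating the folder if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(products)
```

Because id-less records bypass the seen set, the same card can appear more than once in the output if its id could not be extracted.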
