A high-performance web scraper for harvesting tweet data from X (formerly Twitter) using Playwright. I built it for a client, and over the course of the project I made several versions to match the client's changing needs. Pick the version that suits you: sync, async, or GUI-based scraping!

- Multiple Scraping Modes: Sync, Async, or GUI-based interfaces
- Parallel Processing: Scrape multiple queries and search modes simultaneously, for about 4x faster scraping; covering several search modes also yields more data
- Smart Deduplication: Automatically removes duplicate posts from results
- Infinite Scroll Handling: Scrolls through the feed intelligently, copes with X removing off-screen posts from the DOM, and detects when no new content loads
- Resource Optimization: Blocks unnecessary resources (images, videos, fonts) for faster scraping
- User Verification Detection: Identifies verified accounts (blue checkmarks)
- Beautiful GUI: Desktop application with progress tracking and real-time status updates for non-technical users.
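
The deduplication and "no new content" detection listed above can be pictured with a small, browser-free sketch. This is not the project's actual code: `collect_unique`, `fetch_visible`, and `max_stalls` are hypothetical names, and the feed is simulated with plain lists so the logic can run without Playwright.

```python
from typing import Callable, List, Set


def collect_unique(fetch_visible: Callable[[], List[str]],
                   max_scrolls: int = 10,
                   max_stalls: int = 3) -> List[str]:
    """Collect unique handles, stopping early once the feed stops growing.

    fetch_visible is a hypothetical callback that returns the handles
    currently visible after one scroll step.
    """
    seen: Set[str] = set()
    ordered: List[str] = []
    stalls = 0  # consecutive scrolls that produced nothing new

    for _ in range(max_scrolls):
        added = 0
        for handle in fetch_visible():
            if handle not in seen:  # skip posts we already recorded
                seen.add(handle)
                ordered.append(handle)
                added += 1
        stalls = 0 if added else stalls + 1
        if stalls >= max_stalls:  # feed looks exhausted
            break
    return ordered


def make_fake_feed(batches: List[List[str]]) -> Callable[[], List[str]]:
    """Simulated feed: each call returns the next batch, then empty lists."""
    iterator = iter(batches)
    return lambda: next(iterator, [])
```

For example, `collect_unique(make_fake_feed([["/alice", "/bob"], ["/bob", "/carol"]]))` keeps each handle once and stops after three empty scrolls instead of running all ten.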

Requirements:
- Python 3.8+
- Playwright (with Chromium browser)
- customtkinter (for GUI)

Installation:
- Clone or download this project, then move into its folder:
  ```bash
  cd Twitter\ Scraper
  ```
- Install dependencies:
  ```bash
  pip install playwright customtkinter
  ```
- Install Playwright browsers:
  ```bash
  playwright install chromium
  ```

⚠️ Important: You need valid X/Twitter authentication tokens to use this scraper.
Edit the authentication section in Scraper.py or Scraper V1.py:
```python
await context.add_cookies([
    {
        "name": "auth_token",
        "value": "YOUR_AUTH_TOKEN_HERE",
        "domain": ".x.com",
        "path": "/",
        "secure": True,
        "httpOnly": True
    },
    {
        "name": "ct0",
        "value": "YOUR_CT0_TOKEN_HERE",
        "domain": ".x.com",
        "path": "/",
        "secure": True
    }
])
```

To find your tokens, check your browser's cookies while logged into X (Right-click → Inspect → Application/Storage → Cookies).
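
If you would rather not paste tokens into the source files, one option is to read them from environment variables. The sketch below is an assumption, not something the scripts do today: the helper `build_auth_cookies` and the variable names `X_AUTH_TOKEN` and `X_CT0` are made up for illustration. It only builds the same cookie list shown above.

```python
import os
from typing import Dict, List, Mapping


def build_auth_cookies(env: Mapping[str, str] = os.environ) -> List[Dict[str, object]]:
    """Build the two X cookies from environment variables.

    X_AUTH_TOKEN and X_CT0 are hypothetical variable names chosen for
    this example; the project itself hardcodes the values.
    """
    try:
        auth_token = env["X_AUTH_TOKEN"]
        ct0 = env["X_CT0"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None

    # Fields shared by both cookies, matching the snippet above.
    base = {"domain": ".x.com", "path": "/", "secure": True}
    return [
        {**base, "name": "auth_token", "value": auth_token, "httpOnly": True},
        {**base, "name": "ct0", "value": ct0},
    ]
```

In the async scraper this would be used as `await context.add_cookies(build_auth_cookies())`, which keeps the tokens out of version control.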

```bash
python GUI.py
```

Perfect for those who prefer clicking buttons over terminal commands!
Features:
- Enter search queries (one per line)
- Adjust max scroll depth
- Real-time progress tracking
- See total unique users found

```bash
python Scraper.py
```

Edit the `queries` list in the `__main__` section:
```python
queries = [
    "Physics and astronomy",
    "Astrophysics",
    "Quantum mechanics",
    "Cosmology"
]
```

Why async? It's blazingly fast! It uses async/await to handle multiple requests concurrently.

```bash
python main.py
```

This wraps Scraper.py and runs all queries in parallel. Best for bulk scraping.
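
To give a feel for what "runs all queries in parallel" means, here is a minimal sketch using `asyncio.gather`. It is not the contents of main.py: `scrape_query` is a hypothetical stand-in that sleeps instead of driving a browser, and `run_queries_concurrently` is a name invented for this example.

```python
import asyncio
from typing import Dict, List


async def scrape_query(query: str, max_scrolls: int) -> Dict[str, List]:
    """Hypothetical stand-in for the real per-query scraper."""
    await asyncio.sleep(0.01)  # simulates waiting on the browser/network
    return {
        "Usernames": [f"/{query.lower().replace(' ', '_')}"],
        "DisplayNames": [query],
        "VerifiedStatus": [False],
    }


async def run_queries_concurrently(queries: List[str],
                                   max_scrolls: int = 3) -> Dict[str, Dict[str, List]]:
    """Start every query at once and wait for all of them to finish."""
    results = await asyncio.gather(
        *(scrape_query(query, max_scrolls) for query in queries)
    )
    # gather returns results in the same order as the inputs.
    return dict(zip(queries, results))


if __name__ == "__main__":
    output = asyncio.run(run_queries_concurrently(["Astrophysics", "Cosmology"]))
    print(list(output))
```

Because the waiting overlaps, total time is close to the slowest single query rather than the sum of all of them, which is where the speedup comes from.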

```bash
python Scraper\ V1.py
```

The original synchronous version. Slower but simpler to understand and debug. Great for learning!

All scrapers return data in the following structure:
```python
{
    'Usernames': ['/username1', '/username2', ...],
    'DisplayNames': ['Display Name 1', 'Display Name 2', ...],
    'VerifiedStatus': [True, False, ...]
}
```

Change the `max_scrolls` parameter to control how deep the scraper goes. Higher values collect more data but take longer:
```python
import asyncio

# Shallow scrape (fast)
results = asyncio.run(run_all_queries(queries, max_scrolls=3))

# Deep scrape (thorough)
results = asyncio.run(run_all_queries(queries, max_scrolls=50))
```

To load images, videos, etc. (slower but more thorough), modify the route handler in Scraper.py:
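```python
# Remove or modify this route handler.
# It aborts requests for heavy resources and lets everything else through.
# Note: Playwright reports video/audio requests with resource type "media".
await page.route("**/*", lambda route: (
    route.abort()
    if route.request.resource_type in ["image", "video", "font", "stylesheet"]
    else route.continue_()
))
```

| Method | Speed | Scalability | Ease of Use |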
|---|---|---|---|
| GUI | Medium | Single query | ⭐⭐⭐⭐⭐ |
| Sync (V1) | Slow | Limited | ⭐⭐⭐ |
| Async (Main) | Fast | Multi-query | ⭐⭐⭐⭐ |
| Direct Python | Fastest | Highly scalable | ⭐⭐ |
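
Whichever method from the table you use, the result is the same dictionary of three parallel lists (`Usernames`, `DisplayNames`, `VerifiedStatus`). A small sketch of turning that into one record per user and exporting CSV text follows; `results_to_rows` and `rows_to_csv` are hypothetical helpers, not part of the project.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["username", "display_name", "verified"]


def results_to_rows(results: Dict[str, List]) -> List[Dict[str, object]]:
    """Zip the three parallel lists into one dict per user."""
    rows = []
    for username, display_name, verified in zip(
        results["Usernames"], results["DisplayNames"], results["VerifiedStatus"]
    ):
        rows.append({
            "username": username.lstrip("/"),  # usernames arrive as '/handle'
            "display_name": display_name,
            "verified": verified,
        })
    return rows


def rows_to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing `rows_to_csv(results_to_rows(results))` to a file gives you something that opens directly in a spreadsheet.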

"No new tweets loading" warning appears immediately
- The feed might be empty or authentication expired
- Check that your auth tokens are valid
- Try a different search query

Scraper runs but returns empty results
- Verify you're logged into X in your browser
- Check that the `auth_token` and `ct0` values are correct
- Ensure Playwright can access the page (network/firewall issues?)

"Chromium not found" error
- Run: `playwright install chromium`

GUI doesn't start
- Ensure `customtkinter` is installed: `pip install customtkinter`
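
Before digging deeper into any of the problems above, a quick environment check can rule out missing packages. The sketch below is an optional helper, not something shipped with the project: `check_modules` and `python_version_ok` are hypothetical names. It only checks that the Python packages are importable; it does not verify that the Chromium browser itself has been downloaded.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Tuple


def check_modules(names: Iterable[str] = ("playwright", "customtkinter")) -> Dict[str, bool]:
    """Report which top-level packages can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def python_version_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", python_version_ok())
    for module, found in check_modules().items():
        print(f"{module}: {'found' if found else 'MISSING'}")
```

Any package reported as missing can be installed with the `pip install` command from the installation steps.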
When adding screenshots, consider including:
- GUI Interface - Main application window with search queries entered
- Terminal Output - Console showing real-time scraping progress (scrolls increasing, unique users count)
- Results Example - Display of collected usernames, display names, and verification status
- Multiple Queries Running - Terminal showing async processing of multiple queries in parallel
- Statistics Dashboard - Final results summary showing total unique users found
This tool is for educational and research purposes only. Ensure you comply with:
- X/Twitter's Terms of Service
- Local laws regarding web scraping
- Rate limiting and responsible usage
Always respect the platform's data and privacy policies.

```
Twitter Scraper/
├── GUI.py          # Desktop application interface
├── Scraper.py      # Async version (recommended)
├── Scraper V1.py   # Sync version (legacy)
├── main.py         # Multi-query runner
├── README.md       # This file
└── __pycache__/    # Python cache
```

Found a bug or have an idea? Feel free to improve this project!
Happy scraping (it actually gave me a lot of headaches 😅)! 🚀 Responsibly, of course!
Contact me if you need a custom version.