bahtou/TechCrunch-HomePage-Spider
Spider that crawls the home page of TechCrunch (http://techcrunch.com/).

It uses the Scrapy framework to scrape information from the TechCrunch home page. For each post, it extracts who posted it, the poster's link, the headline, the headline link, and the time it was posted. The data is then written to a MySQL database using MySQLdb.

Check out Scrapy: http://scrapy.org/
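The storage step can be sketched as a Scrapy item pipeline. This is a minimal sketch, not the repository's actual code. The class name `TechCrunchStoragePipeline`, the `posts` table, and the field names (`author`, `author_link`, `headline`, `headline_link`, `posted_at`) are hypothetical. Python's built-in `sqlite3` stands in for MySQLdb so the example runs without a MySQL server. The pipeline hooks (`open_spider`, `process_item`, `close_spider`) follow Scrapy's item pipeline interface, and plain dicts are accepted as items.

```python
import sqlite3


class TechCrunchStoragePipeline:
    """Hypothetical Scrapy item pipeline that stores scraped home page posts.

    Assumption: sqlite3 replaces MySQLdb here. With MySQLdb you would call
    MySQLdb.connect(host=..., user=..., passwd=..., db=...) and use %s
    placeholders instead of ?.
    """

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Scrapy calls this once when the spider starts: open the DB and create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS posts (
                   author TEXT,
                   author_link TEXT,
                   headline TEXT,
                   headline_link TEXT UNIQUE,
                   posted_at TEXT
               )"""
        )

    def process_item(self, item, spider):
        # One row per post; a repeated headline_link is skipped rather than duplicated.
        self.conn.execute(
            "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?, ?)",
            (
                item.get("author"),
                item.get("author_link"),
                item.get("headline"),
                item.get("headline_link"),
                item.get("posted_at"),
            ),
        )
        self.conn.commit()
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes.
        self.conn.close()
```

To enable a pipeline like this, list it in the project's `settings.py`, for example `ITEM_PIPELINES = {"myproject.pipelines.TechCrunchStoragePipeline": 300}`. The module path shown is hypothetical. Making `headline_link` unique means that re-crawling the same home page does not insert duplicate posts. In MySQL, `INSERT IGNORE` gives the same behavior as SQLite's `INSERT OR IGNORE`.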