Scrapy Architecture
Creating a Spider
Spiders are classes you define that Scrapy uses to scrape (extract) information from one or more websites.
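import scrapy


class QuoteSpider(scrapy.Spider):
    # Name used to identify this spider
    name = "quote"
    # Pages the spider requests first
    start_urls = [
        'https://bluelimelearning.github.io/my-fav-quotes/'
    ]

    def parse(self, response):
        # Each div.quotes block holds one quote and its author
        for quote in response.css('div.quotes'):
            yield {
                # extract() returns a list of every matching text node
                'quote': quote.css('p.aquote::text').extract(),
                # extract_first() returns only the first match (or None)
                'author': quote.css('p.author::text').extract_first(),
            }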
Running your spider and saving scraped data
scrapy runspider quotes_spiders.py -o quotes.xml
https://www.cleancss.com/strip-xml/
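Once quotes.xml has been written, the scraped items can also be read back in Python. The following is a minimal sketch using only the standard library. It assumes the file follows the layout Scrapy's XML exporter normally produces (an items root, one item element per yielded dict, and a value child for each entry of a list field such as quote). The sample text and the read_quotes helper are made up for illustration; check the layout against your own file before relying on it.

```python
import xml.etree.ElementTree as ET

# Placeholder data shaped like the assumed export layout; not real scraped output.
SAMPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<items>
<item><quote><value>Placeholder quote one.</value></quote><author>Author One</author></item>
<item><quote><value>Placeholder quote two.</value></quote><author>Author Two</author></item>
</items>
"""


def read_quotes(xml_text):
    """Hypothetical helper: turn exported XML text into a list of dicts."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    for item in root.findall("item"):
        # 'quote' was yielded as a list, so each entry sits in its own <value>
        quote_values = [value.text for value in item.findall("quote/value")]
        # 'author' was yielded as a single string
        author = item.findtext("author")
        rows.append({"quote": quote_values, "author": author})
    return rows


rows = read_quotes(SAMPLE_XML)
for row in rows:
    print(row["author"], row["quote"])
```

To read the real file instead of the sample, ET.parse("quotes.xml").getroot() gives the same kind of root element.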
Scraping data with Scrapy Shell
scrapy shell "https://bluelimelearning.github.io/my-fav-quotes/"
response.css('title')
response.css('title::text').extract()
response.css('h1::text').extract()
quote = response.css("div.quotes")[0]
aquote = quote.css("p.aquote::text").extract()
aquote