Scrapy only returns the first result
Problem description
I'm trying to scrape data from gelbeseiten.de (the German yellow pages):
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.http import HtmlResponse


class GelbeseitenSpider(scrapy.Spider):
    name = "gelbeseiten"
    allowed_domains = ["http://www.gelbeseiten.de"]
    start_urls = ['http://www.gelbeseiten.de/zoohandlungen/s1/alphabetisch']

    def parse(self, response):
        for adress in response.css('article'):
            # Street
            strasse = adress.xpath('//span[@itemprop="streetAddress"]//text()').extract_first()
            # Name
            name = adress.xpath('//span[@itemprop="name"]//text()').extract_first()
            # Postal code
            plz = adress.xpath('//span[@itemprop="postalCode"]//text()').extract_first()
            # City
            stadt = adress.xpath('//span[@itemprop="addressLocality"]//text()').extract_first()
            yield {
                'name': name,
                'strasse': strasse,
                'plz': plz,
                'stadt': stadt,
            }
As a result I get 15 items that all contain the same address, but I think it should be 15 different addresses.
Thanks for your help.
Recommended answer
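You are using absolute XPath expressions. A path that starts with // is evaluated from the document root, even when it is called on a sub-selector, so extract_first() returns the first match on the whole page in every iteration: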
adress.xpath('//span[@itemprop="streetAddress"]//text()')
Instead, make the expressions relative to the current adress selector (note the leading dot in the expression):
adress.xpath('.//span[@itemprop="streetAddress"]//text()')
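The effect of the search scope can be reproduced without running a crawl. The following is a minimal sketch, not the spider itself: it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the markup and the names PAGE, NAME, from_root and from_article are made up for illustration. Searching from the root on every iteration mimics what a leading // does in Scrapy, while searching from the current article mimics the leading-dot form.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; the real page is more complex.
PAGE = """
<html><body>
  <article><span itemprop="name">Zoo A</span></article>
  <article><span itemprop="name">Zoo B</span></article>
  <article><span itemprop="name">Zoo C</span></article>
</body></html>
"""

root = ET.fromstring(PAGE)
articles = root.findall(".//article")

# ElementTree paths are always relative to the element they are called on.
NAME = ".//span[@itemprop='name']"

# Scope = whole document on every iteration (analogous to '//span[...]' in Scrapy):
# the first match in the page is returned each time.
from_root = [root.find(NAME).text for _ in articles]

# Scope = current article (analogous to './/span[...]' in Scrapy):
# each iteration finds the span inside its own article.
from_article = [article.find(NAME).text for article in articles]

print(from_root)     # ['Zoo A', 'Zoo A', 'Zoo A']
print(from_article)  # ['Zoo A', 'Zoo B', 'Zoo C']
```

In the spider, the same change applies to all four expressions (streetAddress, name, postalCode and addressLocality): each needs the leading dot.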