Why am I not getting the text? I have used this script on many websites and never run into this problem.
import scrapy.selector
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from Prijsvergelijking_Final.items import PrijsvergelijkingFinalItem

# Load the known vendor names, one per line.
vendors = []
for line in open("vendors.txt", "r"):
    # Strip newlines, backslashes and hyphens from both ends of the line.
    vendors.append(line.strip("\n\\-"))

# Lookup table of vendor names for fast membership tests.
e = {}
for vendor in vendors:
    e[vendor] = True


class ArtcrafttvSpider(CrawlSpider):
    name = "ARTCRAFTTV"
    allowed_domains = ["artencraft.be"]
    start_urls = ["https://www.artencraft.be/nl/beeld-en-geluid/televisie"]
    # Follow the "next" pagination link and parse every page it leads to.
    rules = (
        Rule(
            LinkExtractor(allow=(), restrict_xpaths=('//li[@class="next"]',)),
            callback="parse_start_url",
            follow=True,
        ),
    )

    def parse_start_url(self, response):
        products = response.xpath("//ul[@class='product-overview list']/li")
        for product in products:
            item = PrijsvergelijkingFinalItem()
            item["Product_a"] = product.xpath(".//a/span/h3/text()").extract_first().strip().replace("-", "")
            # This is the field that comes back without any text.
            item["Product_price"] = product.xpath(".//a/h4/text()").extract_first()
            for word in item["Product_a"].split(" "):
                if word in e:
                    item["item_vendor"] = word
            yield item
HTML of the site:
Result after running the script:
Any suggestions on how to fix this?
Best answer
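The short answer:
The XPath for the price field is wrong.
In detail:
Don't assume that the page structure you see in the browser matches the structure of the page that gets downloaded. What you see is not always what you get.
For some reason, inspect element (Firefox) shows the price value as a child of the //a/h4 tag. If you analyze the downloaded page source, however, you will see that the price value is present on the page, but it is a child of the //a tag rather than of the //a/h4 tag, so //a/text() gives you the value you need.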
Regarding "python - Text not visible Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37214696/