Why am I not getting the text? I have used this script on many websites and never ran into this problem.

import scrapy.selector
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from Prijsvergelijking_Final.items import PrijsvergelijkingFinalItem

# Load the known vendor names (one per line) for membership tests below.
with open("vendors.txt", "r") as f:
    vendors = [line.strip("\n\\-") for line in f]
e = {vendor: True for vendor in vendors}

class ArtcrafttvSpider(CrawlSpider):
    name = "ARTCRAFTTV"
    allowed_domains = ["artencraft.be"]
    start_urls = ["https://www.artencraft.be/nl/beeld-en-geluid/televisie"]
    rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=('//li[@class="next"]',)), callback = "parse_start_url",follow = True),)
    def parse_start_url(self, response):
        products = response.xpath("//ul[@class='product-overview list']/li")
        for product in products:
            item = PrijsvergelijkingFinalItem()
            item["Product_a"] = product.xpath(".//a/span/h3/text()").extract_first().strip().replace("-","")
            item["Product_price"] = product.xpath(".//a/h4/text()").extract_first()
            for word in item['Product_a'].split(" "):
                if word in e:
                    item['item_vendor'] = word
            yield item


Website code:

[Screenshot of the page's HTML; image not preserved]

Result after running the script:

[Screenshot of the scraped output; image not preserved]

Any suggestions on how to fix this?

Best answer

The short answer:

The XPath for the price field is wrong.

In detail:

Don't assume that the structure of the downloaded page matches what you see rendered on screen. It is not always WYSIWYG.

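For some reason, Inspect Element (Firefox) shows the price as a child of the //a/h4 tag. If you look at the downloaded page source, however, the price is there, but not under //a/h4: it is a direct child of the //a tag. So //a/text() will give you the value you want.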

Regarding "python - Text not visible Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37214696/
