Scrapy: scraping a CSV file, but getting no output

Question

I am following the CSVFeedSpider example to scrape CSV data. 'item.xml' is generated, but the XML file is empty.

Can anyone help? Thanks!

csvspider.py

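from scrapy.spiders import CSVFeedSpider

from ..items import csvItems  # assumes the standard project layout, with items.py one package up


class MySpider(CSVFeedSpider):
    name = 'csvexample'
    start_urls = ['file:///D:/desktop/example.csv']
    delimiter = ','
    headers = ['Address', 'Website']

    def parse_row(self, response, row):
        # row is a dict keyed by the names listed in `headers`
        self.logger.info('Hi, this is a row!: %r', row)
        item = csvItems()
        item['address'] = row['Address']
        item['website'] = row['Website']
        return item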

items.py

from scrapy.item import Item, Field


class csvItems(Item):
    address = Field()
    website = Field()

example.csv

Item,Address,Website
1,"this, address","www.google.com"

Command used to run the spider

scrapy crawl csvexample -o item.xml -t xml

Recommended answer

If you run the spider on its own, without the output parameters, you will probably see warnings similar to the following:

2014-05-12 08:08:41+0100 [scrapy] WARNING: ignoring row 1 (length: 3, should be: 2)
2014-05-12 08:08:41+0100 [scrapy] WARNING: ignoring row 2 (length: 3, should be: 2)

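The spider declares two headers, but each row in example.csv has three fields, so every row is skipped and no items are produced. To fix this, change the headers line in your spider code so that it lists all three columns: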

headers = ['Item', 'Address', 'Website']
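To see why the header count matters, here is a minimal sketch that mimics the row-length check using only Python's standard csv module. It does not call Scrapy. The helper name `rows_matching_headers` is hypothetical, and the check is an approximation of what the warnings above describe, not Scrapy's actual implementation.

```python
import csv
import io

# Contents of example.csv from the question.
CSV_TEXT = 'Item,Address,Website\n1,"this, address","www.google.com"\n'


def rows_matching_headers(text, headers, delimiter=','):
    """Keep only rows whose field count equals len(headers).

    Hypothetical helper: returns (kept_rows, skipped_line_numbers).
    """
    kept, skipped = [], []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, fields in enumerate(reader, start=1):
        if len(fields) != len(headers):
            # Field count does not match, so the row is ignored.
            skipped.append(line_no)
            continue
        kept.append(dict(zip(headers, fields)))
    return kept, skipped


# Two headers against three-column rows: every row is skipped.
kept_two, skipped_two = rows_matching_headers(CSV_TEXT, ['Address', 'Website'])

# Three headers: both lines are kept.
kept_three, skipped_three = rows_matching_headers(
    CSV_TEXT, ['Item', 'Address', 'Website'])

print(skipped_two)
print(kept_three)
```

In this sketch the file's own header line comes back as an ordinary row once the counts match. If the spider behaves the same way when headers are set explicitly, you may want to skip that row in parse_row.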

