This article explains how to handle the Scrapy 0.22 error "Error while connecting: <class 'twisted.internet.error.ConnectionLost'>". The question and the recommended answer below may serve as a useful reference for anyone hitting the same problem.

Problem description

Good morning,

I get a connection error while executing one of my spiders:

2014-02-28 10:21:00+0400 [butik] DEBUG: Retrying <GET http://www.butik.ru/> (failed 1 times): An error occurred while connecting: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion: Connection lost.].

After that the spider shuts down.

All other spiders with a similar structure are running smoothly, but this one:

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request

class butik(Spider):
    name = "butik"
    allowed_domains = ['butik.ru']
    start_urls      = ['http://www.butik.ru/']

    def parse(self, response):
        sel = Selector(response)
        print response.url
        # Extract the category links from the main menu
        maincats = sel.xpath('//div[@id="main_menu"]//a/@href').extract()
        for maincat in maincats:
            # The links are relative, so prepend the domain
            maincat = 'http://www.butik.ru' + maincat
            # self.categories is defined elsewhere in the spider
            request = Request(maincat, callback=self.categories)
            yield request

I'm quite clueless about which steps to take to fix this issue and would be glad for any hints and answers. If additional information is needed, I'd be happy to provide the necessary code.

Thanks in advance

J

Recommended answer

You can try urllib2 instead. I ran into a similar problem when using Scrapy to crawl a page, and I fixed it by using urllib2 inside parse:

import urllib2

def parse(self, response):
    # ...
    # urlopen needs a full URL, including the scheme
    url = 'http://www.example.com'
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    the_page = response.read()
    # ...
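
Applied to the spider from the question, the workaround might look like the sketch below. This is only an illustration, not the answerer's exact code: it fetches the page with urllib2 and feeds the raw HTML into Scrapy's Selector, so the original XPath extraction can stay unchanged.

import urllib2

from scrapy.http import Request
from scrapy.selector import Selector

def parse(self, response):
    # Fetch the page with urllib2 instead of relying on Scrapy's
    # downloader (the step that raised ConnectionLost).
    req = urllib2.Request('http://www.butik.ru/')
    the_page = urllib2.urlopen(req).read()

    # Build a Selector from the raw HTML so the original XPath
    # logic from the question keeps working unchanged.
    sel = Selector(text=the_page)
    maincats = sel.xpath('//div[@id="main_menu"]//a/@href').extract()
    for maincat in maincats:
        yield Request('http://www.butik.ru' + maincat,
                      callback=self.categories)

Note that this only helps once a callback actually runs: if the very first request to the start URL is the one that fails, parse is never called, so the urllib2 fetch would have to happen from a request that does connect cleanly.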

That concludes this article on "Scrapy 0.22: Error while connecting: <class 'twisted.internet.error.ConnectionLost'>". We hope the recommended answer is helpful.
