This article covers how to enable HttpProxyMiddleware in scrapyd.

Problem description

After reading the Scrapy documentation, I thought that HttpProxyMiddleware was enabled by default. But when I start a spider through scrapyd's webservice interface, HttpProxyMiddleware is not enabled. I get the following output:

2013-02-18 23:51:01+1300 [scrapy] INFO: Scrapy 0.17.0-120-gf293d08 started (bot: pde)
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled extensions: FeedExporter, LogStats, CloseSpider, WebService, CoreStats, SpiderState
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled item pipelines: PdePipeline
2013-02-18 23:51:02+1300 [shotgunsupplements] INFO: Spider opened

Note that HttpProxyMiddleware is missing from the list of enabled downloader middlewares. How can I enable it for scrapyd? Any help would be greatly appreciated.

My scrapy.cfg:

# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/topics/scrapyd.html

[settings]
default = pd.settings

[deploy]
url = http://localhost:6800/
project = pd

I have the following settings.py:

BOT_NAME = 'pd' #this gets replaced with a function
BOT_VERSION = '1.0'

SPIDER_MODULES = ['pd.spiders']
NEWSPIDER_MODULE = 'pd.spiders'
DEFAULT_ITEM_CLASS = 'pd.items.Product'
ITEM_PIPELINES = 'pd.pipelines.PdPipeline'
USER_AGENT = '%s/%s' % (BOT_NAME, BOT_VERSION)

TELNETCONSOLE_HOST = '127.0.0.1' # defaults to 0.0.0.0 set so
TELNETCONSOLE_PORT = '6073'      # only we can see it.
TELNETCONSOLE_ENABLED = False

WEBSERVICE_ENABLED = True

LOG_ENABLED = True


ROBOTSTXT_OBEY = False
ITEM_PIPELINES = [
    'pd.pipelines.PdPipeline',
    ]

DATA_DIR = '/home/pd/scraped_data' #directory to store export files to.

DOWNLOAD_DELAY = 2.0

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 750,
}

Regards,

班舒

Recommended answer

After spending a long time debugging, it turns out that HttpProxyMiddleware actually expects the http_proxy environment variable to be set. If http_proxy is not set, the middleware is not loaded, even when it is listed in DOWNLOADER_MIDDLEWARES. So I set http_proxy and, Bob's your uncle, everything works!
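As a rough way to check what the answer describes, the sketch below sets http_proxy for the current process and prints the proxies that Python's standard library detects from the environment. It assumes that the middleware discovers proxies through this same standard-library lookup (urllib's getproxies), which the answer does not state explicitly. The proxy address and the helper name are hypothetical, chosen only for illustration.

```python
import os
from urllib.request import getproxies

# Hypothetical proxy address, for illustration only (not from the question).
EXAMPLE_PROXY = "http://proxy.example.com:8080"


def proxies_with_http_proxy(proxy_url):
    """Set http_proxy for this process and return the proxies urllib detects.

    Hypothetical helper: it only shows what a proxy-aware component
    would see once the environment variable is present.
    """
    os.environ["http_proxy"] = proxy_url
    return getproxies()


detected = proxies_with_http_proxy(EXAMPLE_PROXY)
# The "http" entry mirrors the http_proxy environment variable.
print(detected.get("http"))
```

For scrapyd, the variable most likely has to be present in the environment of the scrapyd process itself before it starts (for example, exported in the shell or service script that launches it), since a spider started through the webservice would inherit its environment from that process.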


09-11 11:54