Problem Description
I have a scrapy crawler which works fine. I now want to use its 'parse' function to parse a given url. There is a command line utility that does this for a single url:
scrapy parse
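For reference (this fuller invocation is not in the original question), the command takes the target url plus options naming the spider and the callback; myspider and parse below are placeholder names:

scrapy parse --spider=myspider -c parse http://www.testsite.com/testpage.html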
But I want to do this inside my python code (and no, starting a new process for every url is not an option).
From what I figure, what I need for this is essentially a way to create a Response given a url. Since the response that scrapy takes is not the same as an HTTPResponse, I am not sure how to get that response for a given url.
I did find a method, make_requests_from_url, which does the obvious, but I am not sure how to get from a scrapy Request to a scrapy Response that I can pass to the parse function.
Recommended Answer
A quick kludge (with pieces from here and here) in case, unlike for the OP, subprocess is an option.
import subprocess

# 'scrapy fetch' prints the downloaded page body to stdout and its log to stderr.
bashCommand = "scrapy fetch http://www.testsite.com/testpage.html"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# page holds the raw HTML bytes; the second value is scrapy's log output.
page, scrapy_meta_info = process.communicate()
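To get from this kludge back to the OP's actual goal, the fetched bytes can then be wrapped in an HtmlResponse and handed to the spider's callback, as in the earlier sketch. A minimal continuation, again assuming the hypothetical MySpider:

from scrapy.http import HtmlResponse

# 'page' is the raw HTML (bytes) captured from 'scrapy fetch' above.
response = HtmlResponse(url='http://www.testsite.com/testpage.html', body=page, encoding='utf-8')
results = list(MySpider().parse(response))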