Problem Description
I have a scrapy crawler which works fine. I now want to use its 'parse' function to parse a given url. There is a command line utility that does this for a single url:
scrapy parse
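For reference (this fuller invocation is not in the original question), the command takes the target url plus options naming the spider and the callback; myspider and parse below are placeholder names:

scrapy parse --spider=myspider -c parse http://www.testsite.com/testpage.html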
But I want to do this inside my python code (and no, starting a new process for every url is not an option).
From what I figure, what I need for this is essentially a way to create a Response given a url. Since the response that scrapy takes is not the same as an HTTPResponse, I am not sure how to get that response for a given url.
I did find a method, make_requests_from_url, which does the obvious, but I am not sure how to get from a scrapy Request to a scrapy Response that I can pass to the parse function.
Recommended Answer
A quick kludge (with pieces from here and here) in case, unlike for the OP, subprocess is an option.
import subprocess

# 'scrapy fetch' prints the downloaded page body to stdout and its log to stderr.
bashCommand = "scrapy fetch http://www.testsite.com/testpage.html"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# page holds the raw HTML bytes; the second value is scrapy's log output.
page, scrapy_meta_info = process.communicate()
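To get from this kludge back to the OP's actual goal, the fetched bytes can then be wrapped in an HtmlResponse and handed to the spider's callback, as in the earlier sketch. A minimal continuation, again assuming the hypothetical MySpider:

from scrapy.http import HtmlResponse

# 'page' is the raw HTML (bytes) captured from 'scrapy fetch' above.
response = HtmlResponse(url='http://www.testsite.com/testpage.html', body=page, encoding='utf-8')
results = list(MySpider().parse(response))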