This article explains how to make Python Scrapy override the Content-Type of a POST request to multipart/form-data; it may be a useful reference for anyone hitting the same problem.

Problem description

Trying to use Scrapy to scrape a website which, for some reason, encodes its POST requests as "multipart/form-data".

Is there a way to override Scrapy's default behavior of posting using "application/x-www-form-urlencoded"?

It looks like the site is not responding to the spider because it wants its requests posted using "multipart/form-data".

Have tried multipart-encoding the form variables myself, but Wireshark shows that Scrapy still sets the Content-Type header incorrectly regardless of that encoding.

Recommended answer

Just use scrapy.http.FormRequest instead of scrapy.Request, passing the parameters in the formdata argument.

示例代码:

import scrapy
from scrapy.http import FormRequest

class MySpider(scrapy.Spider):
    # ...
    def start_requests(self):
        # FormRequest encodes formdata into the request body and sets the
        # matching Content-Type header; some_post_url is the target URL.
        yield FormRequest(some_post_url,
                          formdata=dict(param1='value1', param2='value2'))
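If the server insists on multipart/form-data and FormRequest's encoding does not satisfy it, the multipart body can also be assembled by hand and sent with a plain scrapy.Request. A minimal sketch, assuming text-only fields (encode_multipart is a hypothetical helper built on the standard library, not part of Scrapy; the spider usage is shown in comments):

```python
import uuid

def encode_multipart(fields):
    """Encode a dict of text form fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header). This is only a sketch:
    file uploads and non-ASCII field names are not handled.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f'--{boundary}')
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append('')           # blank line separates headers from value
        lines.append(str(value))
    lines.append(f'--{boundary}--')  # closing boundary
    lines.append('')                 # trailing CRLF
    body = '\r\n'.join(lines).encode('utf-8')
    content_type = f'multipart/form-data; boundary={boundary}'
    return body, content_type

# Inside a spider, pass the body and header to a plain Request:
# body, ctype = encode_multipart({'param1': 'value1', 'param2': 'value2'})
# yield scrapy.Request(some_post_url, method='POST', body=body,
#                      headers={'Content-Type': ctype})
```

The boundary just has to be a string that never appears in the field values, which is why a random UUID hex string is a common choice.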

Read more:

  • Request usage examples
  • FormRequest objects
