Problem description
Using python-requests and python-magic, I would like to test the MIME type of a web resource without fetching all of its content (especially if the resource happens to be, e.g., an ogg file or a PDF file). Based on the result, I might decide to fetch it all. However, calling the text method after having tested the MIME type only returns what hasn't been consumed yet. How can I test the MIME type without consuming the response content?
Here is my current code.
import requests
import magic

r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
mime = magic.from_buffer(r.iter_content(256).next(), mime=True)
if mime == "text/html":
    print(r.text)  # I'd like r.text to give me the entire response content
Thanks!
Recommended answer
Note: at the time this question was asked, the correct way to fetch only the headers and stream the body was to use prefetch=False. That option has since been renamed to stream and the boolean value is inverted, so you want stream=True.
The original answer follows.
Once you use iter_content(), you have to continue using it; .text indirectly uses the same interface under the hood (via .content), so it can only return the bytes that have not been read yet.
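To make that concrete, here is a minimal sketch that uses a plain Python generator as a stand-in for a streamed response body. The `fake_body` helper is hypothetical and not part of requests; it only mimics the one-pass nature of a stream.

```python
def fake_body(data, chunk_size):
    """Yield data in fixed-size chunks, the way a streamed body is read once."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


original = b"<html><body>Hello</body></html>"
body = fake_body(original, 6)

peek = next(body)       # consumes the first chunk for MIME sniffing
rest = b"".join(body)   # yields only what has not been consumed yet
full = peek + rest      # the caller has to stitch the pieces back together
```

Anything that reads from the same stream after the peek sees only `rest`, which is why the peeked bytes have to be kept and prepended manually.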
In other words, by using iter_content() at all, you have to do the work .text does by hand:
import magic
import requests
from requests.compat import chardet

# stream=True (formerly prefetch=False) defers downloading the body
r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = next(r.iter_content(256))  # read only the first chunk (up to 256 bytes)
mime = magic.from_buffer(peek, mime=True)
if mime == "text/html":
    # prepend the peeked bytes to the rest of the body
    contents = peek + b''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = str(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        # unknown or undetected encoding: fall back to the default codec
        textcontent = str(contents, errors='replace')
    print(textcontent)
This assumes you are using Python 3.
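The peek-then-rejoin pattern can also be factored into small helpers that work on any iterable of byte chunks. This is only a sketch: `peek_and_replay` and `decode_body` are hypothetical names, and falling back to UTF-8 when no encoding is known is an assumption made here for simplicity (the answer's code uses chardet detection instead).

```python
import itertools


def peek_and_replay(chunks):
    """Return (first_chunk, iterator that yields the whole body again)."""
    chunks = iter(chunks)
    first = next(chunks, b"")
    # Put the peeked chunk back in front of the remaining ones.
    return first, itertools.chain([first], chunks)


def decode_body(chunks, encoding=None, fallback="utf-8"):
    """Join byte chunks and decode them, replacing undecodable bytes."""
    contents = b"".join(chunks)
    try:
        return contents.decode(encoding or fallback, errors="replace")
    except LookupError:
        # Unknown codec name: fall back to the assumed default.
        return contents.decode(fallback, errors="replace")


first, replay = peek_and_replay([b"<html>", b"caf\xc3\xa9", b"</html>"])
text = decode_body(replay)
```

With requests, the chunk source would presumably be `r.iter_content(256)` on a response opened with `stream=True`, with `first` passed to `magic.from_buffer` and `r.encoding` passed as `encoding`.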
The alternative is to make two requests:
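# first request: stream the body and sniff only the first chunk
r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
mime = magic.from_buffer(next(r.iter_content(256)), mime=True)
r.close()  # release the streamed connection
if mime == "text/html":
    # second, ordinary request downloads the full body
    print(requests.get("http://www.december.com/html/demo/hello.html").text)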
Python 2 version of the manual approach:
# stream=True (formerly prefetch=False) defers downloading the body
r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = r.iter_content(256).next()
mime = magic.from_buffer(peek, mime=True)
if mime == "text/html":
    contents = peek + ''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = unicode(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = unicode(contents, errors='replace')
    print(textcontent)