Problem description
I'm trying to parse the result of a HEAD request made with the Python Requests library, but can't seem to access the response content.
According to the docs, I should be able to access the content through requests.Response.text. This works fine for me on GET requests, but returns None on HEAD requests.
GET request (works)
import requests
response = requests.get(url)
content = response.text
content = <html>...</html>
HEAD request (no content)
import requests
response = requests.head(url)
content = response.text
content = None
Edit
OK, I've quickly realized from the answers that a HEAD request is not supposed to return content, only headers. But does that mean that, to access things found in the <head> tag of a page, like <link> and <meta> tags, one must GET the whole document?
Recommended answer
By definition, the responses to HEAD requests do not contain a message body.
Send a GET request if you want to, well, get a response body. Send a HEAD request only if you are interested in nothing but the response status code and headers.
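The difference can be made visible with a small sketch. The helper names `describe_response` and `compare_head_and_get` are made up for illustration, and the URL is whatever you pass in. Depending on the Requests version, the missing body of a HEAD response may surface as an empty value or as None, so the helper treats both the same way.

```python
def describe_response(response):
    """Summarise what a response actually carries (hypothetical helper)."""
    return {
        "status": response.status_code,
        "content_type": response.headers.get("Content-Type"),
        # Length the server *says* the body has; HEAD may still report it.
        "declared_length": response.headers.get("Content-Length"),
        # Number of body bytes that actually arrived (0 for HEAD).
        "body_bytes": len(response.content or b""),
    }


def compare_head_and_get(url):
    """Issue a HEAD and a GET for the same URL and summarise both."""
    import requests  # third-party; imported here so describe_response has no dependency

    # requests.head() does not follow redirects unless asked to.
    head = requests.head(url, allow_redirects=True, timeout=10)
    get = requests.get(url, timeout=10)
    return describe_response(head), describe_response(get)
```

For a typical page, the two summaries should agree on status and headers, while only the GET summary reports a non-zero `body_bytes`.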
HTTP transfers arbitrary content; the HTTP term header is completely unrelated to an HTML <head>. However, HTTP can be advised to download only a part of the document. If you know the length of the HTML <head> code (or an upper bound for it), you can include an HTTP Range header in your request that advises the remote server to return only a certain number of bytes. If the remote server supports HTTP ranges, it will then serve the reduced answer.
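A minimal sketch of the Range approach follows. The names `HeadTagCollector`, `parse_head_tags` and `fetch_head_tags`, as well as the 4096-byte default, are assumptions chosen for illustration; pick a bound that fits the pages you are fetching. A server that ignores the Range header will answer 200 with the full document instead of 206, so the sketch also streams the response and stops reading after the first chunk.

```python
from html.parser import HTMLParser


class HeadTagCollector(HTMLParser):
    """Collect <link> and <meta> tags until </head> is seen."""

    def __init__(self):
        super().__init__()
        self.tags = []            # list of (tag_name, attrs_dict)
        self.head_closed = False

    def handle_starttag(self, tag, attrs):
        if not self.head_closed and tag in ("link", "meta"):
            self.tags.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_closed = True


def parse_head_tags(html_fragment):
    """Return <link>/<meta> tags from a possibly truncated HTML prefix."""
    collector = HeadTagCollector()
    collector.feed(html_fragment)  # an incomplete trailing tag is simply left unparsed
    return collector.tags


def fetch_head_tags(url, max_bytes=4096):
    """Request only the first max_bytes bytes of url and parse them."""
    import requests  # third-party; imported here so parse_head_tags has no dependency

    headers = {"Range": f"bytes=0-{max_bytes - 1}"}
    with requests.get(url, headers=headers, stream=True, timeout=10) as response:
        response.raise_for_status()
        # Read a single chunk of at most max_bytes, whether the server
        # honoured the range (206) or sent the whole document (200).
        raw = next(response.iter_content(chunk_size=max_bytes), b"")
        encoding = response.encoding or "utf-8"
    return parse_head_tags(raw.decode(encoding, errors="replace"))
```

Calling `fetch_head_tags(url)` returns a list of `(tag, attributes)` pairs. If the <head> section is longer than `max_bytes`, tags beyond the cut-off are missed, so the bound should be generous.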