Should I use .text or .content when parsing a Requests response?


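Question

I occasionally use res.content or res.text to parse a response from Requests. In the use cases I have had, it didn't seem to matter which option I used.

What is the main difference between parsing HTML with .content and with .text? For example:

import requests
from lxml import html
res = requests.get(...)
node = html.fromstring(res.content)

In the above situation, should I be using res.content or res.text? What is a good rule of thumb for when to use each?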

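Answer

From the documentation:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text. You can find out what encoding Requests is using, and change it, using the r.encoding property:

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

If you change the encoding, Requests will use the new value of r.encoding whenever you call r.text. You might want to do this in any situation where you can apply special logic to work out what the encoding of the content will be. For example, HTML and XML have the ability to specify their encoding in their body. In situations like this, you should use r.content to find the encoding, and then set r.encoding. This will let you use r.text with the correct encoding.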
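The "find the encoding in r.content, then set r.encoding" step from the documentation could look roughly like the sketch below. The helper names sniff_meta_charset and fix_encoding are hypothetical (they are not part of Requests), and the regular expression is a simplification that only covers common meta charset declarations.

```python
import re
from typing import Optional

# Hypothetical helper, not part of Requests: matches both
# <meta charset="utf-8"> and <meta ... content="text/html; charset=utf-8">.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def sniff_meta_charset(content: bytes, limit: int = 2048) -> Optional[str]:
    """Return the charset declared near the start of the raw bytes, if any."""
    match = _META_CHARSET.search(content[:limit])
    return match.group(1).decode("ascii") if match else None


def fix_encoding(response) -> None:
    """Set response.encoding from the body's own declaration, if present.

    `response` is assumed to expose `.content` (bytes) and a writable
    `.encoding`, as a requests.Response does.
    """
    declared = sniff_meta_charset(response.content)
    if declared:
        response.encoding = declared
```

With a real response, calling fix_encoding(res) before reading res.text would make res.text decode with the charset the page itself declares instead of the one guessed from the headers.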

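So use r.content when the server returns binary data, or when it sends bogus encoding headers and you need to find the correct encoding yourself, for example inside a meta tag.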
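To illustrate why a bogus encoding matters, here is a small sketch that needs no network access. It uses plain bytes to stand in for r.content and decodes them the way r.text would under two different r.encoding values; the variable names are illustrative only.

```python
# Stand-in for r.content: the raw bytes the server sent (UTF-8 here).
raw = "café".encode("utf-8")

# Stand-in for r.text when r.encoding is wrongly 'ISO-8859-1':
# each UTF-8 byte is decoded as a separate character.
wrong = raw.decode("ISO-8859-1")

# Stand-in for r.text when r.encoding matches the actual bytes.
right = raw.decode("utf-8")

print(type(raw).__name__, type(right).__name__)  # bytes str
print(wrong)  # cafÃ©
print(right)  # café
```

The bytes are the same in both cases; only the decoding step differs, which is why fixing r.encoding (or working from r.content directly) resolves the garbled text.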

