Problem description
I use TIdHTTP to fetch web content. The response header indicates that the content is encoded as UTF-8. I want to print the content to a console that uses CP936 (Simplified Chinese), but the decoded content is unreadable.
Result := TEncoding.Utf8.GetString(ResponseBuffer);
I do the same thing in Python (using httplib2) without any problems.
import httplib2

def python_try():
    conn = httplib2.Http()  # httplib2's HTTP client class
    response, content = conn.request(...)  # returns (response headers, body bytes)
    print(content.decode('utf8'))  # readable in console
Update 1
I debugged the raw response and noticed that the content is gzipped.
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Content-Encoding: gzip
Vary: Accept-Encoding
Date: Mon, 24 Dec 2012 15:27:44 GMT
Connection: Keep-Alive
I tried assigning a TIdCompressorZLib instance to the TIdHTTP instance. Unfortunately, the application crashes while decompressing the gzipped content. The test address is http://www.baidu.com (encoding=gb2312).
Update 2
I also tried downloading a gzipped jQuery script file, which contains only ASCII characters. This time it works, which suggests the problem lies in the Indy library. If I am not mistaken, I should close the question.
Recommended answer
TIdHTTP handles the gzip decompression for you if you have a TIdCompressorZLib component assigned to the TIdHTTP.Compressor property. Otherwise, you will have to decompress it manually (TIdHTTP will not send an Accept-Encoding header by default if the Compressor property is not assigned).
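To illustrate why the order of operations matters, here is a minimal Python sketch (Python 3, standard library only) of what manual handling amounts to: the body has to be gunzipped first and only then decoded with the charset from the Content-Type header. The function name decode_gzipped_body and the sample text are made up for this illustration, and the server response is simulated locally rather than fetched.

```python
import gzip


def decode_gzipped_body(raw_body: bytes, charset: str = "utf-8") -> str:
    """Gunzip a response body, then decode the resulting bytes as text."""
    decompressed = gzip.decompress(raw_body)
    return decompressed.decode(charset)


# Simulate a server that sends a gzip-compressed UTF-8 page
# (hypothetical sample text, not a real response).
original_text = "简体中文 test page"
wire_bytes = gzip.compress(original_text.encode("utf-8"))

# Decoding the still-compressed bytes as UTF-8 does not recover the text.
garbled = wire_bytes.decode("utf-8", errors="replace")

# Decompressing first and decoding second does.
recovered = decode_gzipped_body(wire_bytes)
print(recovered == original_text)
```

Passing still-compressed bytes to a UTF-8 decoder, as in the TEncoding call from the question, would produce unreadable output for the same reason.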
As for the UTF-8 encoding, TIdHTTP handles that for you as well, provided you call the overloaded version of the TIdHTTP.Get() or TIdHTTP.Post() method that returns a String value instead of filling a TStream object. It will decode the UTF-8 to UTF-16 for you. To convert that to CP936, you can let the RTL do the conversion for you:
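type
  // AnsiString bound to code page 936
  Cp936String = type AnsiString(936);
var
  S: Cp936String;
begin
  // Get() returns a UTF-16 String; the typecast lets the RTL convert it to CP936
  S := Cp936String(IdHTTP1.Get(...));
end;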
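For comparison, the same two-step conversion can be sketched in Python 3: decode the UTF-8 bytes into a Unicode string, then encode that string with the cp936 codec. This is only an illustration of the idea, not of what Indy or the Delphi RTL do internally; the helper name utf8_to_cp936 and the sample string are hypothetical.

```python
def utf8_to_cp936(utf8_bytes: bytes) -> bytes:
    """Decode UTF-8 bytes to a Unicode string, then re-encode it as CP936."""
    text = utf8_bytes.decode("utf-8")
    return text.encode("cp936")


# Two Chinese characters: 3 bytes each in UTF-8, 2 bytes each in CP936.
sample = "中文"
utf8_bytes = sample.encode("utf-8")
cp936_bytes = utf8_to_cp936(utf8_bytes)

print(len(utf8_bytes), len(cp936_bytes))
```

Characters that have no CP936 representation would raise a UnicodeEncodeError here; passing errors="replace" to encode() is one way to substitute a placeholder instead.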