Failed to decode response content using IdHttp


Question

I use TIdHttp to fetch web content. The response header indicates that the content is encoded as UTF-8. I want to print the content to a console that uses CP936 (Simplified Chinese), but the output is not readable.

Result := TEncoding.Utf8.GetString(ResponseBuffer);

I do the same thing in Python (using httplib2) without any problems.

import httplib2

def python_try():
    conn = httplib2.Http()
    # request() returns (response headers, body bytes); gzip is undone by httplib2
    response, content = conn.request(...)
    print(content.decode('utf8'))  # readable in console






Update 1

I debugged the raw response and noticed that the content is gzipped.

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Content-Encoding: gzip
Vary: Accept-Encoding
Date: Mon, 24 Dec 2012 15:27:44 GMT
Connection: Keep-Alive

I tried assigning a TIdCompressorZLib instance to the TIdHttp instance. Unfortunately, the application crashes while decompressing the gzipped content. The test address is http://www.baidu.com (encoding=gb2312).

Update 2

I also tried downloading a gzipped jQuery script file, which contains only ASCII characters. This time it worked, which suggests the problem lies in the Indy library. If I am not mistaken, I should close the question.

Recommended answer

TIdHTTP handles gzip decompression for you if a TIdCompressorZLib component is assigned to the TIdHTTP.Compressor property. Otherwise, you have to decompress the content manually (by default, TIdHTTP does not send an Accept-Encoding header when the Compressor property is unassigned).
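As a rough illustration of the manual route, here is a minimal sketch in Python (the language the question already used for comparison) of the order of operations: undo the Content-Encoding first, then decode the bytes with the declared charset. The helper name decode_body and the sample text are hypothetical and chosen only for this example; this is not Indy's implementation.

```python
import gzip


def decode_body(raw: bytes, content_encoding: str, charset: str) -> str:
    """Undo gzip compression if present, then decode the bytes to text."""
    if content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Simulate a server that sends a gzipped UTF-8 body (hypothetical sample text).
body = "简体中文".encode("utf-8")
wire = gzip.compress(body)

text = decode_body(wire, "gzip", "utf-8")
print(text)
```

Decoding the compressed bytes directly as UTF-8, which is effectively what calling TEncoding.Utf8.GetString on the raw buffer does, cannot produce readable text because the gzip stream is not valid UTF-8.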

TIdHTTP also handles the UTF-8 decoding for you if you call the overloaded version of TIdHTTP.Get() or TIdHTTP.Post() that returns a String value instead of filling a TStream object. It decodes the UTF-8 to UTF-16 for you. To convert that to CP936, you can let the RTL do the conversion:

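type
  // AnsiString bound to code page 936 (Simplified Chinese)
  Cp936String = type AnsiString(936);
var
  S: Cp936String;
begin
  // Get() returns a UTF-16 String; the cast lets the RTL convert it to CP936
  S := Cp936String(IdHTTP1.Get(...));
end;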
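
The same transcoding step can be sketched in Python to show what the conversion involves: the decoded text is re-encoded into CP936 bytes, and any character that has no CP936 mapping must be substituted or rejected. The helper name to_cp936 is hypothetical, and the "replace" error policy is an assumption made for this sketch rather than a statement about how the Delphi RTL behaves.

```python
def to_cp936(text: str) -> bytes:
    """Encode text as CP936, replacing unmappable characters with '?'."""
    return text.encode("cp936", errors="replace")


encoded = to_cp936("简体中文")
print(len(encoded))             # each of these characters takes two bytes
print(encoded.decode("cp936"))  # round-trips back to the original text

# A character outside CP936 (an emoji here) is replaced rather than raising.
print(to_cp936("a😀"))
```

If the console code page is not 936, the bytes will still look garbled, so it is worth checking the console settings as well.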

