How to read a compressed HTML page with Content-Encoding: gzip

Question

I request a web page that sends a Content-Encoding: gzip header, but I am stuck on how to read it.

My code:

    try {
        URLConnection connection = new URL("http://jquery.org").openConnection();
        String html = "";
        BufferedReader in = null;
        connection.setReadTimeout(10000);
        in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            html += inputLine + "\n";
        }
        in.close();
        System.out.println(html);
        System.exit(0);
    } catch (IOException ex) {
        Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
    }

The output looks garbled. (I was unable to paste it here; it is just a run of symbols.)

I believe the content is compressed. How do I parse it?

Note: if I change jquery.org to jquery.com (which doesn't send that header), my code works fine.

Recommended answer

There is a class for this: GZIPInputStream (in java.util.zip). It is an InputStream, so it is very transparent to use: wrap the stream returned by connection.getInputStream() in it before handing that stream to the InputStreamReader.
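
As a rough illustration of the answer, the question's code could be adapted along these lines. This is only a sketch under a few assumptions: the class name GzipPageReader and the helpers decode, readAll, and fetch are hypothetical names made up for this example, the page is assumed to be UTF-8 text, and only an exact "gzip" Content-Encoding value is handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of the original question or answer.
class GzipPageReader {

    // Wrap the raw stream in a GZIPInputStream only when the response
    // declares Content-Encoding: gzip; otherwise return it unchanged.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Read the whole stream line by line. UTF-8 is an assumption here;
    // a real crawler would take the charset from the Content-Type header.
    static String readAll(InputStream in) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Fetch a page, decompressing it when the server says it is gzip-encoded.
    static String fetch(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        connection.setReadTimeout(10000);
        return readAll(decode(connection.getInputStream(),
                              connection.getContentEncoding()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java GzipPageReader <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Checking the header first means the same code path should work for both sites mentioned in the question: the stream is decompressed only when the server declares gzip, and is read as-is otherwise. It could be run as, for example, `java GzipPageReader http://jquery.org`.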

