Jsoup and gzipped HTML content (Android)

Question

I've been trying all day to make this work, but it's still not right. I've checked so many posts around here and tested so many different implementations that I don't know where to look now...

Here is my situation: I have a small PHP test file (gz.php) on my server, which looks like this:

header("Content-Encoding: gzip");
print("\x1f\x8b\x08\x00\x00\x00\x00\x00");
$contents = gzcompress("Is it working?", 9);
print($contents);

This is the simplest thing I could come up with, and it works fine in any web browser.

Now I have an Android activity using Jsoup that has this code:

URL url = new URL("http://myServerAdress.com/gz.php");
doc = Jsoup.parse(url, 1000);

This causes an EOFException with an empty message on the Jsoup.parse line.

I've read everywhere that Jsoup is supposed to parse gzipped content without my having to do anything special, but obviously something is missing.

I've tried many other approaches, such as Jsoup.connect().get(), or InputStream, GZIPInputStream and DataInputStream. I also tried PHP's gzdeflate() and gzencode() functions, with no luck either. I even tried not declaring the Content-Encoding header in PHP and decompressing the content afterwards... but that was as clever as it was effective...

It has to be something "stupid" I'm missing, but I just can't tell what... does anybody have an idea?

(PS: I'm using Jsoup 1.7.0, the latest version as of now.)

Recommended answer

The asker indicated in a comment that gzcompress was writing a CRC that was both incorrect and incomplete, according to a linked source, the operative code being:

// Display the header of the gzip file
// Thanks ck@medienkombinat.de!
// Only display this once
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";

// Figure out the size and CRC of the original for later
$Size = strlen($contents);
$Crc = crc32($contents);

// Compress the data
$contents = gzcompress($contents, 9);

// We can't just output it here, since the CRC is messed up.
// If I try to "echo $contents" at this point, the compressed
// data is sent, but not completely.  There are four bytes at
// the end that are a CRC.  Three are sent.  The last one is
// left in limbo.  Also, if we "echo $contents", then the next
// byte we echo will not be sent to the client.  I am not sure
// if this is a bug in 4.0.2 or not, but the best way to avoid
// this is to put the correct CRC at the end of the compressed
// data.  (The one generated by gzcompress looks WAY wrong.)
// This will stop Opera from crashing, gunzip will work, and
// other browsers won't keep loading indefinitely.
//
// Strip off the old CRC (it's there, but it won't be displayed
// all the way -- very odd)
$contents = substr($contents, 0, strlen($contents) - 4);

// Show only the compressed data
echo $contents;

// Output the CRC, then the size of the original
gzip_PrintFourChars($Crc);
gzip_PrintFourChars($Size);
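
Why the repaired stream is valid gzip can be checked outside PHP. The sketch below is a Java re-creation of the same byte-level steps, not the original PHP. The class and method names (GzipTrailerFix, toGzip, gunzip) are made up for illustration, and it assumes gzip_PrintFourChars writes a 32-bit value in little-endian order, which is what the gzip format (RFC 1952) expects. gzcompress emits a zlib stream (RFC 1950): a 2-byte header, the deflate data, and a 4-byte Adler-32 checksum. After the 8 header bytes echoed above, the 2-byte zlib header lands in the last two gzip header fields (XFL and OS), so only the trailer has to be replaced.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, for illustration only.
final class GzipTrailerFix {
    // First 8 bytes of a gzip header: magic (1f 8b), CM=8 (deflate), FLG=0, MTIME=0.
    static final byte[] PARTIAL_HEADER = {0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0};

    // Stand-in for PHP's gzcompress($data, 9): produces a zlib-wrapped stream.
    static byte[] zlibCompress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, deflater)) {
            dos.write(data);
        } finally {
            deflater.end();
        }
        return out.toByteArray();
    }

    // Assumed behaviour of gzip_PrintFourChars: 32-bit value, little-endian.
    static void writeIntLE(ByteArrayOutputStream out, long value) {
        for (int i = 0; i < 4; i++) {
            out.write((int) (value >>> (8 * i)) & 0xff);
        }
    }

    // Header + zlib stream without its Adler-32, then CRC-32 and original size.
    static byte[] toGzip(byte[] data) throws IOException {
        byte[] zlib = zlibCompress(data);
        CRC32 crc = new CRC32();
        crc.update(data);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(PARTIAL_HEADER, 0, PARTIAL_HEADER.length);
        out.write(zlib, 0, zlib.length - 4); // strip the 4-byte Adler-32
        writeIntLE(out, crc.getValue());
        writeIntLE(out, data.length);
        return out.toByteArray();
    }

    // Decompress with the same class jsoup is said to rely on.
    static String gunzip(byte[] gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = toGzip("Is it working?".getBytes(StandardCharsets.UTF_8));
        System.out.println(gunzip(gz));
    }
}
```

If hand-assembling the stream is not a requirement, PHP's gzencode() produces the complete gzip framing (header, CRC-32 and size) in one call.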

Jonathan Hedley commented, "jsoup just uses a normal Java GZIPInputStream to parse the gzip, so you'd hit that issue with any Java program." The EOFException is presumably due to the incomplete CRC.
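
To see the failure without jsoup or a server, the payload from gz.php can be approximated in plain Java. This is a sketch under assumptions: the class name BrokenGzipDemo and its methods are invented for illustration, and Deflater at BEST_COMPRESSION stands in for PHP's gzcompress($data, 9). The stream ends with a 4-byte Adler-32 where gzip expects 8 bytes (CRC-32 plus size), so GZIPInputStream runs out of trailer. The exact exception class may vary by runtime: the question reports EOFException on Android, while other JVMs may report a different IOException subclass such as ZipException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical demo class, for illustration only.
final class BrokenGzipDemo {
    // Mimics gz.php: 8 gzip header bytes followed by an untouched zlib stream.
    static byte[] brokenPayload(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0});
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, deflater)) {
            dos.write(data);
        } finally {
            deflater.end();
        }
        return out.toByteArray();
    }

    // Reads the stream to the end, which forces the trailer check.
    static String gunzip(byte[] gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = brokenPayload("Is it working?".getBytes(StandardCharsets.UTF_8));
        try {
            System.out.println(gunzip(payload));
        } catch (IOException e) {
            // The trailer is too short or does not match, so decoding is rejected.
            System.out.println("Rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

Browsers tend to be more forgiving about a bad trailer, which would explain why the same response displays fine there.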
