


I've been trying all day to make this thing work, but it's still not right. I've checked so many posts around here and tested so many different implementations that I don't know where to look now...


Here is my situation: I have a small PHP test file (gz.php) on my server which looks like this:

header("Content-Encoding: gzip");
$contents = gzcompress("Is it working?", 9);


This is the simplest I could do and it works fine with any web browser.


Now I have an Android activity using Jsoup that has this code:

URL url = new URL("http://myServerAdress.com/gz.php");
doc = Jsoup.parse(url, 1000);


This causes an EOFException with no message on the "Jsoup.parse" line.


I've read everywhere that Jsoup is supposed to parse gzipped content without having to do anything special, but obviously, there's something missing.

I've tried many other ways, like using Jsoup.connect().get(), or InputStream, GZIPInputStream and DataInputStream. I also tried the gzdeflate() and gzencode() functions from PHP, but no luck either. I even tried not declaring the Content-Encoding header in PHP and inflating the content later on the client... but that was no more effective than it was clever...
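
For what it's worth, the three PHP functions mentioned above do not produce the same container: gzcompress() emits zlib format, gzdeflate() emits a raw deflate stream, and gzencode() emits gzip format, and only the last one is what "Content-Encoding: gzip" and GZIPInputStream expect. The following is a minimal sketch that reproduces the three formats with their Java counterparts rather than with PHP itself; the class and method names (CompressionFormats, deflate, gzip, readableAsGzip) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: Java analogues of gzcompress / gzdeflate / gzencode.
final class CompressionFormats {

    /** nowrap = false gives zlib format (like gzcompress), nowrap = true gives raw deflate (like gzdeflate). */
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, nowrap);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** gzip format (like gzencode): 10-byte header, deflate data, CRC-32 and length trailer. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    /** True only if GZIPInputStream can read the whole stream without an IOException. */
    static boolean readableAsGzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[256];
            while (in.read(buffer) != -1) {
                // drain the stream
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "Is it working?".getBytes(StandardCharsets.UTF_8);
        System.out.println("zlib readable as gzip: " + readableAsGzip(deflate(text, false)));
        System.out.println("raw  readable as gzip: " + readableAsGzip(deflate(text, true)));
        System.out.println("gzip readable as gzip: " + readableAsGzip(gzip(text)));
    }
}
```

If this holds for the PHP side as well, a body produced by gzcompress() or gzdeflate() would be rejected by a strict gzip reader no matter which Java class wraps the stream.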


It has to be something "stupid" I'm missing, but I just can't tell what... does anybody have an idea?

(PS: I'm using Jsoup 1.7.0, the latest version as of now.)


The asker indicated in a comment that gzcompress was writing a CRC that was both incorrect and incomplete. According to the information the asker found, the operative code is:

// Display the header of the gzip file
// Thanks ck@medienkombinat.de!
// Only display this once
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";

// Figure out the size and CRC of the original for later
$Size = strlen($contents);
$Crc = crc32($contents);

// Compress the data
$contents = gzcompress($contents, 9);

// We can't just output it here, since the CRC is messed up.
// If I try to "echo $contents" at this point, the compressed
// data is sent, but not completely.  There are four bytes at
// the end that are a CRC.  Three are sent.  The last one is
// left in limbo.  Also, if we "echo $contents", then the next
// byte we echo will not be sent to the client.  I am not sure
// if this is a bug in 4.0.2 or not, but the best way to avoid
// this is to put the correct CRC at the end of the compressed
// data.  (The one generated by gzcompress looks WAY wrong.)
// This will stop Opera from crashing, gunzip will work, and
// other browsers won't keep loading indefinately.
// Strip off the old CRC (it's there, but it won't be displayed
// all the way -- very odd)
$contents = substr($contents, 0, strlen($contents) - 4);

// Show only the compressed data
echo $contents;

// Output the CRC, then the size of the original
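
The quoted snippet stops at that last comment. For reference, a gzip trailer is the CRC-32 of the uncompressed data followed by the uncompressed length, each written as four bytes, least significant byte first. Note also that gzcompress() produces zlib format, whose last four bytes are an Adler-32 checksum and not a CRC-32, which would explain why the "CRC" looked wrong. Below is a rough Java sketch of the same framing trick, using Deflater as a stand-in for gzcompress(); the class and method names are hypothetical, and this is an illustration of the byte layout, not the original author's missing code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical demo class: wraps zlib output in a gzip header and trailer by hand.
final class ManualGzipFraming {

    /** zlib-format compression, used here as the Java counterpart of gzcompress(). */
    static byte[] zlibCompress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Writes the low 32 bits of value, least significant byte first. */
    static void writeLittleEndian(ByteArrayOutputStream out, long value) {
        for (int shift = 0; shift < 32; shift += 8) {
            out.write((int) (value >>> shift) & 0xff);
        }
    }

    static byte[] frameAsGzip(byte[] original) {
        byte[] zlib = zlibCompress(original);
        CRC32 crc = new CRC32();
        crc.update(original);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The same 8 bytes the PHP snippet echoes: magic, CM = 8 (deflate), FLG = 0, MTIME = 0.
        out.write(new byte[] {0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0}, 0, 8);
        // The 2-byte zlib header fills the XFL and OS slots of the gzip header;
        // the trailing 4 bytes (Adler-32) are dropped, as in the substr() call.
        out.write(zlib, 0, zlib.length - 4);
        // gzip trailer: CRC-32 of the original data, then its length.
        writeLittleEndian(out, crc.getValue());
        writeLittleEndian(out, original.length);
        return out.toByteArray();
    }

    /** Decodes with the standard GZIPInputStream to check the framing. */
    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[256];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] framed = frameAsGzip("Is it working?".getBytes(StandardCharsets.UTF_8));
        System.out.println(gunzip(framed));
    }
}
```

On the PHP side, calling gzencode() is probably the simpler route, since it writes the gzip header and trailer itself.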

Jonathan Hedley commented, "jsoup just uses a normal Java GZIPInputStream to parse the gzip, so you'd hit that issue with any Java program." The EOFException is presumably due to the incomplete CRC.
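
That explanation is easy to check outside Android and jsoup. Here is a small sketch, with made-up class and method names, that feeds GZIPInputStream a complete gzip stream, the same stream with its last trailer byte cut off, and an empty body, and reports which exception (if any) comes back. It only shows how the JDK class behaves on truncated input; it does not prove what the server in the question actually sent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: how GZIPInputStream reacts to a truncated stream.
final class TruncatedGzipDemo {

    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    /** Returns "ok" if the stream reads to the end, else the simple name of the exception. */
    static String tryRead(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[256];
            while (in.read(buffer) != -1) {
                // drain the stream
            }
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] full = gzip("Is it working?".getBytes(StandardCharsets.UTF_8));
        byte[] truncated = Arrays.copyOf(full, full.length - 1); // one trailer byte missing

        System.out.println("complete stream:  " + tryRead(full));
        System.out.println("truncated stream: " + tryRead(truncated));
        System.out.println("empty body:       " + tryRead(new byte[0]));
    }
}
```

An empty response body fails the same way as a cut-off trailer, so it may be worth confirming that the PHP script really writes the compressed bytes to the output.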

