java - Unzipping into a ByteArrayOutputStream: why do I get an EOFException?

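I have been trying to write a Java program that reads zip files from an online API, unzips them into memory (not onto the file system), and then loads them into a database. Because the unzipped files have to be loaded into the database in a specific order, I have to unzip all of them before loading any.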

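I am basically using another question on StackOverflow as a model for how to do this. Using ZipInputStream from java.util.zip, I was able to do this with a smaller zip (0.7 MB zipped, about 4 MB unzipped), but when I hit a larger file (25 MB zipped, 135 MB unzipped), the two largest files were not read into memory. For these larger files (8 MB and 120 MB, the latter making up the vast majority of the data in the zip file), I could not even retrieve a ZipEntry. No exception was thrown, and my program carried on until it tried to access the unzipped files that had never been written and threw a NullPointerException.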

I am using Jsoup to fetch the zip file from the web.

Does anyone have experience with this and can offer guidance on why I cannot retrieve the complete contents of the zip file?

Below is the code I am using. I collect the unzipped files as InputStreams in a HashMap, and the program should stop looking once there are no more ZipEntry objects.

    private Map<String, InputStream> unzip(ZipInputStream verZip) throws IOException {

        Map<String, InputStream> result = new HashMap<>();

        while (true) {
            ZipEntry entry;
            byte[] b = new byte[1024];
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int l;

            entry = verZip.getNextEntry();//Might throw IOException

            if (entry == null) {
                break;
            }

            try {
                while ((l = verZip.read(b)) > 0) {
                    out.write(b, 0, l);
                }
                out.flush();
            } catch (EOFException e) {
                e.printStackTrace();
            } catch (IOException i) {
                System.out.println("there was an ioexception");
                i.printStackTrace();
                fail();
            }
            result.put(entry.getName(), new ByteArrayInputStream(out.toByteArray()));
        }
        return result;
    }

Would I be better off if my program used the file system to unzip the files?

Best answer

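It turns out that Jsoup was the root of the problem. When you fetch binary data through a Jsoup connection, there is a limit on how many bytes are read from the connection. In the Jsoup version I was using, this limit defaults to 1048576 bytes, or 1 MB. As a result, when I fed the binary data from Jsoup into a ZipInputStream, the data was cut off after 1 MB. This limit, maxBodySizeBytes, can be found in org.jsoup.helper.HttpConnection.Request.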

        // Jsoup.connect needs an absolute http(s) URL, so the placeholder includes the scheme
        Connection c = Jsoup.connect("http://example.com/download").ignoreContentType(true);
        // ^^ returns a Connection that will only retrieve 1 MB of data
        InputStream oneMb = c.execute().bodyStream();
        ZipInputStream oneMbZip = new ZipInputStream(oneMb);

Trying to unzip the truncated oneMbZip is what caused the EOFException.
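The failure mode can be reproduced without any network access. The following is a minimal sketch, not the original program: it builds a zip archive in memory, cuts it in half to imitate a download that was capped partway through, and checks that reading the truncated archive ends in an IOException (typically an EOFException from the inflater). The class and method names (TruncatedZipDemo, buildZip, drain, failsToRead) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class TruncatedZipDemo {

    // Builds a one-entry zip archive in memory from pseudo-random bytes,
    // which compress poorly, so the archive is roughly as large as the payload.
    static byte[] buildZip(int payloadSize) throws IOException {
        byte[] payload = new byte[payloadSize];
        new Random(42).nextBytes(payload);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("data.bin"));
            zip.write(payload);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads every entry to its end and returns the total number of uncompressed bytes.
    static long drain(byte[] zipBytes) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            while (zip.getNextEntry() != null) {
                int n;
                while ((n = zip.read(buf)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    // Returns true if reading the archive fails with an IOException
    // (EOFException is a subclass of IOException).
    static boolean failsToRead(byte[] zipBytes) {
        try {
            drain(zipBytes);
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}
```

Calling failsToRead on the full output of buildZip should return false, while calling it on the first half of those bytes (for example via Arrays.copyOf) should return true. That matches what happens when the response body is cut off at the size limit.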

Using the code below, I was able to raise the Connection's byte limit to 1 GB (1073741824 bytes) and then retrieve the zip file without hitting an EOFException.

        // Jsoup.connect needs an absolute http(s) URL, so the placeholder includes the scheme
        Connection c = Jsoup.connect("http://example.com/download").ignoreContentType(true);
        // ^^ at this point the Connection would still only retrieve 1 MB of data
        Connection.Request theRequest = c.request();
        theRequest.maxBodySize(1073741824); // 1 GB = 1024 * 1024 * 1024 bytes
        c.request(theRequest); // now this connection will retrieve as much as 1 GB of data
        InputStream oneGb = c.execute().bodyStream();
        ZipInputStream oneGbZip = new ZipInputStream(oneGb);

Note that maxBodySizeBytes is an int, so its upper bound is 2,147,483,647, or just under 2 GB.
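Once the full body arrives, the in-memory unzip from the question can also be tightened so that a truncated stream fails loudly rather than surfacing later as a NullPointerException. The following is a sketch under assumptions rather than the code I actually ran: the class name InMemoryUnzip is hypothetical, entries are stored as byte arrays instead of InputStreams, and a LinkedHashMap is used only to remember the order in which the archive lists its entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class InMemoryUnzip {

    // Unzips every file entry into memory, keyed by entry name.
    // Any IOException (including EOFException on a truncated archive)
    // propagates to the caller instead of being swallowed.
    static Map<String, byte[]> unzip(InputStream source) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        byte[] buf = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buf)) != -1) { // -1 marks the end of the current entry
                    out.write(buf, 0, n);
                }
                result.put(entry.getName(), out.toByteArray());
            }
        }
        return result;
    }
}
```

With this shape, the stream obtained from the Jsoup response could be passed straight to unzip, and each byte array can be wrapped in a ByteArrayInputStream when it is time to load that file into the database.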