How to find out what makes POI corrupt xlsx/xlsm files

Question

My problem is that Apache POI "corrupts" an xlsm/xlsx file just by reading it and writing it back (e.g. with the following code):

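import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class Snippet {
    public static void main(String[] args) throws Exception {

        String str1 = "c:/tmp/spreadsheet.xlsm";
        String str2 = "c:/tmp/spreadsheet_poi.xlsm";

        // open the file and save it unchanged under a new name;
        // try-with-resources closes the workbook and both streams
        try (FileInputStream in = new FileInputStream(str1);
             XSSFWorkbook wb = new XSSFWorkbook(in);
             FileOutputStream out = new FileOutputStream(str2)) {
            wb.write(out);
        }
    }
}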

Once you open spreadsheet_poi.xlsm in Excel, you'll get an error like the following:

"We found a problem with some content in xxx. Do you want us to try to recover as much as we can..."?

If you choose Yes, you'll end up with a log that could look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <logFileName>error145040_01.xml</logFileName>
    <summary>Errors were detected in file 'C:\tmp\spreadsheet_poi.xlsm'</summary>
    <repairedParts>
        <repairedPart>Repaired Part: /xl/worksheets/sheet4.xml part with XML error.  Load error. Line 2, column 0.</repairedPart>
        <repairedPart>Repaired Part: /xl/worksheets/sheet5.xml part with XML error.  Load error. Line 2, column 0.</repairedPart>
        <repairedPart>Repaired Part: /xl/worksheets/sheet8.xml part with XML error.  Load error. Line 2, column 0.</repairedPart>
    </repairedParts>
</recoveryLog>

What's the best approach to debug the issue in more detail (e.g. to find out what makes POI "corrupt" the file)?

Recommended answer

Eventually I found that the best approach for debugging this comes down to two things:

  1. Open the affected workbook (e.g. with 7zip) and format the affected sheets with an XML editor (e.g. Notepad++ > Plugins > XML Tools > Pretty print (XML only - with line breaks)). After saving the files and updating the xlsm file, you'll get the "real" line numbers in the Excel error log. An alternative (which I haven't tried, but which should work according to the POI mailing list): use OOXMLPrettyPrint (https://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/ooxml/dev/) to format the file and then reopen it in Excel.
  2. If the real line numbers don't help yet, compare the sheet XML files of the original xlsx file with the ones saved by POI. You'll notice differences in the attributes, and the order is different as well. To compare them properly I used Beyond Compare with "Additional File Formats" (see https://weblogs.asp.net/lorenh/comparing-xml-files-with-beyond-compare-3-brilliant for more information). There may be other diff tools that work equally well.
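If you prefer to script step 1 instead of editing the package by hand, something like the following might do. It is only a sketch using plain JDK classes (Java 9+ for `readAllBytes`), not the OOXMLPrettyPrint tool mentioned above, and the class name `PrettyPrintSheets` is made up for illustration. It copies the workbook entry by entry and re-serializes every part under `xl/worksheets/` with line breaks, assuming those parts are well-formed XML, so that Excel's recovery log reports meaningful line numbers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.XMLConstants;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of POI.
public class PrettyPrintSheets {

    /** Re-serializes one XML part with line breaks between elements. */
    public static byte[] prettyPrint(byte[] xml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Transformer transformer = factory.newTransformer(); // identity transform
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(new ByteArrayInputStream(xml)),
                              new StreamResult(out));
        return out.toByteArray();
    }

    /** Copies the zip package, pretty-printing only the worksheet parts. */
    public static void rewrite(InputStream source, OutputStream target) throws Exception {
        try (ZipInputStream zin = new ZipInputStream(source);
             ZipOutputStream zout = new ZipOutputStream(target)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = entry.getName();
                byte[] data = zin.readAllBytes(); // reads the current entry only
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    data = prettyPrint(data);
                }
                zout.putNextEntry(new ZipEntry(name));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: workbook saved by POI, args[1]: pretty-printed copy to open in Excel
        try (InputStream in = Files.newInputStream(Paths.get(args[0]));
             OutputStream out = Files.newOutputStream(Paths.get(args[1]))) {
            rewrite(in, out);
        }
    }
}
```

Run it on both the original file and the POI-saved file, then unzip the two copies: the sheet XML will have one element per line, which also makes an ordinary text diff usable for step 2.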

In my case the problem was that POI somehow changed the dimension setting from

<dimension ref="A1:XFD147"/>

to

<dimension ref="A1:XFE147"/>

(XFE is a nonexistent column: XFD is the last column an xlsx sheet can hold). I fixed it by removing the many empty columns in the original xlsx file.
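To check quickly whether a saved workbook has this particular problem, a small scan over the sheet parts may help. The sketch below is an illustration only: the class `DimensionCheck` and its methods are hypothetical names, it uses only JDK classes (Java 9+), and it finds the `<dimension>` element with a simple regular expression rather than a real XML parser. It flags any `ref` whose column lies beyond XFD (column 16384) or whose row exceeds 1048576.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, not part of POI.
public class DimensionCheck {

    static final int MAX_COLUMN = 16384;   // column XFD
    static final int MAX_ROW = 1048576;

    // One cell reference such as A1 or $XFD$147; longer letter runs do not match.
    private static final Pattern CELL =
            Pattern.compile("\\$?([A-Z]{1,3})\\$?(\\d{1,7})");
    private static final Pattern DIMENSION =
            Pattern.compile("<dimension\\s+ref=\"([^\"]+)\"");

    /** Converts column letters to a 1-based index (A = 1, Z = 26, AA = 27). */
    public static int columnIndex(String letters) {
        int index = 0;
        for (char c : letters.toCharArray()) {
            index = index * 26 + (c - 'A' + 1);
        }
        return index;
    }

    /** Returns true if every cell in a ref like "A1:XFD147" is inside the sheet limits. */
    public static boolean isValidRef(String ref) {
        for (String cell : ref.split(":")) {
            Matcher m = CELL.matcher(cell);
            if (!m.matches()) {
                return false;
            }
            int row = Integer.parseInt(m.group(2));
            if (columnIndex(m.group(1)) > MAX_COLUMN || row < 1 || row > MAX_ROW) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the xlsx/xlsm file to inspect
        try (ZipFile zip = new ZipFile(args[0])) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                String name = entry.getName();
                if (!name.startsWith("xl/worksheets/") || !name.endsWith(".xml")) {
                    continue;
                }
                String xml = new String(zip.getInputStream(entry).readAllBytes(),
                                        StandardCharsets.UTF_8);
                Matcher m = DIMENSION.matcher(xml);
                if (m.find() && !isValidRef(m.group(1))) {
                    System.out.println(name + ": suspicious dimension ref " + m.group(1));
                }
            }
        }
    }
}
```

This only covers the dimension issue described here; other causes of a "corrupted" file still need the diff-based comparison from step 2.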
