Problem description
I have been using Apache POI to manipulate Microsoft Word .docx files, i.e. open a document that was originally created in Microsoft Word, modify it, and save it to a new document.
I notice that new paragraphs created by Apache POI are missing a Revision Save ID, often known as an RSID or rsidR. Word uses this to identify changes made to a document in one session, say between saves. It is optional (users can turn it off in Microsoft Word if they want), but in reality almost everyone has it on, so almost every document is full of RSIDs. Read this excellent explanation of RSIDs for more about that.
In a Microsoft Word document, word/document.xml contains paragraphs like this:
<w:p w:rsidR="007809A1" w:rsidRDefault="007809A1" w:rsidP="00191825">
<w:r>
<w:t>Paragraph of text here.</w:t>
</w:r>
</w:p>
However, the same paragraph created by POI will look like this in word/document.xml:
<w:p>
<w:r>
<w:t>Paragraph of text here.</w:t>
</w:r>
</w:p>
I've figured out that I can force POI to add an RSID to each paragraph using code like this:
byte[] rsid = ???;
XWPFParagraph paragraph = document.createParagraph();
paragraph.getCTP().setRsidR(rsid);
paragraph.getCTP().setRsidRDefault(rsid);
However, I don't know how I should be generating the RSIDs.
Does POI have a way to generate and/or keep track of RSIDs? If not, is there any way I can ensure that an RSID I generate doesn't conflict with one that's already in the document?
Recommended answer
It looks like the list of valid rsid entries is held in word/settings.xml, in the <w:rsids> entry. XWPF should be able to give you access to that already.
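As a rough illustration of reading that list, here is a minimal sketch that uses only the JDK's XML parser rather than POI itself. It assumes the stream passed in holds the contents of word/settings.xml (for example, obtained by opening the .docx as a zip archive), and that the values sit in w:val attributes on the w:rsidRoot and w:rsid elements. The class and method names (RsidReader, readRsids) are made up for this example and are not part of POI.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not a POI class.
class RsidReader {
    // WordprocessingML main namespace (the "w:" prefix).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Collects the w:val of every w:rsidRoot and w:rsid element, upper-cased.
    static Set<String> readRsids(InputStream settingsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(settingsXml);

        Set<String> rsids = new TreeSet<>();
        for (String name : new String[] {"rsidRoot", "rsid"}) {
            NodeList nodes = doc.getElementsByTagNameNS(W_NS, name);
            for (int i = 0; i < nodes.getLength(); i++) {
                String val = ((Element) nodes.item(i)).getAttributeNS(W_NS, "val");
                if (!val.isEmpty()) {
                    rsids.add(val.toUpperCase(Locale.ROOT));
                }
            }
        }
        return rsids;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for a real word/settings.xml.
        String xml = "<w:settings xmlns:w=\"" + W_NS + "\"><w:rsids>"
                + "<w:rsidRoot w:val=\"007809A1\"/>"
                + "<w:rsid w:val=\"00191825\"/>"
                + "<w:rsid w:val=\"007809A1\"/>"
                + "</w:rsids></w:settings>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readRsids(in));
    }
}
```

Upper-casing the values is a choice made here so that later comparisons are case-insensitive; whether POI exposes the same list through its own object model is exactly what the patch discussed below would settle.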
You'd probably want to generate a random number that is 8 hex digits long, check whether it's already in that list, and regenerate if it is. Once you have a unique one, add it to the list, then tag your paragraphs with it.
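The generate-check-retry loop described above could look something like the sketch below. It is plain JDK code, not a POI API: RsidGenerator, newUniqueRsid and toBytes are names invented for this example, and the set of existing RSIDs is assumed to have been collected already from the <w:rsids> entry as upper-case hex strings.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, not a POI class.
class RsidGenerator {
    // Draws random 8-hex-digit values until one is not yet in `existing`,
    // records it in the set, and returns it.
    static String newUniqueRsid(Set<String> existing, Random random) {
        String candidate;
        do {
            // %08X prints the int as 8 upper-case hex digits (unsigned form).
            candidate = String.format("%08X", random.nextInt());
        } while (!existing.add(candidate)); // add() returns false on a collision
        return candidate;
    }

    // Converts "007809A1" into {0x00, 0x78, 0x09, (byte) 0xA1}.
    static byte[] toBytes(String hex) {
        byte[] bytes = new byte[4];
        for (int i = 0; i < 4; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>();
        existing.add("007809A1");
        existing.add("00191825");

        String rsid = newUniqueRsid(existing, new SecureRandom());
        byte[] rsidBytes = toBytes(rsid);
        System.out.println(rsid + " -> " + rsidBytes.length + " bytes");
        // With POI on the classpath, rsidBytes is what would be passed to
        // paragraph.getCTP().setRsidR(...) as in the question. This assumes
        // the byte array is serialized as hex in the same order.
    }
}
```

The new value still has to be written back to the <w:rsids> list in word/settings.xml, otherwise the paragraph would carry an RSID the document does not declare.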
What I'd suggest is that you join the POI dev list (mailing list details), and we can give you a hand working up a patch for it. I think the things to do are:
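- Wrap the RSids entry in word/settings.xml, to make it easy to get the list and to generate a new (unique) one
- Wrappers for the different RSid entries on paragraphs and runs
- Methods on paragraphs and runs to get the RSid wrappers, add new ones, or clear existing ones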
We should take this to the dev list though :)