Apache POI character runs in .docx files


Question

For .doc files (HWPF), there is a method that returns each character run in a paragraph:

 CharacterRun charrun = paragraph.getCharacterRun(k++);

I can then use those character runs to inspect their attributes, for example:

if (charrun.isBold()) System.out.print(charrun.text());

or something like that. But for .docx files there seems to be no character-run method that reads the text piece by piece like that. I tried to use

XWPFParagraph item = paragraph.get(i);
List<XWPFRun> charrun = item.getRuns();

I found that a run in XWPF does not hold a single character. Instead, it holds a string of seemingly arbitrary length from the document:

XWPFRun temp = charrun.get(0);
System.out.println(temp.getText(0));

This code does not return the first character of the paragraph.

So how can I fix this?

Recommended answer

Assuming you want to iterate over all the (main) paragraphs in a Word document (excluding tables, headers and the like), then over the character runs in each paragraph, and then over the text of each run one character at a time, you'd want to do something like:

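import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Open the .docx and walk the main-body paragraphs (tables, headers etc. are not included)
XWPFDocument doc = new XWPFDocument(OPCPackage.open("myfile.docx"));
for (XWPFParagraph paragraph : doc.getParagraphs()) {
    int pos = 0; // character position within the current paragraph
    for (XWPFRun run : paragraph.getRuns()) {
        // text() returns the run's text as a plain String
        for (char c : run.text().toCharArray()) {
            System.out.println("The character at " + pos + " is " + c);
            pos++;
        }
    }
}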

That will iterate over each character, with things like tabs and newlines represented as their character equivalents (elements such as w:tab are converted).
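Because the position counter in the loop spans several runs, it can help to map a paragraph-level position back to the run that holds it. The following is only a minimal sketch of that bookkeeping, using plain strings in place of POI runs. The `RunOffsets` class and its `locate` method are hypothetical names invented for this illustration and are not part of Apache POI.

```java
import java.util.List;

final class RunOffsets {
    /**
     * Returns {runIndex, offsetInRun} for the character at paragraph position pos,
     * or null if pos lies outside the paragraph text.
     * Hypothetical helper: runTexts stands in for the strings obtained from each run.
     */
    static int[] locate(List<String> runTexts, int pos) {
        if (pos < 0) {
            return null;
        }
        int remaining = pos;
        for (int i = 0; i < runTexts.size(); i++) {
            int len = runTexts.get(i).length();
            if (remaining < len) {
                return new int[] {i, remaining};
            }
            remaining -= len; // skip this run, including empty ones
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample data: a paragraph split into runs, the first of which is empty.
        List<String> runTexts = List.of("", "Hel", "lo wor", "ld");
        int[] hit = locate(runTexts, 0);
        char first = runTexts.get(hit[0]).charAt(hit[1]);
        System.out.println("run " + hit[0] + ", offset " + hit[1] + ": " + first);
    }
}
```

With real documents, the list of strings would presumably be built by collecting the text of each run in a paragraph. Skipping empty runs is the reason the first character of a paragraph is not necessarily found in the first run.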

For HWPF, the way of getting the paragraphs, and the way of getting the runs from a paragraph, is similar but not identical, so there's no common interface for that part. Both XWPFRun and HWPF's CharacterRun do share a common interface, though (org.apache.poi.wp.usermodel.CharacterRun in recent POI releases), so that part of the code can be reused.

Note that all text in a given character run shares the same style / formatting information. Because of the strange ways Word works, two adjacent runs may also share the same styles without Word having merged them. This is why a run's text can be anything from a single character to a long string.
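If the unmerged neighbours get in the way, adjacent runs with identical formatting can be joined before the text is processed. The sketch below only illustrates the idea with a stand-in type. `StyledText` and `RunCoalescer` are hypothetical names, not POI classes, and just two formatting flags are compared, whereas real runs carry many more properties (font, size, colour and so on).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a character run: some text plus two formatting flags. */
final class StyledText {
    final String text;
    final boolean bold;
    final boolean italic;

    StyledText(String text, boolean bold, boolean italic) {
        this.text = text;
        this.bold = bold;
        this.italic = italic;
    }

    boolean sameFormatting(StyledText other) {
        return bold == other.bold && italic == other.italic;
    }
}

final class RunCoalescer {
    /** Joins neighbouring entries whose formatting flags match; the input list is not modified. */
    static List<StyledText> coalesce(List<StyledText> runs) {
        List<StyledText> merged = new ArrayList<>();
        for (StyledText run : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).sameFormatting(run)) {
                StyledText prev = merged.get(last);
                merged.set(last, new StyledText(prev.text + run.text, prev.bold, prev.italic));
            } else {
                merged.add(run);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumed sample data: two bold fragments that were never merged, then plain text.
        List<StyledText> runs = List.of(
            new StyledText("Hel", true, false),
            new StyledText("lo ", true, false),
            new StyledText("world", false, false));
        for (StyledText run : coalesce(runs)) {
            System.out.println((run.bold ? "[bold] " : "[plain] ") + "\"" + run.text + "\"");
        }
    }
}
```

In a real program, the stand-in objects would presumably be filled from each run's text and formatting getters such as isBold(), and the comparison would need to cover every property that matters for the task at hand.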


08-05 21:43