How do I use Tika's XWPFWordExtractorDecorator class?
Question
Someone told me that Tika's XWPFWordExtractorDecorator class is used to convert docx to HTML, but I am not sure how to use this class to get HTML from a docx file. Suggestions for any other library that does the same job would also be appreciated.
Recommended answer
You shouldn't use it directly. Instead, call Tika in the usual way, and it will call the appropriate code for you.
If you want XHTML from parsing a file, the code looks something like this:
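import java.io.File;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;

public class DocxToXhtml {
    public static void main(String[] args) throws Exception {
        // Either of these will work, the latter is recommended
        //InputStream input = new FileInputStream("test.docx");
        InputStream input = TikaInputStream.get(new File("test.docx"));
        // AutoDetect is normally best, unless you know the best parser for the type
        Parser parser = new AutoDetectParser();
        // Handler for indented XHTML: serializes the parser's SAX events into sw
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory = (SAXTransformerFactory)
                SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.setResult(new StreamResult(sw));
        // Call the Tika Parser
        try {
            Metadata metadata = new Metadata();
            parser.parse(input, handler, metadata, new ParseContext());
            String xml = sw.toString();
            System.out.println(xml);
        } finally {
            input.close();
        }
    }
}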
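The "handler for indented XHTML" step above relies only on the JDK: a TransformerHandler receives SAX events and serializes them to text. As a minimal sketch of that mechanism, the following feeds the same kind of handler a few hand-written SAX events in place of Tika's parser output. The class name SaxToXhtmlSketch, the render method, and its parameters are hypothetical, and the events are a simplified stand-in (no XHTML namespace, no attributes), not what Tika actually emits for a given file.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper for illustration only; not part of Tika.
class SaxToXhtmlSketch {

    /** Serializes a small hand-written SAX event stream into a string. */
    static String render(String paragraphText, boolean indent) throws Exception {
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.getTransformer().setOutputProperty(
                OutputKeys.INDENT, indent ? "yes" : "no");
        handler.setResult(new StreamResult(sw));

        // Assumption: elements are emitted without a namespace to keep the
        // sketch short.
        AttributesImpl noAttrs = new AttributesImpl();
        char[] text = paragraphText.toCharArray();

        // These calls play the role a parser would play when given the handler.
        handler.startDocument();
        handler.startElement("", "html", "html", noAttrs);
        handler.startElement("", "body", "body", noAttrs);
        handler.startElement("", "p", "p", noAttrs);
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endElement("", "body", "body");
        handler.endElement("", "html", "html");
        handler.endDocument();

        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("Text taken from a docx paragraph", true));
    }
}
```

Because the handler is an ordinary SAX ContentHandler, anything that produces SAX events can write through it, which is why the parser in the answer can be handed this handler directly. The serializer also takes care of escaping special characters in the text.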