Using the WordToHtmlConverter in Apache POI

Question

I am trying to use the WordToHtmlConverter class to convert a Word document to HTML, but the documentation is not clear.

WordToHtmlConverter has a constructor that takes an org.w3c.dom.Document, but I don't think that is the Word document.

Does anyone have a sample program showing how to load a Word document and convert it to HTML?

Recommended answer

Your best bet for now is probably to look at the unit tests, e.g. TestWordToHtmlConverter. That will show you how to do it.

In general, though, you pass in the XML document to be populated (an empty org.w3c.dom.Document), have WordToHtmlConverter generate the HTML into it from the Word document, then transform that XML document into the output you want (indentation, new lines, etc.).

Your code would want to look something like:

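    import java.io.FileInputStream;
    import java.io.StringWriter;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.w3c.dom.Document;

    // Load the binary .doc file ("input.doc" is a placeholder path)
    HWPFDocument hwpfDocument;
    try ( FileInputStream in = new FileInputStream( "input.doc" ) ) {
        hwpfDocument = new HWPFDocument( in );
    }

    // Empty DOM document that the converter populates with HTML nodes
    Document newDocument = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            newDocument );
    wordToHtmlConverter.processDocument( hwpfDocument );

    // Serialize the populated DOM to an HTML string
    StringWriter stringWriter = new StringWriter();
    Transformer transformer = TransformerFactory.newInstance()
            .newTransformer();
    transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
    transformer.setOutputProperty( OutputKeys.ENCODING, "utf-8" );
    transformer.setOutputProperty( OutputKeys.METHOD, "html" );
    transformer.transform(
            new DOMSource( wordToHtmlConverter.getDocument() ),
            new StreamResult( stringWriter ) );
    String html = stringWriter.toString();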
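
The last step, turning the populated DOM into an HTML string, uses only the JDK, so it can be tried without POI on the classpath. The sketch below is an illustration and not part of the original answer: the class name DomToHtmlSketch, the helpers toHtml and sampleDocument, and the hand-built DOM are hypothetical stand-ins for the document that WordToHtmlConverter would populate.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToHtmlSketch {

    // Serializes a DOM tree to a string using the same output
    // properties as the answer (indent, utf-8, html method).
    static String toHtml(Document document) throws Exception {
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.transform(new DOMSource(document), new StreamResult(writer));
        return writer.toString();
    }

    // Hypothetical stand-in: a tiny hand-built DOM in place of the one
    // that WordToHtmlConverter would fill in from a Word document.
    static Document sampleDocument() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = document.createElement("html");
        Element body = document.createElement("body");
        Element paragraph = document.createElement("p");
        paragraph.setTextContent("Hello from a DOM document");
        body.appendChild(paragraph);
        html.appendChild(body);
        document.appendChild(html);
        return document;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(sampleDocument()));
    }
}
```

Swapping sampleDocument() for wordToHtmlConverter.getDocument() gives the same serialization path as the answer's code. The "html" output method is what keeps the result from starting with an XML declaration.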

