Jsoup: simple splitting of HTML tags


Question

My Java app reads database rows and creates iText PDF files. The problem is that some cells in the rows I get contain the bold HTML tag, which means I must also display a bold Chunk in the corresponding iText Paragraph.

So, for example, one cell of a DB row may look like this:

This is an <b>important</b> line and i <b>want</b> formatting in it

What I am currently doing is simply extracting the bold elements with jsoup:

org.jsoup.nodes.Document doc = Jsoup.parse(input);
org.jsoup.select.Elements bold = doc.select("B");
System.out.println("[BODY: "+doc.body().text()+"] BOLD:>> " + bold.text());

What I really want to do is split the string into its bold and non-bold parts. So a proper solution to my problem would output:

This is an
<b>important</b>
line and i
<b>want</b>
formatting in it

or something similar, so that I can create my iText Chunks and add them to my Paragraph. Is there any way to do this with Jsoup?

Recommended answer

You can use nodes instead of elements:

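import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

final String html = "This is an <b>important</b> line and i <b>want</b> formatting in it";
Document doc = Jsoup.parse(html);

// childNodes() returns text nodes as well as elements, in document order
for (Node node : doc.body().childNodes()) {
    System.out.println(node.toString());
}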

Output:

This is an
<b>important</b>
 line and i
<b>want</b>
 formatting in it

If the leading blanks are a problem, use node.toString().trim().
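
To go one step further toward building iText Chunks, the node loop can record whether each piece came from a <b> element instead of printing its markup. The following is only a sketch under assumptions: the class BoldSplitter, the holder Segment, and the method split are hypothetical names introduced for illustration, only top-level <b> tags are handled, and the iText part (creating a Chunk per segment) is left out.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class BoldSplitter {

    /** Hypothetical holder: one run of text plus a flag saying whether it was bold. */
    static final class Segment {
        final String text;
        final boolean bold;

        Segment(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    /** Splits an HTML fragment into bold and non-bold runs, in document order. */
    static List<Segment> split(String html) {
        List<Segment> segments = new ArrayList<>();
        for (Node node : Jsoup.parse(html).body().childNodes()) {
            if (node instanceof TextNode) {
                // getWholeText() keeps the original spacing around the tags
                segments.add(new Segment(((TextNode) node).getWholeText(), false));
            } else if (node instanceof Element) {
                Element element = (Element) node;
                boolean bold = element.tagName().equalsIgnoreCase("b");
                // Assumption: nested markup inside the element is flattened to plain text
                segments.add(new Segment(element.text(), bold));
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        String html = "This is an <b>important</b> line and i <b>want</b> formatting in it";
        for (Segment segment : split(html)) {
            System.out.println((segment.bold ? "[bold]  " : "[plain] ") + segment.text);
        }
    }
}
```

Each Segment could then be turned into a Chunk with a bold or regular font and added to the Paragraph. Because the plain segments keep their surrounding spaces, the pieces can be concatenated without extra trimming or padding.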
