Question
Can any jsoup expert suggest how to filter HTML down to text? I tried calling text() on the Document, but that strips every tag/element. My aim is to strip only certain specified tags.
For example, I have HTML text like:
<div>hello<p>world</div>,<table><tr><td>xxx</td></tr>
and I want to get this result:
<div>hello<p>world</div>,xxx
Here the table, tr and td tags have been filtered out, while their text is kept.
Recommended answer
I can't test this right now, but I think you want a recursive function that walks the tree and prints each node based on a condition. The following is an example of what it might look like; I expect you will have to modify it to suit your needs more precisely.
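import java.util.HashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.parse(page_text);
recursive_print(doc.head());
recursive_print(doc.body());
...
// Tag names whose elements should not be printed.
private static Set<String> ignore = new HashSet<String>(){{
    add("table");
    ...
}};
public static void recursive_print(Element el){
    // Compare the tag name (e.g. "table"), not the CSS class name.
    if(!ignore.contains(el.tagName()))
        System.out.println(el.html());  // note: html() includes all descendants
    // Recurse into child elements so nested nodes are checked as well.
    for(Element child : el.children())
        recursive_print(child);
}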
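To make the recursive idea concrete without needing the library on the classpath, here is a minimal sketch that uses a hypothetical SimpleNode class as a stand-in for jsoup's element tree. The names TagFilterSketch, SimpleNode and render are assumptions made for illustration and are not part of jsoup. The walk writes the markup of a tag only when it is not in the ignore set, but always recurses, so text nested inside an ignored tag survives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TagFilterSketch {

    /** Hypothetical stand-in for a parsed HTML node: a text node or an element with children. */
    static final class SimpleNode {
        final String tag;   // null for text nodes
        final String text;  // null for elements
        final List<SimpleNode> children = new ArrayList<>();

        private SimpleNode(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static SimpleNode text(String text) {
            return new SimpleNode(null, text);
        }

        static SimpleNode element(String tag, SimpleNode... kids) {
            SimpleNode node = new SimpleNode(tag, null);
            node.children.addAll(Arrays.asList(kids));
            return node;
        }
    }

    /** Serializes the tree, omitting the markup of ignored tags but keeping their content. */
    static String render(SimpleNode node, Set<String> ignore) {
        StringBuilder out = new StringBuilder();
        render(node, ignore, out);
        return out.toString();
    }

    private static void render(SimpleNode node, Set<String> ignore, StringBuilder out) {
        if (node.tag == null) {              // text node: always kept
            out.append(node.text);
            return;
        }
        boolean keepTag = !ignore.contains(node.tag);
        if (keepTag) out.append('<').append(node.tag).append('>');
        for (SimpleNode child : node.children) {
            render(child, ignore, out);      // recurse even under ignored tags
        }
        if (keepTag) out.append("</").append(node.tag).append('>');
    }

    public static void main(String[] args) {
        // Hand-built tree for: <div>hello<p>world</p></div>,<table><tr><td>xxx</td></tr></table>
        SimpleNode root = SimpleNode.element("body",
            SimpleNode.element("div",
                SimpleNode.text("hello"),
                SimpleNode.element("p", SimpleNode.text("world"))),
            SimpleNode.text(","),
            SimpleNode.element("table",
                SimpleNode.element("tr",
                    SimpleNode.element("td", SimpleNode.text("xxx")))));

        Set<String> ignore = new HashSet<>(Arrays.asList("body", "table", "tr", "td"));
        System.out.println(render(root, ignore)); // prints <div>hello<p>world</p></div>,xxx
    }
}
```

The same shape should carry over to jsoup by walking real nodes instead of SimpleNode. Alternatively, selecting the unwanted elements and calling unwrap() on them may give a similar result, though that is untested here.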