本文介绍了Jsoup:如何在2个标头标签之间获取所有html的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我想在2个h1标签之间获取所有html。实际任务是根据h1(标题1)标记将html分成帧(章节)。
I am trying to get all html between 2 h1 tags. Actual task is to break the html into frames(chapters) based of the h1(heading 1) tags.
感谢任何帮助。
谢谢
Sunil
ThanksSunil
推荐答案
如果你想连续两次获取和处理所有元素 h1
您可以为兄弟姐妹工作的标签。以下是一些示例代码:
If you want to get and process all elements between two consecutive h1
tags you can work on siblings. Here's some example code:
public static void h1s() {
String html = "<html>" +
"<head></head>" +
"<body>" +
" <h1>title 1</h1>" +
" <p>hello 1</p>" +
" <table>" +
" <tr>" +
" <td>hello</td>" +
" <td>world</td>" +
" <td>1</td>" +
" </tr>" +
" </table>" +
" <h1>title 2</h1>" +
" <p>hello 2</p>" +
" <table>" +
" <tr>" +
" <td>hello</td>" +
" <td>world</td>" +
" <td>2</td>" +
" </tr>" +
" </table>" +
" <h1>title 3</h1>" +
" <p>hello 3</p>" +
" <table>" +
" <tr>" +
" <td>hello</td>" +
" <td>world</td>" +
" <td>3</td>" +
" </tr>" +
" </table>" +
"</body>" +
"</html>";
Document doc = Jsoup.parse(html);
Element firstH1 = doc.select("h1").first();
Elements siblings = firstH1.siblingElements();
List<Element> elementsBetween = new ArrayList<Element>();
for (int i = 1; i < siblings.size(); i++) {
Element sibling = siblings.get(i);
if (! "h1".equals(sibling.tagName()))
elementsBetween.add(sibling);
else {
processElementsBetween(elementsBetween);
elementsBetween.clear();
}
}
if (! elementsBetween.isEmpty())
processElementsBetween(elementsBetween);
}
private static void processElementsBetween(
List<Element> elementsBetween) {
System.out.println("---");
for (Element element : elementsBetween) {
System.out.println(element);
}
}
这篇关于Jsoup:如何在2个标头标签之间获取所有html的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!