
Problem description


I am trying to use Nokogiri to extract the text in-between two unique sets of tags.


What is the best way to get the text of the <p> tag between <h2 class="point">The problem</h2> and <h2 class="point">The solution</h2>, and then all of the HTML between <h2 class="point">The solution</h2> and <div class="frame box sketh">?


Sample of the full HTML:

<h2 class="point">The problem</h2>
<p>TEXT I WANT </p>
<h2 class="point">The solution</h2>
HTML I WANT with it's own set of tags (but never an <h2> or <div>)
<div class="frame box sketh"><img src="URL for Image I want later" alt="" /></div>

Thanks!

Recommended answer

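require 'nokogiri'

# DATA is an IO for everything after the __END__ marker in this script file
doc = Nokogiri::HTML(DATA)

# Every sibling node that follows an <h2>, except <h2> and <div> elements.
# The text() test also drops bare text nodes (such as the newlines between
# tags), because a text node has no text() children of its own.
doc.search('//h2/following-sibling::node()[name() != "h2" and name() != "div" and text() != "\n"]').each do |block|
  p block.text # the node's text content, with its tags stripped
end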

__END__
<h2 class="point">The problem</h2>
<p>TEXT I WANT</p>
<h2 class="point">The solution</h2>
<div>dont capture this</div>
<span>HTML I WANT with it's <p>own set <b>of</b> tags</p></span>
<div class="frame box sketh"><img src="URL for Image I want later" alt="" /></div>

Output:

"TEXT I WANT"
"HTML I WANT with it's own set of tags"




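The XPath in this answer selects every sibling node that follows an h2, except h2 and div elements. The text() != "\n" test is meant to skip the newline text nodes between tags; in practice it drops every bare text node, because a text node has no text() children of its own, and it would also drop an element with no direct text child, such as <span><b>x</b></span>.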


10-19 06:12