Problem description
The following code successfully extracts tid and term data:
(answered generously by Uri Agassi)
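require 'nokogiri'
require 'open-uri'
(1..10).each do |i|
  # URI.open (from open-uri) fetches the page; plain Kernel#open no longer opens URLs on Ruby 3.0+
  doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
  # data-thing-id of every div whose class list contains the token "thing"
  tids = doc.xpath("//div[contains(concat(' ', @class, ' '), ' thing ')]").collect { |node| node['data-thing-id'] }
  # stripped text of every div whose class list contains the token "col_a"
  terms = doc.xpath("//div[contains(concat(' ', @class, ' '), ' col_a ')]").collect { |node| node.text.strip }
  # pair the two lists by position and print each pair
  tids.zip(terms).each do |tid, term|
    puts "#{tid} #{term}"
  end
end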
from the following sample HTML:
<div class="thing text-text" data-thing-id="29966403">
  <div class="thinguser"><i class="ico ico-water ico-blue"></i>
    <div class="status">in 7 days
    </div>
  </div>
  <div class="ignore-ui pull-right"><input type="checkbox">
  </div>
  <div class="col_a col text">
    <div class="text">foobar
    </div>
  </div>
  <div class="col_b col text">
    <div class="text">foobar desc
    </div>
  </div>
</div>
If I wanted to pull the status (the "in 7 days" string) in the same fashion, what's the best way to do that? I can't seem to figure it out.
Would someone be kind enough to explain in detail what the tids and terms assignment lines are actually doing? I don't get it, and the Nokogiri documentation doesn't seem to cover this.
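Each of those lines does three things: doc.xpath(...) returns every div whose class attribute contains the given class as a whole word, collect (an alias of map) turns each matched node into a plain value, and zip later pairs the two resulting arrays by position. The plain-Ruby sketch below (no Nokogiri needed) mimics those steps; the helper xpath_class_match? and the second tid/term pair are hypothetical, made up for illustration.

```ruby
# Hypothetical helper mirroring the XPath predicate
#   contains(concat(' ', @class, ' '), ' thing ')
# Padding both the class attribute and the wanted class with spaces turns a
# substring test into a whole-token test.
def xpath_class_match?(class_attr, klass)
  " #{class_attr} ".include?(" #{klass} ")
end

xpath_class_match?("thing text-text", "thing")  # => true
xpath_class_match?("thinguser", "thing")        # => false
"thinguser".include?("thing")                   # => true, which is why the padding is needed

# collect/map: plain strings stand in for node['data-thing-id'] and node.text.strip.
# The second entry of each array is made up for illustration.
tids  = ["29966403", "29966404"]
terms = ["foobar", "bazqux"]

# zip pairs the arrays element by element: the first tid with the first term, and so on.
pairs = tids.zip(terms)  # => [["29966403", "foobar"], ["29966404", "bazqux"]]
pairs.each { |tid, term| puts "#{tid} #{term}" }
```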
Big thanks in advance.
~Chris
I'm all about using CSS selectors in Nokogiri. Something like this should work:
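doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
# '.status' matches class="status"; a bare 'status' would look for <status> elements
seven_days = doc.css('.status').map { |node| node.text.strip }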