Problem description
I am new to Nokogiri and so far most familiar with CSS selectors. I am trying to parse information from a table; below is a sample of the table and the code I'm using. I'm stuck on the appropriate if statement, as it seems to return the whole contents of the table.
Table:
<div class="holder">
  <div class="row">
    <div class="c1">
      <!-- Content I don't need -->
    </div>
    <div class="c2">
      <span class="data">
        <!-- Content I don't need -->
      </span>
    </div>
  </div>
  ...
  <div class="row">
    <div class="c1">
      SPECIFIC TEXT
    </div>
    <div class="c2">
      <span class="data">
        What I want
      </span>
    </div>
  </div>
</div>
My script (if SPECIFIC TEXT is found in the table, it returns the text of every "div.c2 span.data" element, so I've misunderstood either the do loop or the if statement):
data = []
page.agent.get(url)
page.search('div.row').each do |row_data|
  if (row_data.search('div.c1:contains("/SPECIFIC TEXT/")').text.strip)
    temp = row_data.search('div.c2 span.data').text.strip
    data << temp
  end
end
Solution
There's no need to stop and insert Ruby logic when you can extract what you need with a single CSS selector.
data = page.search('div.row > div.c1:contains("SPECIFIC TEXT") + div.c2 span.data')
This returns only the span.data elements that match the selector, i.e. those inside the div.c2 that immediately follows a div.c1 containing SPECIFIC TEXT.
Here's where your logic may have gone wrong:
This code
if (row_data.search('div.c1:contains("SPECIFIC TEXT")')...
  temp = row_data.search('div.c2 span.data')...
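first searches the row for the specific text, but .text.strip always returns a String, and in Ruby every String, even the empty string "", is truthy (only nil and false are falsy). So the if succeeds for every row, whether or not the first query matched anything, and the second query, which is scoped to that row, then collects the span.data text from every row. (In your script the slashes in :contains("/SPECIFIC TEXT/") also make Nokogiri look for the literal text /SPECIFIC TEXT/, which is not in the table.)
The key to the single-selector version is the + in the CSS selector above: it is the adjacent sibling combinator, which matches only the element immediately following (i.e. the next sibling element). I'm making an assumption, of course, that the element you want is always the next one.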