Problem description
I am trying to scrape a row from a table that contains dates. I only want to scrape the third row, which has today's date.
This is my Mechanize code. I am trying to select the row that has today's date, along with its columns:
agent.page.search("//td").map(&:text).map(&:strip)
Output:
"11-02-2011", "1", "1", "1", "1", "0", "0,00 DKK", "0,00", "0,00 DKK",
"12-02-2011", "5", "5", "1", "4", "0", "0,00 DKK", "0,00", "0,00 DKK",
"14-02-2011", "1", "3", "1", "1", "0", "0,00 DKK", ",00", "0,00 DKK",
"7", "9", "3", "6", "0", "0,00 DKK", "0,00", "0,00 DKK"
I only want to scrape the third row, the one with today's date.
Recommended answer
Rather than looping over the <td> tags using '//td', search for the <tr> tags, grab only the third one, then loop over its <td> cells.
Mechanize uses Nokogiri internally, so here's how to do it in Nokogiri-ese:
html = <<EOT
<table>
<tr><td>11-02-2011</td><td>1</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0,00 DKK</td><td>0,00</td><td>0,00 DKK</td></tr>
<tr><td>12-02-2011</td><td>5</td><td>5</td><td>1</td><td>4</td><td>0</td><td>0,00 DKK</td><td>0,00</td><td>0,00 DKK</td></tr>
<tr><td>14-02-2011</td><td>1</td><td>3</td><td>1</td><td>1</td><td>0</td><td>0,00 DKK</td><td>,00</td><td>0,00 DKK</td></tr>
</table>
EOT
require 'nokogiri'
require 'pp'
doc = Nokogiri::HTML(html)
pp doc.search('//tr')[2].search('td').map{ |n| n.text }
>> ["14-02-2011", "1", "3", "1", "1", "0", "0,00 DKK", ",00", "0,00 DKK"]
Append .search('//tr')[2].search('td').map{ |n| n.text } to Mechanize's agent.page, like so:
agent.page.search('//tr')[2].search('td').map{ |n| n.text }
It's been a while since I played with Mechanize, so it might also need to be agent.page.parser...
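Since the question asks for the row with today's date rather than a fixed position, matching on the first cell may hold up better than indexing with [2] if rows are added or removed. Below is a minimal sketch of that idea, assuming each row has already been turned into an array of stripped cell texts and that dates use the dd-mm-yyyy format shown in the output. The helper name row_for_date is hypothetical and not part of Mechanize or Nokogiri.

```ruby
require 'date'

# Hypothetical helper (not part of Mechanize or Nokogiri): returns the first
# row whose leading cell equals the given date formatted as dd-mm-yyyy,
# or nil when no row matches.
def row_for_date(rows, date = Date.today)
  wanted = date.strftime('%d-%m-%Y')
  rows.find { |cells| cells.first == wanted }
end

# Rows as arrays of stripped cell texts. With Nokogiri they could be built by:
#   doc.search('//tr').map { |tr| tr.search('td').map { |td| td.text.strip } }
rows = [
  ['11-02-2011', '1', '1', '1', '1', '0', '0,00 DKK', '0,00', '0,00 DKK'],
  ['12-02-2011', '5', '5', '1', '4', '0', '0,00 DKK', '0,00', '0,00 DKK'],
  ['14-02-2011', '1', '3', '1', '1', '0', '0,00 DKK', ',00', '0,00 DKK']
]

# A fixed date is passed here so the example is reproducible; omit the second
# argument to match against today's date.
p row_for_date(rows, Date.new(2011, 2, 14))
```

Rows without a date in the first cell, such as the totals row in the question's output, simply never match, so they need no special handling.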
It's important to put that information into your original question. The more accurate your question, the more accurate our answers.