Problem description
I was able to scrape data from a td class using the script below:
nArticles <- getNodeSet(pagetree,"//*/td[@class='bg1 W1']//*/li[@class='LI2 font28 C bold W1']") #current price
current.price <- xmlValue(nArticles[[1]])
Now I have a web page source like the one below:
<div>
<div style="float: left;">
<ul class="BlockItemIndex" style="width:123px; height:92px">
<li class="font12 I1">
Index
</li>
<li class="I1" style="font:bold 20px Arial">
<span id="ctl00_ctl00_cphContent_cphContent_lblIndex">21,549.28</span></li>
<li class="I1" style="font:normal 15px Arial">
<span id="ctl00_ctl00_cphContent_cphContent_lblChange"><span class="pos bold">+70.56 (0.33%)</span></span></li>
<li class="I1">
<span class="font12">Turnover</span> <span id="ctl00_ctl00_cphContent_cphContent_lblTurnover">70.41B</span></li>
</ul>
</div>
<div class="seperate"></div>
<div style="float: left;">
<ul class="BlockItemChange" style="width:75px">
<li class="font12 I1">
High
</li>
<li class="I2">
<span id="ctl00_ctl00_cphContent_cphContent_lblHigh">21,569.74</span></li>
</ul>
<ul class="BlockItemChange" style="width:75px; margin-top:2px;">
<li class="font12 I1">
Low
</li>
<li class="I2">
<span id="ctl00_ctl00_cphContent_cphContent_lblLow">21,302.19</span></li>
</ul>
</div>
<div class="seperate"></div>
<div style="float: left;">
<ul class="BlockItemChange" style="width:75px">
<li class="font12 I1">
Open
</li>
<li class="I2">
<span id="ctl00_ctl00_cphContent_cphContent_lblOpen">21,339.02</span></li>
</ul>
<ul class="BlockItemChange" style="width:75px; margin-top:2px;">
<li class="font12 I1">
Prev Close
</li>
<li class="I2">
<span id="ctl00_ctl00_cphContent_cphContent_lblPreClose">21,478.72</span></li>
</ul>
</div>
</div>
I need to pick up 21,549.28, and I tried the following:
nArticles <- getNodeSet(pagetree,"//*/ul[@class='BlockItemChange']//*/li[@class='I2']")
But it fails. Can anyone help? Thanks.
Solution
It's hard to know what you're using to determine the value you're interested in, but
query = '//ul[@class="BlockItemIndex"]/li[2]/span/text()'
xpathSApply(xml, query, xmlValue)
picks out the text of the span in the second li of each ul whose class is BlockItemIndex. Several li elements share the same class (I1), so specifying the class doesn't single one out. I'm not sure what you were trying to accomplish with *; I think it's redundant with //. Later in your query, // isn't what you want: ul//*/li only matches li elements nested at least two levels below the ul, whereas you're interested in the immediate children of the ul element.
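To try the query end to end, here is a minimal, self-contained sketch in R, assuming the XML package is installed. The HTML is a trimmed copy of the markup from the question, inlined so the example runs offline. The variable names (doc, by_position, by_id, failing, index_value) are illustrative, and selecting the span by its id is an alternative that the answer itself does not mention.

```r
library(XML)

# Trimmed copy of the markup from the question (assumption: the real page
# has the same nesting of ul > li > span).
html <- '
<div>
  <ul class="BlockItemIndex">
    <li class="font12 I1">Index</li>
    <li class="I1"><span id="ctl00_ctl00_cphContent_cphContent_lblIndex">21,549.28</span></li>
    <li class="I1"><span id="ctl00_ctl00_cphContent_cphContent_lblChange"><span class="pos bold">+70.56 (0.33%)</span></span></li>
  </ul>
  <ul class="BlockItemChange">
    <li class="font12 I1">High</li>
    <li class="I2"><span id="ctl00_ctl00_cphContent_cphContent_lblHigh">21,569.74</span></li>
  </ul>
</div>'

doc <- htmlParse(html, asText = TRUE)

# Positional query from the answer: text of the span in the second li.
by_position <- xpathSApply(doc,
                           '//ul[@class="BlockItemIndex"]/li[2]/span/text()',
                           xmlValue)

# Alternative (not from the answer): select the span by its id attribute.
by_id <- xpathSApply(doc,
                     '//span[@id="ctl00_ctl00_cphContent_cphContent_lblIndex"]',
                     xmlValue)

# The pattern from the question: the li elements are direct children of the
# ul, so requiring an intermediate element with //*/ matches nothing.
failing <- getNodeSet(doc, "//*/ul[@class='BlockItemChange']//*/li[@class='I2']")

# Strip the thousands separator before converting the text to a number.
index_value <- as.numeric(gsub(",", "", by_position))
```

The positional query depends on the order of the li elements, so it may break if the page layout changes; the id-based query is likely more robust as long as the generated id stays stable.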