Problem description
The result from my Scrapy project looks like this:
<div class="news_li">...</div>
<div class="news_li">...</div>
<div class="news_li">...</div>
...
<div class="news_li">...</div>
Each "news_li" div looks like this:
<div class="news_li">
<div class="a">
<a href="aaa">
<div class="a1"></div>
</a>
</div>
<a href="xxx">
<div class="b">
<div class="b1"></div>
<div class="b2"></div>
<div class="b3"></div>
</div>
</a>
</div>
I am trying to extract the information for one block at a time in the Scrapy shell with the following commands:
response.xpath("//div[@class='news_li']")[0].xpath("//div[@class='a1']").extract()
response.xpath("//div[@class='news_li ']/descendant::div[@class='a1']").extract()
But these commands return every "a1" div from all of the "news_li" divs, not just the one I selected.
I have two questions:
1. How do I get the child div information for one "news_li" block at a time?
2. How do I get <a href="aaa"></a> and <a href="xxx"></a> separately? (The difference is that the first one is wrapped in a parent div and the second one stands on its own.)
Thanks a lot.
To be specific, how can I extract information relative to a parent/root node? I looked up XPath axes and tried 'descendant', but it did not work.
Recommended answer
Try the following.
# first link
response.xpath("(//div[@class='news_li']//a)[1]").extract()
# second link
response.xpath("(//div[@class='news_li']//a)[2]").extract()
# replace X with the 1-based position of the news_li block; this returns
# the first link (the one wrapped in div.a), located via its child div.a1
//div[@class='news_li'][X]/descendant::div[@class='a1']/parent::a
# replace X the same way; this returns the second link (the direct child
# of news_li), located via its child div.b
//div[@class='news_li'][X]/descendant::a[div[@class='b']]
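The commands in the question return every "a1" div because, in Scrapy, an XPath that starts with // is evaluated from the document root even when it is called on a nested selector. Starting the inner expression with a dot (for example .//div[@class='a1']) makes it relative to the selected "news_li" block. The sketch below illustrates the relative form using only the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors. That module supports only a subset of XPath and rejects absolute paths on elements, so only the relative form is shown. The sample markup, the href values "bbb" and "yyy", the text inside the a1 divs, and the extract_items helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: two "news_li" blocks shaped like the one in the
# question. The second block's hrefs and the a1 text are made up.
HTML = """
<body>
  <div class="news_li">
    <div class="a"><a href="aaa"><div class="a1">first a1</div></a></div>
    <a href="xxx"><div class="b"><div class="b1"/><div class="b2"/><div class="b3"/></div></a>
  </div>
  <div class="news_li">
    <div class="a"><a href="bbb"><div class="a1">second a1</div></a></div>
    <a href="yyy"><div class="b"><div class="b1"/><div class="b2"/><div class="b3"/></div></a>
  </div>
</body>
"""


def extract_items(root):
    """Collect one record per news_li block using paths relative to that block."""
    items = []
    for block in root.findall(".//div[@class='news_li']"):
        # ".//" searches only the descendants of this block.
        a1 = block.find(".//div[@class='a1']")
        # The first link is wrapped in <div class="a">.
        wrapped_link = block.find("./div[@class='a']/a")
        # The second link is a direct child of the news_li block.
        direct_link = block.find("./a")
        items.append(
            {
                "a1_text": a1.text,
                "wrapped_href": wrapped_link.get("href"),
                "direct_href": direct_link.get("href"),
            }
        )
    return items


root = ET.fromstring(HTML)
items = extract_items(root)
for item in items:
    print(item)
```

In the Scrapy shell the same idea would be a loop over response.xpath("//div[@class='news_li']"), calling .xpath(".//div[@class='a1']"), .xpath("./div[@class='a']/a/@href") and .xpath("./a/@href") on each block.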