XPath syntax: how to get child div information based on the parent div


Problem description

The result from my Scrapy project looks like this:

<div class="news_li">...</div>
<div class="news_li">...</div>
<div class="news_li">...</div>
...
<div class="news_li">...</div>

Each "news_li" div looks like this:

<div class="news_li">
    <div class="a">
        <a href="aaa">
            <div class="a1"></div>
        </a>
    </div>
    <a href="xxx">
        <div class="b">
            <div class="b1"></div>
            <div class="b2"></div>
            <div class="b3"></div>
        </div>
    </a>
</div>

I am trying to extract the information one item at a time in the Scrapy shell with the following commands:

response.xpath("//div[@class='news_li']")[0].xpath("//div[@class='a1']").extract()
response.xpath("//div[@class='news_li']/descendant::div[@class='a1']").extract()

But these commands return all the "a1" divs from every "news_li" div, not just the one I selected.

I have two questions:

How do I get the child div information one at a time?

How do I get <a href="aaa"></a> and <a href="xxx"></a> separately? (The difference is that the first one is wrapped in a parent div, while the second one stands by itself.)

Thanks a lot.

To be specific, how can I extract information relative to a parent/root node? I looked up XPath axes and tried 'descendant', but it did not work.

Recommended answer

Try the following.

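# first link (the index applies to the whole document, so this is
# the first <a> of the first news_li block)
response.xpath("(//div[@class='news_li']//a)[1]").extract()
# second link (the second <a> of the first news_li block)
response.xpath("(//div[@class='news_li']//a)[2]").extract()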
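
The reason the commands in the question return every "a1" div is that an XPath beginning with // is evaluated from the document root, even when it is called on a nested selector. Prefixing the path with a dot makes it relative to the selected node; in Scrapy that would be response.xpath("//div[@class='news_li']")[0].xpath(".//div[@class='a1']").extract(). The sketch below shows the same idea with the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, so it runs without Scrapy installed. The sample markup, its text content, and the helper name a1_texts_per_item are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: two news_li blocks with made-up text.
PAGE = """
<body>
  <div class="news_li">
    <div class="a"><a href="aaa"><div class="a1">first a1</div></a></div>
    <a href="xxx"><div class="b"><div class="b1">first b1</div></div></a>
  </div>
  <div class="news_li">
    <div class="a"><a href="bbb"><div class="a1">second a1</div></a></div>
    <a href="yyy"><div class="b"><div class="b1">second b1</div></div></a>
  </div>
</body>
""".strip()

root = ET.fromstring(PAGE)


def a1_texts_per_item(root):
    """Return one list of a1 texts per news_li block."""
    results = []
    for item in root.findall(".//div[@class='news_li']"):
        # The leading "." keeps the search inside this news_li block only.
        results.append([d.text for d in item.findall(".//div[@class='a1']")])
    return results


per_item = a1_texts_per_item(root)
print(per_item)
```

Each inner list holds only the "a1" divs of its own block, which is the one-at-a-time behaviour the question asks for.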

# change X to the 1-based position of the news_li block to get its first link
//div[@class='news_li'][X]/descendant::div[@class='a1']/parent::a

# change X to get the second (direct) link of that block,
# selected by its child div
//div[@class='news_li'][X]/descendant::a[div[@class='b']]
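
The two links can also be told apart by where they sit in the tree rather than by their child divs: the first is reached through the "a" div, the second is a direct child of the news_li block. In plain XPath that would be roughly //div[@class='news_li'][X]/div[@class='a']/a and //div[@class='news_li'][X]/a. Below is a minimal sketch of that idea, again using xml.etree.ElementTree as a stand-in for Scrapy selectors and picking the block with a Python index instead of [X]. The sample markup and the helper name links_for_item are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page with two news_li blocks.
PAGE = """
<body>
  <div class="news_li">
    <div class="a"><a href="aaa"><div class="a1"/></a></div>
    <a href="xxx"><div class="b"><div class="b1"/></div></a>
  </div>
  <div class="news_li">
    <div class="a"><a href="bbb"><div class="a1"/></a></div>
    <a href="yyy"><div class="b"><div class="b1"/></div></a>
  </div>
</body>
""".strip()

root = ET.fromstring(PAGE)


def links_for_item(root, x):
    """Return (wrapped_href, direct_href) for the x-th news_li block (1-based)."""
    item = root.findall(".//div[@class='news_li']")[x - 1]
    wrapped = item.find("./div[@class='a']/a")  # <a> inside the "a" div
    direct = item.find("./a")                   # <a> directly under news_li
    return wrapped.get("href"), direct.get("href")


print(links_for_item(root, 1))
print(links_for_item(root, 2))
```

Selecting the block first and then using paths relative to it keeps each lookup scoped to a single news_li block.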
