Python: Using XPath on a specific element


Question

I'm trying to get the links from a page with XPath. The problem is that I only want the links inside a table, but if I apply the XPath expression to the whole page, I also capture links that I don't want.

For example:

import lxml.html

tree = lxml.html.parse(some_response)
links = tree.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")

The problem is that this applies the expression to the whole document. I located the element I want, for example:

tree = lxml.html.parse(some_response)
root = tree.getroot()
table = root[1][5]  # for example
links = table.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")

But that seems to run the query against the whole document as well, because I am still capturing links outside the table. The lxml documentation says: "When xpath() is used on an Element, the XPath expression is evaluated against the element (if relative) or against the root tree (if absolute)". So what I'm using is an absolute expression, and I need to make it relative? Is that it?

Basically, how can I select only the elements that are inside this table?

Recommended answer

Your XPath starts with a slash (/) and is therefore absolute, so it is evaluated from the document root even when you call it on the table element. Add a dot (.) in front to make it relative to the current element, i.e.

links = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")
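To see the difference side by side, here is a minimal sketch that runs both forms of the expression on the same element. The HTML snippet, the `results` id, and the variable names are made up for illustration and are not from the original page; the table is looked up by id rather than by index only to keep the example self-contained.

```python
import lxml.html

# Hypothetical page: one matching link outside the table, one inside it.
HTML = """
<html><body>
  <a href="http://www.example.com/filter/outside">outside</a>
  <table id="results">
    <tr><td><a href="http://www.example.com/filter/inside">inside</a></td></tr>
    <tr><td><a href="http://www.example.com/other/">other</a></td></tr>
  </table>
</body></html>
"""

root = lxml.html.fromstring(HTML)
table = root.get_element_by_id("results")  # assumed id, for illustration only

query = "//a[contains(@href, 'http://www.example.com/filter/')]"

# Absolute path: evaluated from the document root, even when called on `table`.
absolute_hrefs = [a.get("href") for a in table.xpath(query)]

# Relative path: the leading dot restricts the search to descendants of `table`.
relative_hrefs = [a.get("href") for a in table.xpath("." + query)]

print(absolute_hrefs)  # includes the link outside the table
print(relative_hrefs)  # only the link inside the table
```

The only change between the two calls is the leading dot, which is what anchors the search at the element `xpath()` was called on.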

