XPath query: making a certain query more generic

Problem description

I'm trying to extract information from Wikipedia tables.

More specifically, I'm trying to build a list of all teams and all players in the Premier League.

So far I can iterate over all the teams in the 2019-2020 Premier League table of teams; for each team I open its Wikipedia page and iterate over its players to collect their information.

I assumed that all Premier League team pages on Wikipedia follow a fixed template with the table of players at position 3, but after traversing six teams the query hit a team whose table is at position 2.

So I was using the following XPath query on every team's wiki page:

"//table[3]/tbody//tr[position() > 1]//td[4]//span/a/@href"

But, for example, the players table of the following team is at position 2. How can I make this query more generic instead of tying it to a fixed position? I have noticed that every relevant table is preceded by an element with the text "First-team squad".

The HTML of the table is too long to post, so here is the wiki link of one such team:

https://en.wikipedia.org/wiki/Crystal_Palace_F.C.

Any help would be appreciated. Thanks.

Recommended answer

You have to use another "anchor" that works on every page. The table you need is always the first one after the span element with the id "Players".

Like this:

//span[@id='Players']/following::table[1]//span[@class="fn"]//text()

You'll get the names of all players in the current squad.
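To illustrate why anchoring on an id is more robust than a positional index, here is a minimal sketch using only Python's standard library. `xml.etree.ElementTree` does not support the `following::` axis, so the hypothetical helper `first_table_after` emulates that step by walking the tree in document order. The sample markup is an invented, heavily simplified stand-in for a club page, not real Wikipedia HTML. With a full XPath 1.0 engine such as lxml, the expression above could be passed to `xpath()` directly.

```python
import xml.etree.ElementTree as ET

# Invented, simplified stand-in for a club page (assumption: not real Wikipedia markup).
SAMPLE_PAGE = """
<html><body>
  <table><tr><td>Infobox</td></tr></table>
  <h2><span id="Players">Players</span></h2>
  <h3><span id="First-team_squad">First-team squad</span></h3>
  <table>
    <tr><th>No.</th><th>Player</th></tr>
    <tr><td>1</td><td><span class="fn"><a href="/wiki/Player_A">Player A</a></span></td></tr>
    <tr><td>2</td><td><span class="fn">Player B</span></td></tr>
  </table>
  <table><tr><td><span class="fn"><a href="/wiki/Player_C">Player C</a></span></td></tr></table>
</body></html>
"""


def first_table_after(root, anchor_id):
    """Emulate //*[@id=anchor_id]/following::table[1] (hypothetical helper).

    root.iter() yields elements in document order, so the first <table>
    seen after the anchor is the one we want. This assumes the anchor
    element itself contains no table.
    """
    seen_anchor = False
    for element in root.iter():
        if element.get("id") == anchor_id:
            seen_anchor = True
        elif seen_anchor and element.tag == "table":
            return element
    return None


def squad_names(root, anchor_id="Players"):
    """Return the text of every span with class "fn" in the anchored table."""
    table = first_table_after(root, anchor_id)
    if table is None:
        return []
    return [
        "".join(span.itertext()).strip()
        for span in table.iterfind(".//span[@class='fn']")
    ]


root = ET.fromstring(SAMPLE_PAGE.strip())
print(squad_names(root))  # ['Player A', 'Player B']
```

The table before the anchor and the second table after it are both ignored, whatever their position on the page.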

With this:

//span[@id='Players']/following::table[1]//span[@class="fn"]//@href

You'll get the associated URLs. /!\ Some players don't have a Wikipedia page, so you can end up with 26 player names but only 25 URLs, as here:

https://en.wikipedia.org/wiki/Chelsea_F.C.
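Because the two queries return flat lists of different lengths, names and URLs cannot be matched by index. One possible workaround, sketched here with Python's standard library, is to visit each `span` with class "fn" and read its link relative to that span, so a player without a link yields `None`. The sample table and the function name `players_with_links` are assumptions made for illustration, as is the premise that a player without a page has no `a` element at all.

```python
import xml.etree.ElementTree as ET

# Invented sample squad table (assumption: Player B has no link at all).
SAMPLE_TABLE = """
<table>
  <tr><td><span class="fn"><a href="/wiki/Player_A">Player A</a></span></td></tr>
  <tr><td><span class="fn">Player B</span></td></tr>
  <tr><td><span class="fn"><a href="/wiki/Player_C">Player C</a></span></td></tr>
</table>
"""


def players_with_links(table):
    """Return (name, href) pairs; href is None when the span has no link."""
    players = []
    for span in table.iterfind(".//span[@class='fn']"):
        name = "".join(span.itertext()).strip()
        link = span.find(".//a")  # searched relative to this span only
        href = link.get("href") if link is not None else None
        players.append((name, href))
    return players


table = ET.fromstring(SAMPLE_TABLE.strip())
for name, href in players_with_links(table):
    print(name, href)
```

Each name stays aligned with its own URL, and missing pages show up explicitly instead of silently shifting the list.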

