Problem description
I'm trying to extract information from Wikipedia tables.
More specifically, I'm trying to make a list of all teams and all players in the Premier League.
So far I can traverse all the teams in the 2019-2020 Premier League table of teams; for every team there, I open its Wikipedia page and traverse its players to get their information.
I assumed there was a fixed template, with every Premier League team page on Wikipedia having its table of players at position 3, but after traversing 6 teams the query hit a team whose table is at position 2.
So I was using the following XPath query on every team wiki page:
"//table[3]/tbody//tr[position() > 1]//td[4]//span/a/@href"
But for example, the following team's players table is at position 2. How can I make this query more generic instead of fixing it to a certain position? I have noticed that all of the relevant tables are preceded by an element with the text "First-team squad".
The HTML of the table is too long, so I'm posting the wiki link of one such team here:
https://en.wikipedia.org/wiki/Crystal_Palace_F.C.
Any help would be appreciated. Thanks.
Recommended answer
You have to use another "anchor" that works for every page. The table you need is always the first one after the span element with the id "Players".
With this:
//span[@id='Players']/following::table[1]//span[@class="fn"]//text()
you'll get the names of all players in the current squad.
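To see why the anchor helps, here is a minimal sketch in Python. It assumes the third-party lxml library is installed, and the two HTML snippets are made-up, heavily simplified stand-ins for real team pages (the player names and the helper squad_names are hypothetical). The squad table is the 2nd table in one page and the 3rd in the other, yet the same query finds it in both:

```python
from lxml import html  # third-party; assumed to be installed

# The anchored query from the answer: first table after the "Players" span.
SQUAD_XPATH = (
    "//span[@id='Players']/following::table[1]"
    "//span[@class='fn']//text()"
)

# Made-up page where the squad table is the 2nd table in the document.
PAGE_A = """
<html><body>
<table><tr><td>Infobox</td></tr></table>
<h2><span id="Players">Players</span></h2>
<table>
  <tr><td>1</td><td><span class="fn"><a href="/wiki/Player_One">Player One</a></span></td></tr>
  <tr><td>2</td><td><span class="fn"><a href="/wiki/Player_Two">Player Two</a></span></td></tr>
</table>
</body></html>
"""

# Made-up page where the squad table is the 3rd table, followed by another
# table that must not be picked up.
PAGE_B = """
<html><body>
<table><tr><td>Infobox</td></tr></table>
<table><tr><td>League history</td></tr></table>
<h2><span id="Players">Players</span></h2>
<table>
  <tr><td>1</td><td><span class="fn"><a href="/wiki/Player_Three">Player Three</a></span></td></tr>
</table>
<table>
  <tr><td>9</td><td><span class="fn"><a href="/wiki/Player_Four">Player Four</a></span></td></tr>
</table>
</body></html>
"""


def squad_names(page_source):
    """Return the player names found in the first table after the anchor."""
    tree = html.fromstring(page_source)
    # Drop whitespace-only text nodes, keep the stripped names.
    return [text.strip() for text in tree.xpath(SQUAD_XPATH) if text.strip()]


print(squad_names(PAGE_A))
print(squad_names(PAGE_B))
```

Because following::table[1] is evaluated relative to the anchor span, tables that come before the heading are ignored, and only the first table after it is searched.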
With this:
//span[@id='Players']/following::table[1]//span[@class="fn"]//@href
you'll get the associated URLs. Warning: some players don't have a Wikipedia page, so you can end up with 26 player names but only 25 URLs, like here:
https://en.wikipedia.org/wiki/Chelsea_F.C.
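One possible way to keep names and URLs aligned when a player has no page is to select each span with class "fn" first and then read the name and the link from that same element. The sketch below assumes the third-party lxml library is installed; the HTML snippet, the player names, and the helper squad_with_links are made up for illustration:

```python
from lxml import html  # third-party; assumed to be installed

# Select the player spans themselves instead of their text or href values.
PLAYER_SPANS = (
    "//span[@id='Players']/following::table[1]//span[@class='fn']"
)

# Made-up page: the second player has no link to a Wikipedia page.
PAGE = """
<html><body>
<h2><span id="Players">Players</span></h2>
<table>
  <tr><td>1</td><td><span class="fn"><a href="/wiki/Player_One">Player One</a></span></td></tr>
  <tr><td>2</td><td><span class="fn">Player Two</span></td></tr>
</table>
</body></html>
"""


def squad_with_links(page_source):
    """Return (name, href) pairs; href is None when the player has no link."""
    tree = html.fromstring(page_source)
    players = []
    for span in tree.xpath(PLAYER_SPANS):
        name = span.text_content().strip()
        hrefs = span.xpath(".//a/@href")  # relative to this player's span
        players.append((name, hrefs[0] if hrefs else None))
    return players


for name, href in squad_with_links(PAGE):
    print(name, href)
```

Running the two original queries separately gives two flat lists of different lengths; iterating per span keeps each URL attached to its player.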