Problem description
I am using Python with Selenium (PhantomJS webdriver) to parse websites, and I have a problem with it.
I want to get the current song from this radio site: http://www.eskago.pl/radio/eska-warszawa
xpath:
/html/body/div[3]/div[1]/section[2]/div/div/div[2]/ul/li[2]/a[2]
That XPath does not work with Python Selenium.
Error:
Does anyone have an idea what is wrong with this?
Thanks for the answers, guys. I finally found a solution to my problem. The XPath was correct (though in fact fragile).
I switched to the Firefox driver and saw the problem: an ad.
I would have had to skip the ad, so I decided to use another page that does not have it: http://www.eskago.pl/radio
Finally, thanks to alecxe, I use this:
driver.find_element_by_xpath('//a[@class="radio-tab-button"]/span/strong').click()
element = driver.find_element_by_xpath('//p[@class="onAirStreamId_999"]/strong')
print(element.text)
It works perfectly.
Recommended answer
The XPath you provided is very fragile; no wonder you get a NoSuchElementException.
Instead, rely on the class name of the a tag that contains the currently playing song:
<a class="playlist_small" href="http://www.eskago.pl/radio/eska-warszawa?noreload=yes">
<img style="width:41px;" src="http://t-eska.cdn.smcloud.net/common/l/Q/s/lQ2009158Xvbl.jpg/ru-0-ra-45,45-n-lQ2009158Xvbl_jessie_j_bang_bang.jpg" alt="">
<strong>Jessie J, Ariana Grande, Nicki Minaj</strong>
<span>Bang Bang</span>
</a>
Here is the sample code:
element = driver.find_element_by_xpath('//a[@class="playlist_small"]/strong')
print(element.text)
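To see why the positional path is fragile, here is a small sketch that uses only the standard library's xml.etree.ElementTree instead of Selenium. The markup is a made-up, simplified stand-in for the real page (the ids header, player and ad are assumptions for illustration), and ElementTree only supports a subset of XPath, so treat this as a demonstration of the idea rather than a copy of what the browser does.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; NOT the real eskago.pl page.
PAGE = """<html><body>
<div id="header"/>
<div id="player"><a class="playlist_small" href="#"><strong>Jessie J</strong><span>Bang Bang</span></a></div>
</body></html>"""

# The same page after an ad container is injected before the player.
PAGE_WITH_AD = PAGE.replace('<div id="header"/>', '<div id="header"/><div id="ad"/>')

# Positional path: depends on the player being the second div in body.
POSITIONAL = "./body/div[2]/a/strong"
# Class-based path: depends only on the class name of the a tag.
BY_CLASS = ".//a[@class='playlist_small']/strong"


def artist(markup, path):
    """Return the text of the first node matching path, or None if nothing matches."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.text


print(artist(PAGE, POSITIONAL))          # Jessie J
print(artist(PAGE_WITH_AD, POSITIONAL))  # None (div[2] is now the ad)
print(artist(PAGE, BY_CLASS))            # Jessie J
print(artist(PAGE_WITH_AD, BY_CLASS))    # Jessie J
```

The positional lookup stops matching as soon as one extra div shifts the indexes, while the class-based lookup keeps working.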
Another way to retrieve the currently playing song is to mimic the JSONP request the website makes for the playlist:
>>> import requests
>>> import json
>>> import re
>>> response = requests.get('http://static.eska.pl/m/playlist/channel-999.jsonp')
>>> json_data = re.match('jsonp\((.*?)\);', response.content).group(1)
>>> songs = json.loads(json_data)
>>> current_song = songs[0]
>>> [artist['name'] for artist in current_song['artists']]
[u'David Guetta', u'Showtek', u'Vassy']
>>> current_song['name']
u'Bad'
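The session above is Python 2 (note the u'' strings). As a rough sketch of the same unwrapping step that also runs on Python 3, the helper below strips the callback wrapper and decodes the JSON inside. The name parse_jsonp and the sample payload are my own assumptions, shaped after the output shown above; the real response may contain more fields.

```python
import json
import re


def parse_jsonp(payload):
    """Strip a callback wrapper such as jsonp([...]); and decode the JSON inside."""
    # Greedy match up to the LAST closing parenthesis, so a ")" inside
    # a song title does not cut the JSON short.
    match = re.match(r"^\s*[\w.$]+\((.*)\)\s*;?\s*$", payload, re.DOTALL)
    if match is None:
        raise ValueError("payload does not look like JSONP")
    return json.loads(match.group(1))


# Made-up payload for illustration; not captured from the real endpoint.
sample = (
    'jsonp([{"name": "Bad", "artists": '
    '[{"name": "David Guetta"}, {"name": "Showtek"}, {"name": "Vassy"}]}]);'
)

songs = parse_jsonp(sample)
current_song = songs[0]
print(current_song["name"])
print([artist["name"] for artist in current_song["artists"]])
```

With requests on Python 3, you would pass response.text (a str) to this helper rather than response.content, which is bytes there and cannot be matched against a str pattern.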