Problem description
So I have been trying to figure out how to use BeautifulSoup, did a quick search, and found that lxml can evaluate XPath expressions against an HTML page. I would LOVE to be able to do that, but the tutorial isn't that intuitive.
I know how to use Firebug to grab the XPath, and I was curious whether anyone has used lxml and can explain how I can use it to evaluate specific XPaths and print the results, say 5 per line, or whether that is even possible.
Selenium is driving Chrome and loads the page properly; I just need help moving forward.
Thanks!
Recommended answer
lxml's ElementTree has an .xpath() method (note that the ElementTree in the xml package in the Python standard library doesn't have that!)
For example:
# see http://lxml.de/xpathxslt.html
from lxml import etree

# To read from a file instead of an inline string:
# root = etree.parse('/tmp/stack-overflow-questions.xml')
root = etree.XML('''
<answers>
    <answer author="dlam" question-id="13965403">AAA</answer>
</answers>
''')

# Select every <answer> element below the root.
all_answers = root.xpath('.//answer')

for i, answer in enumerate(all_answers):
    who_answered = answer.attrib['author']
    question_id = answer.attrib['question-id']
    answer_text = answer.text
    print('Answer #{0} by {1}: {2}'.format(i, who_answered, answer_text))
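
To tie this back to the question (HTML rather than XML, printed 5 per line), here is a minimal sketch. It assumes the page HTML is available as a string; with Selenium that would typically be driver.page_source. The sample markup, the XPath expression, and the chunked helper are made up for illustration and are not from the original page.

```python
from lxml import html

# Hypothetical stand-in for the loaded page.
# With Selenium, this string would come from driver.page_source.
page_source = """
<html><body>
  <ul>
    <li class="item">one</li>
    <li class="item">two</li>
    <li class="item">three</li>
    <li class="item">four</li>
    <li class="item">five</li>
    <li class="item">six</li>
    <li class="item">seven</li>
  </ul>
</body></html>
"""


def chunked(items, size):
    """Yield successive slices of `items`, each with at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# lxml.html tolerates real-world HTML that a strict XML parser would reject.
tree = html.fromstring(page_source)

# An XPath ending in text() returns the matching text nodes as strings.
texts = tree.xpath('//li[@class="item"]/text()')

# Group the results 5 per line, as asked in the question.
lines = [", ".join(group) for group in chunked(texts, 5)]
for line in lines:
    print(line)
```

An XPath copied from Firebug can be passed to tree.xpath() in the same way, although browser-generated paths sometimes include elements (such as tbody) that are not present in the raw HTML, so the expression may need adjusting.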