Can I parse XPath using Python, Selenium and lxml?

Question

I have been trying to figure out how to use BeautifulSoup, and a quick search showed that lxml can evaluate XPath expressions against an HTML page. I would love to do that, but the tutorial isn't very intuitive.

I know how to use Firebug to grab an XPath, and I was curious whether anyone has used lxml and can explain how to use it to evaluate specific XPaths and print the results, say 5 per line. Is that even possible?

Selenium is driving Chrome and loads the page properly; I just need help moving forward.

Thanks!

Recommended answer

lxml's ElementTree has an .xpath() method (note that the ElementTree in the xml package of the Python standard library does not have it!)

For example:

# see http://lxml.de/xpathxslt.html

from lxml import etree

# root = etree.parse('/tmp/stack-overflow-questions.xml')
root = etree.XML('''
        <answers>
            <answer author="dlam" question-id="13965403">AAA</answer>
        </answers>
''')

# .xpath() returns a list of the matching elements
all_answers = root.xpath('.//answer')

for i, answer in enumerate(all_answers):
    who_answered = answer.attrib['author']
    question_id = answer.attrib['question-id']
    answer_text = answer.text
    # print() is a function in Python 3
    print('Answer #{0} by {1}: {2}'.format(i, who_answered, answer_text))
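
To connect this to the Selenium part of the question, here is a minimal sketch of one way to hand the loaded page to lxml and print the matches five per line. The helper names `texts_for_xpath` and `chunked`, the sample HTML, and the XPath are made up for illustration; with a real browser session you would pass `driver.page_source` instead of the inline string. It assumes the XPath selects elements rather than text nodes or attributes.

```python
from lxml import html


def texts_for_xpath(page_source, xpath):
    """Return the stripped text of every element matched by `xpath`."""
    tree = html.fromstring(page_source)
    return [el.text_content().strip() for el in tree.xpath(xpath)]


def chunked(items, size=5):
    """Split `items` into consecutive lists of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Stand-in for driver.page_source (hypothetical sample page).
# With Selenium you would instead do something like:
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(url)                  # `url` is the page you are scraping
#   page_source = driver.page_source
page_source = """
<html><body><ul id="items">
  <li>one</li><li>two</li><li>three</li><li>four</li>
  <li>five</li><li>six</li><li>seven</li>
</ul></body></html>
"""

rows = chunked(texts_for_xpath(page_source, '//ul[@id="items"]/li'))
for row in rows:
    # Each printed line holds at most five matches
    print(', '.join(row))
```

With the sample page above this prints two lines: the first five items, then the remaining two. Keeping the lxml parsing separate from the Selenium session means the XPath logic can be tried on a saved HTML string without launching a browser.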

