This article covers how to find all links on a web page with Python lxml/Beautiful Soup; we hope it serves as a useful reference for anyone facing the same problem.
Problem Description
I am writing a script to read a web page and build a database of links that match certain criteria. Right now I am stuck with lxml, trying to understand how to grab all the <a href>'s from the HTML...
result = self._openurl(self.mainurl)
content = result.read()
html = lxml.html.fromstring(content)
# find_rel_links matches <a> tags by their rel attribute (e.g. rel="nofollow"),
# not by href, so this does not return all the links on the page:
print(lxml.html.find_rel_links(html, 'href'))
Recommended Answer
Use XPath. Something like (can't test from here):
urls = html.xpath('//a/@href')
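Putting the answer together, here is a minimal self-contained sketch. The inline HTML string stands in for the fetched page (the original script reads it via self._openurl, which is not shown in the question), and the XPath expression //a/@href selects the href attribute of every <a> element, skipping anchors that have no href:

```python
import lxml.html

# Inline HTML standing in for the downloaded page
# (assumed content for illustration).
content = """
<html><body>
  <a href="https://example.com/page1">Page 1</a>
  <a href="https://example.com/page2">Page 2</a>
  <a name="anchor-only">No href here</a>
</body></html>
"""

html = lxml.html.fromstring(content)

# //a/@href returns the value of the href attribute for each <a>
# element, so anchors without an href are excluded automatically.
urls = html.xpath('//a/@href')
print(urls)
```

If you need every link-like attribute (including src on images and scripts, form actions, and so on), lxml.html also offers the iterlinks() method, which yields (element, attribute, link, pos) tuples for the whole document.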
That concludes this article on finding all links on a web page with Python lxml/Beautiful Soup. We hope the recommended answer is helpful.