This article covers how to find all links on a web page with Python lxml/Beautiful Soup; we hope it serves as a useful reference for anyone facing the same problem.
Problem Description
I am writing a script to read a web page and build a database of links that match certain criteria. Right now I am stuck with lxml, trying to understand how to grab all the <a href>'s from the HTML...
result = self._openurl(self.mainurl)
content = result.read()
html = lxml.html.fromstring(content)
# find_rel_links matches <a> tags by their rel attribute (e.g. rel="nofollow"),
# not by href, so this does not return all the links on the page:
print(lxml.html.find_rel_links(html, 'href'))
Recommended Answer
Use XPath. Something like (can't test from here):
urls = html.xpath('//a/@href')
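Putting the answer together, here is a minimal self-contained sketch. The inline HTML string stands in for the fetched page (the original script reads it via self._openurl, which is not shown in the question), and the XPath expression //a/@href selects the href attribute of every <a> element, skipping anchors that have no href:

```python
import lxml.html

# Inline HTML standing in for the downloaded page
# (assumed content for illustration).
content = """
<html><body>
  <a href="https://example.com/page1">Page 1</a>
  <a href="https://example.com/page2">Page 2</a>
  <a name="anchor-only">No href here</a>
</body></html>
"""

html = lxml.html.fromstring(content)

# //a/@href returns the value of the href attribute for each <a>
# element, so anchors without an href are excluded automatically.
urls = html.xpath('//a/@href')
print(urls)
```

If you need every link-like attribute (including src on images and scripts, form actions, and so on), lxml.html also offers the iterlinks() method, which yields (element, attribute, link, pos) tuples for the whole document.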
That concludes this article on finding all links on a web page with Python lxml/Beautiful Soup. We hope the recommended answer is helpful.