Problem description
When using Beautiful Soup, what is the difference between "lxml" and "html.parser"? And what about "html5lib"?
When would you use one over the other, and what are the benefits of each? When I tried each of them they seemed interchangeable, but people here have corrected me and said I should be using a different one. I'd like to strengthen my understanding; I've read a couple of posts here about this, but they say little, if anything, about when to use each.
Example:
soup = BeautifulSoup(response.text, 'lxml')
Recommended answer
From the summary table of advantages and disadvantages in the docs:
html.parser - BeautifulSoup(markup, "html.parser")
Advantages: Batteries included, decent speed, lenient (as of Python 2.7.3 and 3.2)
Disadvantages: Not very lenient (before Python 2.7.3 or 3.2.2)

lxml - BeautifulSoup(markup, "lxml")
Advantages: Very fast, lenient
Disadvantages: External C dependency

html5lib - BeautifulSoup(markup, "html5lib")
Advantages: Extremely lenient, parses pages the same way a web browser does, creates valid HTML5
Disadvantages: Very slow, external Python dependency
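
The parsers only look interchangeable on well-formed markup; on broken markup each one repairs the document differently. A minimal sketch of how one might check this locally is below. The helper name parse_with_each and the sample markup are illustrative assumptions, not part of the original answer, and lxml and html5lib are reported as missing rather than assumed to be installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse the same markup with each named parser (hypothetical helper).

    Returns a dict mapping parser name to the serialized tree, or to None
    when Beautiful Soup cannot find that parser in this environment.
    """
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # Raised when the requested parser library is not installed.
            results[name] = None
    return results


if __name__ == "__main__":
    # Invalid markup: a stray closing </p> inside an unclosed <a>.
    for name, output in parse_with_each("<a></p>").items():
        print(f"{name:12} {output if output is not None else '(not installed)'}")
```

For this input, the Beautiful Soup documentation reports that html.parser yields a bare <a></a>, lxml wraps the tag in <html> and <body> and drops the stray </p>, and html5lib builds a full document with a <head> and turns the stray </p> into an empty <p></p>. Exact output may vary between library versions, which is one reason to name the parser explicitly instead of relying on whichever one happens to be installed.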