Problem description
I am processing HTML using Python and the BeautifulSoup 4 library, and I can't find an obvious way to replace &nbsp; with a space. Instead, it seems to be converted to a Unicode non-breaking space character.
Am I missing something obvious? What is the best way to replace &nbsp; with a normal space using BeautifulSoup?
Edit to add that I am using the latest version, BeautifulSoup 4, so the convertEntities=BeautifulSoup.HTML_ENTITIES
option in Beautiful Soup 3 isn't available.
Recommended answer
See "Entities" in the documentation. BeautifulSoup 4 produces proper Unicode for all entities:
An incoming HTML or XML entity is always converted into the corresponding Unicode character.
Yes, &nbsp; is turned into a non-breaking space character (U+00A0), and if you really want those to be ordinary space characters, you'll have to do a Unicode string replace yourself.
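As a rough illustration of that replace, here is a minimal sketch. It assumes the built-in "html.parser" backend and a made-up sample document; the helper name nbsp_to_space is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

NBSP = "\u00a0"  # the character that &nbsp; is decoded to


def nbsp_to_space(text: str) -> str:
    """Return text with every non-breaking space replaced by a plain space.

    Hypothetical helper for illustration; it is not a BeautifulSoup API.
    """
    return text.replace(NBSP, " ")


# Made-up sample input, parsed with the standard-library backend.
html = "<p>Hello&nbsp;world&nbsp;&amp; friends</p>"
soup = BeautifulSoup(html, "html.parser")

# Option 1: clean the extracted text only.
raw = soup.get_text()       # still contains U+00A0 characters
clean = nbsp_to_space(raw)  # plain spaces instead

# Option 2: rewrite the text nodes in the tree itself.
# find_all() returns a list, so replacing nodes while looping is safe.
for node in soup.find_all(string=True):
    node.replace_with(nbsp_to_space(node))

print(clean)
print(soup)
```

Option 1 is enough if you only need the extracted text. Option 2 is for when the modified tree will be serialized again. On older 4.x releases the string= argument to find_all() may need to be spelled text= instead.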