How can I replace or remove HTML entities like "&nbsp;" using BeautifulSoup 4

Question

I am processing HTML using Python and the BeautifulSoup 4 library, and I can't find an obvious way to replace &nbsp; with a space. Instead, it seems to be converted to a Unicode non-breaking space character.

Am I missing something obvious? What is the best way to replace &nbsp; with a normal space using BeautifulSoup?

Edit to add that I am using the latest version, BeautifulSoup 4, so the convertEntities=BeautifulSoup.HTML_ENTITIES option in Beautiful Soup 3 isn't available.

Recommended answer

See the Entities section of the documentation. BeautifulSoup 4 produces proper Unicode for all entities:

"An incoming HTML or XML entity is always converted into the corresponding Unicode character."

Yes, &nbsp; is turned into a non-breaking space character (U+00A0), and if you really want those to be ordinary space characters instead, you'll have to do a Unicode string replace yourself.
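
As a rough illustration of that replace step, here is a minimal sketch. The helper names nbsp_to_space and plain_text are made up for this example and are not part of BeautifulSoup; the sketch also assumes the built-in "html.parser" backend and that you only need the extracted text, not the modified tree.

```python
def nbsp_to_space(text: str) -> str:
    """Replace each non-breaking space (U+00A0) with an ordinary space."""
    return text.replace("\u00a0", " ")


def plain_text(html: str) -> str:
    """Hypothetical helper: extract text from HTML with plain spaces only."""
    # Imported here so nbsp_to_space stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # get_text() returns Unicode, so &nbsp; arrives as "\u00a0".
    return nbsp_to_space(soup.get_text())
```

For example, plain_text("<p>a&nbsp;b</p>") should give the string "a b" with a regular space. If you need to keep working with the parsed tree rather than a plain string, the same replace can be applied to each text node instead.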

10-21 14:16