python UnicodeEncodeError > How can I simply remove troubling unicode characters?

Question

Here is what I tried.

>>> soup = BeautifulSoup (html)
>>> soup
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 96953: ordinal not in range(128)
>>> 
>>> soup.find('div')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 11035: ordinal not in range(128)
>>> 
>>> soup.find('span')
<span id="navLogoPrimary" class="navSprite"><span>amazon.com</span></span>
>>> 

How can I simply remove troubling unicode characters from HTML?
Or is there a cleaner solution?

Recommended answer

Try it this way: soup = BeautifulSoup(html.decode('utf-8', 'ignore'))
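
To make the effect of the 'ignore' error handler concrete, here is a minimal sketch that leaves BeautifulSoup out and uses only built-in string methods. The sample bytes in raw_html are a hypothetical stand-in for the downloaded page, and the encode step at the end is an extra illustration that is not part of the answer above; it assumes the goal is output on an ASCII-only terminal.

```python
# Hypothetical sample: UTF-8 encoded HTML containing the registered sign
# (U+00AE, the u'\xae' from the traceback) plus one stray byte (0xff)
# that is not valid UTF-8.
raw_html = b'<span>amazon.com\xc2\xae</span>\xff'

# 'ignore' silently drops bytes that cannot be decoded as UTF-8.
# Valid non-ASCII characters such as U+00AE are kept.
text = raw_html.decode('utf-8', 'ignore')

# Encoding explicitly avoids relying on the implicit 'ascii' codec
# and lets you choose what happens to non-ASCII characters.
ascii_dropped = text.encode('ascii', 'ignore')             # removes U+00AE
ascii_escaped = text.encode('ascii', 'xmlcharrefreplace')  # keeps it as &#174;
```

Note that decoding with 'ignore' only discards undecodable bytes. If the aim is to remove characters like U+00AE from the output, the explicit encode with 'ignore' is the step that does so, while 'xmlcharrefreplace' preserves them in a form that is still valid in HTML.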

