How do I remove \xa0 from a string in Python?

Problem description

I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but I seem to be left with a lot of \xa0 Unicode characters representing spaces. Is there an efficient way in Python 2.7 to remove all of them and turn them into ordinary spaces? I guess the more general question is: is there a way to remove Unicode formatting?
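
For the more general "remove Unicode formatting" part of the question, one common option is Unicode normalization. The sketch below is written for Python 3 (an assumption; the question targets Python 2.7) and uses the standard-library unicodedata module. NFKC normalization maps compatibility characters such as U+00A0 NO-BREAK SPACE to their plain equivalents, here an ordinary space. The helper name normalize_spaces is hypothetical.

```python
import unicodedata


def normalize_spaces(text: str) -> str:
    """Hypothetical helper: apply NFKC normalization, which maps
    compatibility characters such as U+00A0 (NO-BREAK SPACE) to
    their plain equivalents (U+0020 SPACE in this case)."""
    return unicodedata.normalize("NFKC", text)


sample = "Price:\xa0100\xa0USD"  # text as it might come back from get_text()
print(repr(normalize_spaces(sample)))  # 'Price: 100 USD'
```

Note that NFKC is broader than a single replace: it also rewrites other compatibility characters (for example ligatures and full-width forms), which may or may not be what you want.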

I tried using line = line.replace(u'\xa0',' '), as suggested in another thread, but that changed the \xa0's to u's, so now I have "u"s everywhere instead. ):
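
A minimal sketch of the targeted replace, assuming Python 3, where str is already Unicode (in Python 2.7 the same call should behave the same on a unicode object with a u'\xa0' literal). The key detail is the backslash: "\xa0" is a single escaped character, while "xa0" without it is just the three letters x, a, 0. The helper name replace_nbsp is hypothetical.

```python
def replace_nbsp(text: str) -> str:
    # "\xa0" is a one-character escape for U+00A0 NO-BREAK SPACE;
    # without the backslash, "xa0" would be the literal three letters.
    return text.replace("\xa0", " ")


line = "first\xa0second\xa0third"
print(replace_nbsp(line))        # first second third
print(len("\xa0"), len("xa0"))   # 1 3
```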

The problem seems to be resolved by str.replace(u'\xa0', ' ').encode('utf-8'), but doing just .encode('utf-8') without replace() seems to make it spit out even weirder characters, \xc2 for instance. Can anyone explain this?
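
The \xc2 comes from how UTF-8 encodes U+00A0. Code points from U+0080 to U+07FF take two bytes in UTF-8, and U+00A0 becomes the byte pair 0xC2 0xA0. So encoding without replacing first turns each no-break space into two bytes, and \xc2 is the first of them; replacing first removes the character, leaving plain ASCII. A short Python 3 sketch (Python 3 is an assumption here) that shows this:

```python
nbsp = "\xa0"  # U+00A0 NO-BREAK SPACE
print(hex(ord(nbsp)))        # 0xa0
print(nbsp.encode("utf-8"))  # b'\xc2\xa0': two bytes in UTF-8

text = "a\xa0b"
encoded_only = text.encode("utf-8")                      # b'a\xc2\xa0b'
replaced_then_encoded = text.replace("\xa0", " ").encode("utf-8")  # b'a b'
print(encoded_only)
print(replaced_then_encoded)
```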

Recommended answer