How do I replace Unicode characters in a string with something else in Python?

Question

I have a string that I got from reading an HTML webpage. It contains bullets that show up as a symbol like "•" because of the bulleted list. Note that the text is the HTML source of a webpage, read with Python 2.7's urllib2.read(webaddress).

I know the Unicode code point for the bullet character is U+2022, but how do I actually replace that character with something else?

I tried str.replace("•", "something")

but it does not appear to work... how do I do this?

Recommended answer

Decode the string to Unicode. Assuming it's UTF-8-encoded:

str.decode("utf-8")

Call the replace method and be sure to pass it a Unicode string as its first argument:

str.decode("utf-8").replace(u"\u2022", "*")

Encode back to UTF-8, if needed:

str.decode("utf-8").replace(u"\u2022", "*").encode("utf-8")
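The three steps above can be put together as a minimal sketch. It assumes the page really is UTF-8-encoded; the variable names and the sample bytes are made up for illustration and stand in for whatever was read from the webpage. The byte and Unicode literals are written so that the snippet should behave the same on Python 2.7 and Python 3.

```python
# Hypothetical stand-in for the raw bytes read from the webpage.
# b"\xe2\x80\xa2" is the UTF-8 encoding of U+2022 (the bullet character).
raw = b"\xe2\x80\xa2 first item\n\xe2\x80\xa2 second item"

# Step 1: decode the byte string to Unicode (assumes UTF-8).
text = raw.decode("utf-8")

# Step 2: replace the bullet, passing Unicode strings to replace().
cleaned = text.replace(u"\u2022", u"*")

# Step 3: encode back to UTF-8 only if bytes are needed, e.g. for output.
encoded = cleaned.encode("utf-8")
```

Using a name such as raw or text instead of str also avoids shadowing the built-in type.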

(Fortunately, Python 3 puts a stop to this mess. Step 3, encoding back to UTF-8, should really only be performed just prior to I/O. Also, mind that naming a string str shadows the built-in type str.)
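As a rough illustration of the Python 3 behaviour mentioned above: there, str is always Unicode text and bytes is a separate type, so mixing the two fails loudly instead of silently not matching. The sketch below is Python 3 only; page_bytes is a hypothetical stand-in for downloaded data, and UTF-8 is again an assumption.

```python
# Python 3 only. Hypothetical stand-in for bytes downloaded from a webpage.
page_bytes = "<li>\u2022 item</li>".encode("utf-8")

# Decode once, right after reading; from here on work with text (str).
page_text = page_bytes.decode("utf-8")

# "\u2022" is the bullet character; an ordinary str literal is already Unicode.
replaced = page_text.replace("\u2022", "*")

# Calling replace() on the undecoded bytes with str arguments raises
# TypeError, which makes the bytes/text mix-up visible immediately.
try:
    page_bytes.replace("\u2022", "*")
    mixed_types_allowed = True
except TypeError:
    mixed_types_allowed = False
```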
