Problem description
I have a string that I got by reading an HTML web page. Because the page contains bulleted lists, the text includes bullet symbols such as "•". Note that the text is the HTML source of a web page, read with Python 2.7's urllib2 (urllib2.urlopen(webaddress).read()).
I know that the Unicode code point for the bullet character is U+2022, but how do I actually replace that character with something else?
I tried str.replace("•", "something"), but it does not appear to work. How do I do this?
Recommended answer
1. Decode the string to Unicode. Assuming it's UTF-8-encoded:
    str.decode("utf-8")
2. Call the replace method and be sure to pass it a Unicode string as its first argument:
    str.decode("utf-8").replace(u"\u2022", "*")
3. Encode back to UTF-8, if needed:
    str.decode("utf-8").replace(u"\u2022", "*").encode("utf-8")
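The three steps above can be put together as a small sketch. The helper name replace_bullets and the sample HTML bytes are hypothetical, made up for illustration; the sketch also assumes the page really is UTF-8-encoded, which should be checked against the response headers or the page's meta tag.

```python
# -*- coding: utf-8 -*-

def replace_bullets(raw, replacement=u"*", encoding="utf-8"):
    """Decode raw bytes and replace every U+2022 bullet (hypothetical helper)."""
    text = raw.decode(encoding)                  # step 1: bytes -> Unicode
    return text.replace(u"\u2022", replacement)  # step 2: Unicode-aware replace

# Hypothetical sample standing in for the bytes returned by the web request.
# "\xe2\x80\xa2" is the UTF-8 encoding of U+2022.
html = b"<li>\xe2\x80\xa2 first</li><li>\xe2\x80\xa2 second</li>"

cleaned = replace_bullets(html)

# Step 3: encode again only right before writing the result out.
encoded = cleaned.encode("utf-8")
```

Keeping the text as Unicode inside the program and encoding only at the output boundary avoids mixing byte strings and Unicode strings, which is the likely reason the original replace call had no effect.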
(Fortunately, Python 3 puts a stop to this mess. Step 3 should really only be performed just prior to I/O. Also, mind you that calling a string str shadows the built-in type str.)
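For comparison, here is a minimal sketch of the same task on Python 3, where str is always Unicode and only the raw network bytes need decoding. The sample data is made up, and UTF-8 is again an assumption about the page.

```python
# Python 3 only. The bytes below stand in for what
# urllib.request.urlopen(url).read() would return.
raw = "\u2022 one\n\u2022 two".encode("utf-8")

text = raw.decode("utf-8")            # bytes -> str (Unicode)
result = text.replace("\u2022", "*")  # plain str.replace works on Unicode text
```

Here no u"" prefix is needed, and the variable is named text rather than str so the built-in type stays usable.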