本文介绍了如何用其他python替换字符串中的unicode字符?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个字符串,我从阅读 HTML 网页中得到一个字符串,其中的项目符号由于项目符号列表而带有像•"这样的符号.请注意,该文本是来自使用 Python 2.7 的 urllib2.read(webaddress) 的网页的 HTML 源.

I have a string that I got from reading a HTML webpage with bullets that have a symbol like "•" because of the bulleted list. Note that the text is an HTML source from a webpage using Python 2.7's urllib2.read(webaddress).

我知道项目符号的 unicode 字符为 U+2022,但我如何实际用其他东西替换该 unicode 字符?

I know the unicode character for the bullet character as U+2022, but how do I actually replace that unicode character with something else?

我尝试过str.replace("•", "something")

但它似乎不起作用...我该怎么做?

but it does not appear to work... how do I do this?

推荐答案

  1. 将字符串解码为 Unicode.假设它是 UTF-8 编码的:

  1. Decode the string to Unicode. Assuming it's UTF-8-encoded:

str.decode("utf-8")

  • 调用 replace 方法并确保将 Unicode 字符串作为第一个参数传递给它:

  • Call the replace method and be sure to pass it a Unicode string as its first argument:

    str.decode("utf-8").replace(u"\u2022", "*")
    

  • 如果需要,编码回 UTF-8:

  • Encode back to UTF-8, if needed:

    str.decode("utf-8").replace(u"\u2022", "*").encode("utf-8")
    

  • (幸运的是,Python 3 停止了这种混乱.第 3 步实际上应该只在 I/O 之前执行.另外,请注意,调用字符串 str 会影响内置输入 str.)

    (Fortunately, Python 3 puts a stop to this mess. Step 3 should really only be performed just prior to I/O. Also, mind you that calling a string str shadows the built-in type str.)

    这篇关于如何用其他python替换字符串中的unicode字符?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

    08-04 11:44
    查看更多