Problem description
Is there a comprehensive character-replacement module for Python that finds all non-ASCII or non-Unicode characters in a string and replaces them with ASCII or Unicode equivalents? The casual reliance on the "ignore" argument during encoding or decoding is insane, but so is getting a '?' in every place where a character could not be translated.
I'm looking for one module that finds irksome characters and conforms them to whatever standard is requested. I realize that the number of extant alphabets and encodings makes this somewhat impossible, but surely someone has taken a stab at it? Even a rudimentary solution would be better than the status quo.
The simplification this would bring to data transfer is enormous.
Recommended answer
I don't think what you want is really possible, but I think there is a decent option.
unicodedata has a 'normalize' function that can gracefully degrade text for you...
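import unicodedata
def gracefully_degrade_to_ascii(text):
    # NFKD decomposes characters; 'ignore' then drops anything that is still non-ASCII.
    # Under Python 3, encode() returns bytes, so decode back to str.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')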
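To make the behaviour of that approach concrete, here is a minimal sketch, assuming Python 3. The helper name `to_ascii_demo` is hypothetical and not part of any library. It shows that letters with a decomposition keep their base letter, while characters with no decomposition (such as 'ß') are silently dropped, which is the limitation the question complains about.

```python
import unicodedata

def to_ascii_demo(text):
    # NFKD splits e.g. an accented letter into base letter + combining mark;
    # encoding with errors='ignore' then discards everything outside ASCII.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

# Accented letter: the base letter survives.
print(to_ascii_demo('caf\u00e9'))      # input is "café"

# Compatibility ligature U+FB01: NFKD expands it to "fi".
print(to_ascii_demo('\ufb01nance'))    # input starts with the "fi" ligature

# U+00DF (sharp s) has no decomposition, so it is simply removed.
print(to_ascii_demo('Stra\u00dfe'))    # input is "Straße"
```

If dropped characters are not acceptable, a hand-written replacement table applied before the encode step would be needed; `unicodedata` alone does not provide one.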
Assuming the charset you're using is already mapped into Unicode, or at least can be mapped into Unicode, you should be able to degrade the Unicode version of that text down to ASCII or UTF-8 with this module (it's part of the standard library too).
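The "map into Unicode first" step can be sketched as follows, assuming Python 3 and assuming the source encoding is known in advance. The function name `bytes_to_ascii` and the choice of 'latin-1' as the default are illustrative assumptions, not something the answer specifies.

```python
import unicodedata

def bytes_to_ascii(raw, source_encoding='latin-1'):
    # Step 1: map the raw bytes into Unicode using the known source charset.
    text = raw.decode(source_encoding)
    # Step 2: decompose, then drop whatever cannot be expressed in ASCII.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

# Sample input: "naïve résumé" stored as Latin-1 bytes.
raw = 'na\u00efve r\u00e9sum\u00e9'.encode('latin-1')
print(bytes_to_ascii(raw))

# Going to UTF-8 instead needs no degradation at all, since UTF-8
# can represent every Unicode character.
utf8_bytes = raw.decode('latin-1').encode('utf-8')
```

Guessing the source encoding when it is unknown is a separate problem that this sketch does not attempt to solve.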
Full documentation -