问题描述
我需要用空格替换所有非ASCII(\x00-\x7F)字符。我很惊讶,Python并不是很容易,除非我错过了一些东西。以下函数只是删除所有非ASCII字符:
I need to replace all non-ASCII (\x00-\x7F) characters with a space. I'm surprised that this is not dead-easy in Python, unless I'm missing something. The following function simply removes all non-ASCII characters:
def remove_non_ascii_1(text):
return ''.join(i for i in text if ord(i)<128)
而这个一个字符替换为非ASCII字符与字符代码点中的字节量(即 -
字符替换为3个空格)的空格量:
And this one replaces non-ASCII characters with the amount of spaces as per the amount of bytes in the character code point (i.e. the –
character is replaced with 3 spaces):
def remove_non_ascii_2(text):
return re.sub(r'[^\x00-\x7F]',' ', text)
如何替换所有非ASCII字符有一个空格?
, ,另外解决不是特定字符的所有非ASCII字符。
Of the myriad of similar SO questions, none address character replacement as opposed to stripping, and additionally address all non-ascii characters not a specific character.
推荐答案
您的''。join()
表达式是过滤任何非ASCII;您可以使用条件表达式:
Your ''.join()
expression is filtering, removing anything non-ASCII; you could use a conditional expression instead:
return ''.join([i if ord(i) < 128 else ' ' for i in text])
这个处理字符一个一个,仍然会使用每个字符一个空格替换为
This handles characters one by one and would still use one space per character replaced.
您的正则表达式应该只用空格替换连续的非ASCII字符:
Your regular expression should just replace consecutive non-ASCII characters with a space:
re.sub(r'[^\x00-\x7F]+',' ', text)
请注意 +
。
这篇关于用一个空格替换非ASCII字符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!