用一个空格替换非ASCII字符

用一个空格替换非ASCII字符

本文介绍了用一个空格替换非ASCII字符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要用空格替换所有非ASCII(\x00-\x7F)字符。我很惊讶,Python并不是很容易,除非我错过了一些东西。以下函数只是删除所有非ASCII字符:

I need to replace all non-ASCII (\x00-\x7F) characters with a space. I'm surprised that this is not dead-easy in Python, unless I'm missing something. The following function simply removes all non-ASCII characters:

def remove_non_ascii_1(text):

    return ''.join(i for i in text if ord(i)<128)

而这个一个字符替换为非ASCII字符与字符代码点中的字节量(即 - 字符替换为3个空格)的空格量:

And this one replaces non-ASCII characters with the amount of spaces as per the amount of bytes in the character code point (i.e. the character is replaced with 3 spaces):

def remove_non_ascii_2(text):

    return re.sub(r'[^\x00-\x7F]',' ', text)

如何替换所有非ASCII字符有一个空格?

, ,另外解决不是特定字符的所有非ASCII字符。

Of the myriad of similar SO questions, none address character replacement as opposed to stripping, and additionally address all non-ascii characters not a specific character.

推荐答案

您的''。join()表达式是过滤任何非ASCII;您可以使用条件表达式:

Your ''.join() expression is filtering, removing anything non-ASCII; you could use a conditional expression instead:

return ''.join([i if ord(i) < 128 else ' ' for i in text])

这个处理字符一个一个,仍然会使用每个字符一个空格替换为

This handles characters one by one and would still use one space per character replaced.

您的正则表达式应该只用空格替换连续的非ASCII字符:

Your regular expression should just replace consecutive non-ASCII characters with a space:

re.sub(r'[^\x00-\x7F]+',' ', text)

请注意 +

这篇关于用一个空格替换非ASCII字符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-22 21:56