用一个空格替换非ASCII字符 | 用一个空格替换非ASCII字符

本文介绍了用一个空格替换非ASCII字符的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我需要用空格替换所有非ASCII（\x00-\x7F）字符。我很惊讶，Python并不是很容易，除非我错过了一些东西。以下函数只是删除所有非ASCII字符：

I need to replace all non-ASCII (\x00-\x7F) characters with a space. I'm surprised that this is not dead-easy in Python, unless I'm missing something. The following function simply removes all non-ASCII characters:

def remove_non_ascii_1(text):

    return ''.join(i for i in text if ord(i)<128)

而这个一个字符替换为非ASCII字符与字符代码点中的字节量（即 - 字符替换为3个空格）的空格量：

And this one replaces non-ASCII characters with the amount of spaces as per the amount of bytes in the character code point (i.e. the – character is replaced with 3 spaces):

def remove_non_ascii_2(text):

    return re.sub(r'[^\x00-\x7F]',' ', text)

如何替换所有非ASCII字符有一个空格？

，，另外解决不是特定字符的所有非ASCII字符。

Of the myriad of similar SO questions, none address character replacement as opposed to stripping, and additionally address all non-ascii characters not a specific character.

推荐答案

您的''。join（）表达式是过滤任何非ASCII;您可以使用条件表达式：

Your ''.join() expression is filtering, removing anything non-ASCII; you could use a conditional expression instead:

return ''.join([i if ord(i) < 128 else ' ' for i in text])

这个处理字符一个一个，仍然会使用每个字符一个空格替换为

This handles characters one by one and would still use one space per character replaced.

您的正则表达式应该只用空格替换连续的非ASCII字符：

Your regular expression should just replace consecutive non-ASCII characters with a space:

re.sub(r'[^\x00-\x7F]+',' ', text)

请注意 + 。

这篇关于用一个空格替换非ASCII字符的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！