corpus = """In the US 555-0198 and 1-206-5705-0100 are examples fictitious numbers.
In the UK, 044-113-496-1834 is a fictitious number.
In Ireland, the number 353-020-917-1234 is fictitious.
And in Australia, 061-970-654-321 is a fictitious number.
311 is a joke."""
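I'm new to Python and am studying regular expressions. I'm trying to change every digit of all 7-, 11-, 12- and 13-digit numbers to zero while keeping the result looking like a phone number, for example changing 555-0198 to 000-0000. Is there a way to leave 311 as it is instead of turning it into zeros? Here is what I have come up with so far.
At first I tried this, but it turns every digit into zero: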
import re
for word in corpus.split():
    # Replaces every single digit, so 311 becomes 000 as well
    nums = re.sub(r"(\d)", "0", word)
    print(nums)
Then I tried this, but I realized it does not handle the 11- and 13-digit numbers correctly:
def sub_nums():
    for word in corpus.split():
        # Only rewrites the first two hyphen-separated groups of each word
        nums = re.sub(r"(\d{1,4})-+(\d{1,4})", "000-0000", word)
        print(nums)

sub_nums()
Best answer
The regex I am using is:
r'(?<!\S)(?:(?=(-*\d-*){7}(\s|\Z))[\d-]+|(?=(-*\d-*){11}(\s|\Z))[\d-]+|(?=(-*\d-*){12}(\s|\Z))[\d-]+|(?=(-*\d-*){13}(\s|\Z))[\d-]+)'
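The 7-, 11-, 12- and 13-digit phone numbers share a repeated "theme" or pattern, so I will explain the pattern for the 7-digit number only:
(?<!\S)
This is a negative lookbehind that applies to all of the alternatives and says that the phone number must not be preceded by a non-whitespace character. It is a double negative, almost equivalent to saying that the phone number must be preceded by whitespace, except that it also allows the phone number to sit at the very start of the string. The alternative would be the equivalent positive lookbehind
(?<=\s|\A)
which says that the phone number must be preceded by whitespace or by the start of the string. However, that is a variable-length lookbehind, which the regex engine that ships with Python does not support (the regex package from the PyPI repository does).
(?=(-*\d-*){7}(\s|\Z))
A lookahead for the 7-digit phone number. It requires that the next characters consist of digits and hyphens followed by whitespace or the end of the string, and that there be exactly 7 digits.
[\d-]+
This is what actually matches the following digits and hyphens in the input.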
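The pieces described above can be tried in isolation. The following is a minimal sketch, not part of the original answer: it keeps only the 7-digit alternative (with non-capturing groups so that findall returns whole matches), and separately checks that the standard re module rejects the variable-length lookbehind. The names seven and lookbehind_rejected and the test string are made up for illustration.

```python
import re

# Only the 7-digit alternative: lookbehind, digit-counting lookahead, then the match.
seven = re.compile(r'(?<!\S)(?=(?:-*\d-*){7}(?:\s|\Z))[\d-]+')

# "311" has too few digits, "x555-0198" is preceded by a non-whitespace
# character, and "1-206-5705-0100" has 12 digits, so only the first token matches.
matches = seven.findall("555-0198 311 x555-0198 1-206-5705-0100")
print(matches)  # ['555-0198']

# The positive-lookbehind variant has two branches of different widths
# (\s is one character wide, \A is zero), so re refuses to compile it.
lookbehind_rejected = False
try:
    re.compile(r'(?<=\s|\A)\d+')
except re.error as exc:
    lookbehind_rejected = True
    print("re rejects it:", exc)
```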
import re
corpus = """In the US 555-0198 and 1-206-5705-0100 are examples fictitious numbers.
In the UK, 044-113-496-1834 is a fictitious number.
In Ireland, the number 353-020-917-1234 is fictitious.
And in Australia, 061-970-654-321 is a fictitious number.
311 is a joke."""
regex = r'(?<!\S)(?:(?=(-*\d-*){7}(\s|\Z))[\d-]+|(?=(-*\d-*){11}(\s|\Z))[\d-]+|(?=(-*\d-*){12}(\s|\Z))[\d-]+|(?=(-*\d-*){13}(\s|\Z))[\d-]+)'
new_corpus = re.sub(regex, lambda m: re.sub(r'\d', '0', m[0]), corpus)
print(new_corpus)
Prints:
In the US 000-0000 and 0-000-0000-0000 are examples fictitious numbers.
In the UK, 000-000-000-0000 is a fictitious number.
In Ireland, the number 000-000-000-0000 is fictitious.
And in Australia, 000-000-000-000 is a fictitious number.
311 is a joke.
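Because the four alternatives differ only in the digit count, the pattern can also be assembled from a list of lengths instead of being written out by hand. This is a sketch under that assumption, not the original answer's code; build_phone_regex and mask_digits are hypothetical helper names, and non-capturing groups are used in place of the capturing ones.

```python
import re

def build_phone_regex(lengths=(7, 11, 12, 13)):
    """Compile one lookahead-guarded alternative per allowed digit count."""
    alternatives = "|".join(
        rf"(?=(?:-*\d-*){{{n}}}(?:\s|\Z))[\d-]+" for n in lengths
    )
    return re.compile(rf"(?<!\S)(?:{alternatives})")

def mask_digits(text, lengths=(7, 11, 12, 13)):
    """Zero out the digits of every number whose digit count is in lengths."""
    pattern = build_phone_regex(lengths)
    # The inner re.sub keeps the hyphens and replaces only the digits.
    return pattern.sub(lambda m: re.sub(r"\d", "0", m[0]), text)

sample = "Call 555-0198 or 311, not 044-113-496-1834 please"
print(mask_digits(sample))
# Call 000-0000 or 311, not 000-000-000-0000 please
```

As with the hand-written regex, a number is only matched when it is followed by whitespace or the end of the string, so a number immediately followed by punctuation is left unchanged.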
Regarding python - replacing numbers of different lengths in Python (re.sub), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59298371/