python - Python regex to split mentions of two years that run together

I have the following situation: in my strings, I want to reformat wrongly formatted mentions such as "(19561958)" into "(1956-1958)". The regex I tried is:

import re
a = "(19561958)"
re.sub(r"(\d\d\d\d\d\d\d\d)", r"\1-", a)

But this returns "(19561958-)". How can I get the result I want? Thanks a lot!

Best answer

You can capture the two years separately and insert a hyphen between the two groups:

>>> import re
>>> re.sub(r'(\d{4})(\d{4})', r'\1-\2', '(19561958)')
'(1956-1958)'

Note that \d\d\d\d can be written more concisely as \d{4}.

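As currently written, this inserts a hyphen between the first two groups of four digits in any run of eight or more digits. If the match should require the surrounding parentheses, you can include them explicitly in lookarounds: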

>>> re.sub(r'''
    (?<=\() # make sure there's an opening parenthesis prior to the groups
    (\d{4}) # one group of four digits
    (\d{4}) # and a second group of four digits
    (?=\))  # with a closing parenthesis after the two groups
''', r'\1-\2', '(19561958)', flags=re.VERBOSE)
'(1956-1958)'
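
To make the difference concrete, here is a small side-by-side sketch. The function names hyphenate_any and hyphenate_parenthesized and the sample strings are made up for illustration; they are not part of the original answer.

```python
import re

# Unanchored pattern: matches the first eight digits of any longer digit run.
ANY_EIGHT = re.compile(r'(\d{4})(\d{4})')

# Lookaround pattern: the eight digits must sit directly between "(" and ")".
# The parentheses are checked but not consumed, so they stay in the output.
PARENTHESIZED = re.compile(r'(?<=\()(\d{4})(\d{4})(?=\))')

def hyphenate_any(text):
    """Hypothetical helper: hyphenate every eight-digit match, anchored or not."""
    return ANY_EIGHT.sub(r'\1-\2', text)

def hyphenate_parenthesized(text):
    """Hypothetical helper: hyphenate only eight digits enclosed in parentheses."""
    return PARENTHESIZED.sub(r'\1-\2', text)

for sample in ['(19561958)', 'id 1234567890', '19561958']:
    print(repr(hyphenate_any(sample)), repr(hyphenate_parenthesized(sample)))
```

The unanchored version also rewrites the ten-digit number and the bare eight digits, while the lookaround version leaves both of those untouched.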

Alternatively, you can use word boundaries, which would also allow for, e.g., spaces around the eight digits:

>>> re.sub(r'\b(\d{4})(\d{4})\b', r'\1-\2', '(19561958)')
'(1956-1958)'
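
A quick sketch of what the word-boundary version accepts and rejects. The helper name hyphenate_bounded and the inputs are assumptions chosen for illustration. Keep in mind that \b treats letters, digits, and the underscore as word characters, so eight digits glued to any of those will not match.

```python
import re

# \b requires a transition between a word character and a non-word character
# (or the start/end of the string) on both sides of the eight digits.
BOUNDED = re.compile(r'\b(\d{4})(\d{4})\b')

def hyphenate_bounded(text):
    """Hypothetical helper: hyphenate eight-digit runs that stand alone as a word."""
    return BOUNDED.sub(r'\1-\2', text)

print(hyphenate_bounded('(19561958)'))        # parentheses are non-word characters
print(hyphenate_bounded('born 19561958 in'))  # spaces are non-word characters
print(hyphenate_bounded('1234567890'))        # ten digits: no boundary after eight
print(hyphenate_bounded('x19561958'))         # letter before the digits: no boundary
```

The first two inputs get a hyphen; the last two come back unchanged.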

For more on this topic, see the similar question on Stack Overflow: https://stackoverflow.com/questions/28396120/