python - Python: sre_constants.error: unmatched group

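I know there are already several questions about this problem, but none of them helped me solve mine.
I have to replace the names in a csv document when they follow the tag {SPEAKER} or {GROUP OF SPEAKERS}. However, I get the following error message: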

File "/usr/lib/python2.7/re.py", line 291, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib/python2.7/sre_parse.py", line 831, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

My script is:

list_speakers = re.compile(r'^\{GROUP OF SPEAKERS\}\t(.*)|^\{SPEAKER\}\t(.*)')

usernames = set()
for f in corpus:
    with open(f, "r", encoding=encoding) as fin:
        line = fin.readline()
        while line:
            line = line.rstrip()
            if not line:
                line = fin.readline()
                continue

            if not list_speakers.match(line):
                line = fin.readline()
                continue

            names = list_speakers.sub(r'\1', line)
            names = names.split(", ")
            for name in names:
                usernames.add(name)

            line = fin.readline()

Best answer

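The issue is a known one: in Python versions before 3.5, a backreference to a group that did not take part in the match is not set to an empty string, and re.sub raises "unmatched group" instead. Your pattern has two alternatives, each with its own capturing group, so whenever the {SPEAKER} branch matches, Group 1 is unset and the r'\1' replacement fails.
You need to make sure there is only one capturing group, or use a lambda expression as the replacement argument to implement custom replacement logic.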
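The lambda option can be sketched as follows. This is only an illustration, not the asker's final code: it keeps the original two-group pattern, and the helper name extract_names and the sample lines are made up for the example.

```python
import re

# Original pattern from the question: two alternatives, each with its own group.
list_speakers = re.compile(r'^\{GROUP OF SPEAKERS\}\t(.*)|^\{SPEAKER\}\t(.*)')

def extract_names(line):
    # m.group(n) is None for a group that did not take part in the match,
    # so fall back to the other group (or to an empty string).
    replaced = list_speakers.sub(lambda m: m.group(1) or m.group(2) or '', line)
    return replaced.split(", ")

# Hypothetical sample lines, one per tag.
print(extract_names("{SPEAKER}\tAlice, Bob"))
print(extract_names("{GROUP OF SPEAKERS}\tCarol, Dave"))
```

Because the callable receives the match object, it never expands a template containing \1, so the unset group cannot trigger the error.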
Here, you can easily rewrite the regex as a pattern with a single capturing group:

r'^\{(?:GROUP OF SPEAKERS|SPEAKER)\}\t(.*)'

See the regex demo.
Details
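^ - start of the string
\{ - a {
(?:GROUP OF SPEAKERS|SPEAKER) - a non-capturing group that matches GROUP OF SPEAKERS or SPEAKER
\} - a } (you can also write a plain }, it does not need escaping)
\t - a tab character
(.*) - Group 1: any 0+ characters other than line break characters, as many as possible (the rest of the line).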
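For illustration, here is a minimal sketch of the collection loop using the single-group pattern. It assumes the lines have already been read from the files, and it reads Group 1 straight from the match object instead of calling sub, which is my simplification rather than part of the original script. The function name collect_usernames and the sample data are hypothetical.

```python
import re

# Single capturing group: Group 1 is always set whenever the pattern matches.
list_speakers = re.compile(r'^\{(?:GROUP OF SPEAKERS|SPEAKER)\}\t(.*)')

def collect_usernames(lines):
    usernames = set()
    for line in lines:
        line = line.rstrip()
        m = list_speakers.match(line)
        if not m:
            # Skip blank lines and lines without a speaker tag.
            continue
        usernames.update(m.group(1).split(", "))
    return usernames

# Hypothetical sample input standing in for the lines of one corpus file.
sample = ["{SPEAKER}\tAlice\n", "\n", "some text\n", "{GROUP OF SPEAKERS}\tBob, Carol\n"]
print(sorted(collect_usernames(sample)))
```

With only one group in the pattern, list_speakers.sub(r'\1', line) would also work on Python 2.7, since the backreference always refers to a group that matched.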