"nothing to repeat" from Python regex


Question

Here is a regex, tried first with egrep and then with Python 2.7:

$ echo '/some/path/to/file/abcde.csv' |egrep '*([a-zA-Z]+).csv'

/some/path/to/file/abcde.csv

However, the same regex in Python:

re.match(r'*([a-zA-Z]+)\.csv',f )

gives:

Traceback (most recent call last):
  File "/shared/OpenChai/bin/plothost.py", line 26, in <module>
    hosts = [re.match(r'*([a-zA-Z]+)\.csv',f ).group(1) for f in infiles]
  File "/usr/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

A search suggests that a Python bug may be in play here:

regex error - nothing to repeat

It seems to be a python bug (that works perfectly in vim). The source of the problem is the (\s*...)+ bit.

However, it is not clear to me: what is the workaround for the regex shown above that would make Python happy?

Thanks.

Recommended answer

You do not need the * in the pattern; it is what causes the issue.

Use

([a-zA-Z]+)\.csv

Or, to match the whole string:

.*([a-zA-Z]+)\.csv

See the demo.
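As a rough sketch of how the two suggested patterns behave in Python (written for Python 3, although the same calls exist in 2.7), the snippet below runs them against the path from the question. Two details are worth checking before relying on either pattern: re.match only matches at the start of the string, so the pattern without a leading .* needs re.search, and a greedy .* leaves only the last letter for the capture group. The variable names and the extra "/" variant are illustrative additions, not part of the original answer.

```python
import re

path = '/some/path/to/file/abcde.csv'

# re.search scans the whole string, so no leading wildcard is needed.
name_search = re.search(r'([a-zA-Z]+)\.csv', path).group(1)

# re.match anchors at the start of the string; the path starts with '/',
# so the same pattern finds nothing here.
bare_match = re.match(r'([a-zA-Z]+)\.csv', path)

# With re.match and a greedy '.*', the wildcard swallows as much as it can
# and leaves only a single letter for the capture group.
greedy = re.match(r'.*([a-zA-Z]+)\.csv', path).group(1)

# Illustrative variant (an assumption, not from the answer): requiring a '/'
# right before the group makes it capture the whole base name.
anchored = re.match(r'.*/([a-zA-Z]+)\.csv', path).group(1)

print(name_search, bare_match, greedy, anchored)
```

If the goal is the list comprehension from the traceback, re.search with the first pattern is probably the simplest drop-in replacement, with a check for None on file names that do not match.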

The reason is that the * is unescaped and is therefore treated as a quantifier. A quantifier applies to the preceding subpattern in the regex. Here it appears at the very beginning of the pattern, so there is nothing for it to quantify, and Python raises the "nothing to repeat" error.

If it "works" in Vim, that is only because the Vim regex engine tolerates the stray quantifier instead of rejecting it (much as Java does with unescaped [ and ] inside a character class like [([)]]).
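The explanation above can be checked with a small sketch. The helper name compiles is hypothetical and exists only for this illustration; it reports whether Python's re module accepts a pattern. The code targets Python 3, where the error is raised as re.error, the same exception class that the Python 2.7 traceback shows as sre_constants.error.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern, else False."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for invalid patterns, e.g. "nothing to repeat".
        return False

# A quantifier at the very start has no preceding subpattern to repeat.
leading_star_ok = compiles(r'*([a-zA-Z]+)\.csv')

# Escaping the asterisk turns it into a literal '*' character.
escaped_star_ok = compiles(r'\*([a-zA-Z]+)\.csv')

# Giving the quantifier something to repeat ('.') is also valid.
dot_star_ok = compiles(r'.*([a-zA-Z]+)\.csv')

print(leading_star_ok, escaped_star_ok, dot_star_ok)
```

Note that the escaped form only matches strings that contain a literal asterisk before the name, so it is a valid pattern but not a fix for the file paths in the question.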
