Question
Here is a regex, tried first with egrep and then with Python 2.7:
$ echo '/some/path/to/file/abcde.csv' |egrep '*([a-zA-Z]+).csv'
/some/path/to/file/abcde.csv
However, the same regex in Python:
re.match(r'*([a-zA-Z]+)\.csv',f )
gives:
Traceback (most recent call last):
  File "/shared/OpenChai/bin/plothost.py", line 26, in <module>
    hosts = [re.match(r'*([a-zA-Z]+)\.csv',f ).group(1) for f in infiles]
  File "/usr/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat
A search suggests that a Python bug may be in play here:
"It seems to be a Python bug (that works perfectly in vim). The source of the problem is the (\s*...)+ bit."
However, it is not clear to me: what, then, is the workaround for the regex shown above that would make Python happy?
Thanks.
Recommended answer
You do not need the * in the pattern; it is what causes the issue.
Use (with re.search rather than re.match, because re.match only matches at the start of the string)
([a-zA-Z]+)\.csv
Or, to match the whole string:
.*([a-zA-Z]+)\.csv
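As a rough check of how the two suggested patterns behave on the path from the question, here is a minimal sketch. The variable names are illustrative only, and the lazy `.*?` variant is an addition that does not appear in the original answer. Note that the greedy `.*` prefix lets re.match succeed but leaves only the last letter for the capture group.

```python
import re

path = '/some/path/to/file/abcde.csv'

# re.match only matches at the start of the string, so the bare
# pattern finds nothing when the string begins with a directory part.
no_match = re.match(r'([a-zA-Z]+)\.csv', path)

# re.search scans the whole string and captures the full base name.
base = re.search(r'([a-zA-Z]+)\.csv', path).group(1)

# With a greedy .* prefix re.match succeeds, but .* consumes as much
# as it can, so the group keeps only the final letter.
greedy = re.match(r'.*([a-zA-Z]+)\.csv', path).group(1)

# A lazy .*? prefix (an assumption, not part of the answer) gives the
# letters back to the capture group.
lazy = re.match(r'.*?([a-zA-Z]+)\.csv', path).group(1)

print(no_match, base, greedy, lazy)
```

If the goal is the host name in group(1), as in the list comprehension from the traceback, the re.search form or the lazy prefix is probably the safer choice.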
The reason is that * is unescaped and is therefore treated as a quantifier, which applies to the preceding subpattern in the regex. Here it appears at the very beginning of the pattern, so there is nothing for it to quantify, and the "nothing to repeat" error is raised.
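The behaviour described above can be reproduced with a small sketch. The helper name `compiles` is hypothetical and exists only for this illustration. It tries to compile a pattern and reports whether Python accepted it, along with the error text if it did not.

```python
import re

def compiles(pattern):
    """Return (True, None) if the pattern compiles, else (False, message)."""
    try:
        re.compile(pattern)
        return True, None
    except re.error as exc:
        return False, str(exc)

# A leading, unescaped * has no preceding subpattern to quantify.
ok_leading, msg = compiles(r'*([a-zA-Z]+)\.csv')

# Escaping the * turns it into a literal asterisk, which compiles.
ok_escaped, _ = compiles(r'\*([a-zA-Z]+)\.csv')

# Dropping the * altogether, as the answer suggests, also compiles.
ok_plain, _ = compiles(r'([a-zA-Z]+)\.csv')

print(ok_leading, msg, ok_escaped, ok_plain)
```

The exact wording of the message differs between Python versions (newer releases append the position), but it contains "nothing to repeat" in each case.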
If it "works" in Vim, that is only because the Vim regex engine ignores this subpattern (much as Java does with an unescaped [ and ] inside a character class such as [([)]]).