Problem description
In http://llvm.org/svn/llvm-project/libcxx/trunk/test/re/re.alg/re.alg.match/ecma.pass.cpp, the following test exists:
std::cmatch m;
const char s[] = "tournament";
assert(!std::regex_match(s, m, std::regex("tour|to|tournament")));
assert(m.size() == 0);
Why does this match fail?
On VC++ 2012 and Boost, the match succeeds.
In JavaScript on Chrome and Firefox, "tournament".match(/^(?:tour|to|tournament)$/) succeeds.
Only on libc++, the match fails.
Recommended answer
I believe the test is correct. It is instructive to search for "tournament" in all of the libc++ tests under re.alg, and compare how the different engines treat the regex("tour|to|tournament"), and how regex_search differs from regex_match.
Let's start with regex_search:
awk, egrep, extended:
regex_search("tournament", m, regex("tour|to|tournament"))
matches the entire input string: "tournament".
ECMAScript:
regex_search("tournament", m, regex("tour|to|tournament"))
matches only part of the input string: "tour".
grep, basic:
regex_search("tournament", m, regex("tour|to|tournament"))
does not match at all, because '|' is not an alternation operator in these grammars.
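The three cases above can be reproduced with a small sketch. The helper name searched_text is hypothetical (it is not part of libc++ or its test suite); it only wraps std::regex_search and hands back whatever text was matched, so the grammars can be compared side by side. The results noted in the comments are what the answer describes, and other standard library implementations might behave differently.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper for illustration only: returns the text matched by
// std::regex_search under the given grammar, or an empty string when there
// is no match. (An empty match and "no match" are not distinguished here;
// that is fine for the patterns discussed in this thread.)
std::string searched_text(const char* text, const char* pattern,
                          std::regex_constants::syntax_option_type grammar) {
    assert(text != nullptr && pattern != nullptr);
    const std::regex re(pattern, grammar);
    std::cmatch m;
    if (!std::regex_search(text, m, re))
        return std::string();
    return m.str();  // the whole match, i.e. m[0]
}

// Per the answer:
//   extended / awk / egrep -> "tournament" (longest alternative wins)
//   ECMAScript             -> "tour"       (first alternative that matches wins)
//   basic / grep           -> no match     ('|' is an ordinary character)
```

For example, searched_text("tournament", "tour|to|tournament", std::regex_constants::ECMAScript) would be expected to yield "tour", while the same call with std::regex_constants::extended would be expected to yield "tournament".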
awk, egrep and extended will match as much as they can with alternation. However, the ECMAScript alternation is "ordered". This is specified in ECMA-262. Once ECMAScript matches a branch in the alternation, it quits searching. The standard includes this example:
/a|ab/.exec("abc")
which returns the result "a" and not "ab".
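The same ordered behaviour can be observed from C++. The sketch below is only an illustration: leftmost_match is a hypothetical helper name, and the point is that under the ECMAScript grammar swapping the order of the branches changes the result, whereas under the extended grammar the longest branch is expected to win regardless of order.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper for illustration: the text that std::regex_search
// matches in `text` for `pattern` under `grammar`, or "" if nothing matches.
std::string leftmost_match(const std::string& text, const std::string& pattern,
                           std::regex_constants::syntax_option_type grammar) {
    assert(!pattern.empty());
    const std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}

// ECMAScript, ordered alternation:
//   "a|ab" on "abc" -> "a"   (the first branch succeeds, so "ab" is never tried)
//   "ab|a" on "abc" -> "ab"  (the branch order decides the result)
// extended, longest alternative:
//   "a|ab" on "abc" -> "ab"
```

Writing the longer alternative first is therefore one way to get the longer match out of an ECMAScript regex.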
<plug>
This is also discussed in depth in Mastering Regular Expressions by Jeffrey E.F. Friedl. I couldn't have implemented <regex> without this book. And I will freely admit that there is still much more that I don't know about regular expressions than what I know.
At the end of the chapter on alternation the author states:
Believe it!
</plug>
Anyway, ECMAScript matches only "tour". The regex_match algorithm returns success only if the entire input string is matched. Since only the first 4 characters of the input string are matched, unlike awk, egrep and extended, ECMAScript returns false with a zero-sized cmatch.
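The reasoning above can be sketched as a check on the regex_search result: if the match that the engine settles on does not span the whole input, a regex_match built on that rule reports failure. The helper name search_covers_whole_input is hypothetical, and the sketch models the rule as this answer describes it; it is not a copy of the libc++ implementation. As the question notes, VC++ 2012 and Boost accept the same input, so the result of regex_match itself with the ECMAScript grammar should be treated as implementation-dependent.

```cpp
#include <cassert>
#include <regex>

// Hypothetical helper for illustration: true when the match chosen by
// std::regex_search starts at the beginning of `text` and ends at its end,
// i.e. both the unmatched prefix and the unmatched suffix are empty.
bool search_covers_whole_input(const char* text, const char* pattern,
                               std::regex_constants::syntax_option_type grammar) {
    assert(text != nullptr && pattern != nullptr);
    const std::regex re(pattern, grammar);
    std::cmatch m;
    return std::regex_search(text, m, re)
        && m.prefix().length() == 0
        && m.suffix().length() == 0;
}

// Per the answer:
//   ECMAScript: the search stops at "tour", leaving "nament" as the suffix,
//               so the whole input is not covered.
//   extended:   the search yields "tournament", which covers the input.
```

Reordering the pattern as "tournament|tour|to" puts the longest alternative first, so an ordered ECMAScript engine tries it first and covers the whole input.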