Question
I'm trying to match nested {} brackets with a regular expression in Perl so that I can extract certain pieces of text from a file. This is what I have currently:
my @matches = $str =~ /\{(?:\{.*\}|[^\{])*\}|\w+/sg;
foreach (@matches) {
print "$_\n";
}
Sometimes this works as expected. For instance, if $str = "abc {{xyz} abc} {xyz}", I obtain:
abc
{{xyz} abc}
{xyz}
as expected. But for other input strings it does not behave as expected. For example, if $str = "{abc} {{xyz}} abc", the output is:
{abc} {{xyz}}
abc
which is not what I expected. I wanted {abc} and {{xyz}} to be on separate lines, since each is balanced on its own in terms of brackets. Is there an issue with my regular expression? If so, how would I go about fixing it?
Recommended answer
You were surprised by how your pattern matched, but no one explained it? Here's how your pattern is matching:
my @matches = $str =~ /\{(?:\{.*\}|[^{])*\}|\w+/sg;
                       ^    ^ ^ ^  ^      ^
                       |    | | |  |      |
{ ---------------------+    | | |  |      |
a --------------------------)-)-)--+      |
b --------------------------)-)-)--+      |
c --------------------------)-)-)--+      |
} --------------------------)-)-)--+      |
  --------------------------)-)-)--+      |
{ --------------------------+ | |         |
{ ----------------------------+ |         |
x ----------------------------+ |         |
y ----------------------------+ |         |
z ----------------------------+ |         |
} ------------------------------+         |
} ----------------------------------------+
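The greedy behaviour traced above can be reproduced in isolation. The following is a minimal sketch using the question's second input string; the variable names $greedy and $flat are illustrative only and are not part of the original pattern.

```perl
use strict;
use warnings;

my $str = "{abc} {{xyz}} abc";

# Greedy: .* first runs to the end of the string, then backs up
# only as far as the LAST "}" it can find.
my ($greedy) = $str =~ /(\{.*\})/s;

# Excluding braces from the body stops at the first closing brace,
# but on its own this cannot handle nested groups.
my ($flat) = $str =~ /(\{[^{}]*\})/;

print "greedy: $greedy\n";   # greedy: {abc} {{xyz}}
print "flat:   $flat\n";     # flat:   {abc}
```

Neither form is enough by itself: the first swallows several groups at once, and the second never sees past one level of braces.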
As you can see, the problem is that /\{.*\}/ matches too much. What should be in there is something that matches
(?: \s* (?: \{ ... \} | \w+ ) )*
where the ... is itself
(?: \s* (?: \{ ... \} | \w+ ) )*
So you need some recursion. Named groups are an easy way of doing this.
say $1
while /
\G \s*+ ( (?&WORD) | (?&BRACKETED) )
(?(DEFINE)
(?<WORD> \s* \w+ )
(?<BRACKETED> \s* \{ (?&TEXT)? \s* \} )
(?<TEXT> (?: (?&WORD) | (?&BRACKETED) )+ )
)
/xg;
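The snippet above matches against $_ and relies on say, which needs Perl 5.10 or later with the feature enabled (the recursive (?&NAME) and (?(DEFINE)...) constructs also need 5.10+). As a rough sketch of how it might be wrapped for reuse, the same pattern can be placed in a subroutine; the name top_level_items is a hypothetical helper, not something from the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each top-level word or balanced {...} group.
sub top_level_items {
    my ($str) = @_;
    my @items;
    # \G keeps each match anchored where the previous one ended,
    # so unbalanced input stops the loop instead of being skipped over.
    while ($str =~ /
        \G \s*+ ( (?&WORD) | (?&BRACKETED) )
        (?(DEFINE)
            (?<WORD>      \s* \w+ )
            (?<BRACKETED> \s* \{ (?&TEXT)? \s* \} )
            (?<TEXT>      (?: (?&WORD) | (?&BRACKETED) )+ )
        )
    /xg) {
        push @items, $1;
    }
    return @items;
}

print "$_\n" for top_level_items("{abc} {{xyz}} abc");
```

With the question's second input this prints {abc}, {{xyz}} and abc on separate lines, which is the output the question asked for.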
But instead of reinventing the wheel, why not use Text::Balanced?
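The answer does not show how Text::Balanced would be used, so the following is only one possible sketch. It assumes extract_bracketed is called in list context (where it returns the extracted text and the remainder, and leaves the input unmodified) and falls back to a plain \w+ match for bare words; split_items is a hypothetical helper name.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: split a string into bare words and balanced {...} groups.
sub split_items {
    my ($text) = @_;
    my @items;
    while ($text =~ /\S/) {
        # In list context: (extracted, remainder, skipped prefix).
        # Leading whitespace is skipped by the default prefix pattern.
        my ($group, $rest) = extract_bracketed($text, '{}');
        if (defined $group && length $group) {
            push @items, $group;
            $text = $rest;
        }
        elsif ($text =~ s/^\s*(\w+)//) {
            push @items, $1;      # a bare word outside any braces
        }
        else {
            last;                 # unbalanced or unexpected input: stop
        }
    }
    return @items;
}

print "$_\n" for split_items("{abc} {{xyz}} abc");
```

Stopping at the first unparseable position is a design choice made for this sketch; a real script might prefer to report the offending text instead.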