Vim / sed regex backreference in search pattern


Question

The Vim help says:

\1      Matches the same string that was matched by     */\1* *E65*
        the first sub-expression in \( and \). {not in Vi}
        Example: "\([a-z]\).\1" matches "ata", "ehe", "tot", etc.


It looks like a backreference can be used in a search pattern. I started playing with it and noticed behavior that I can't explain. This is my file:

<paper-input label="Input label"> Some text </paper-input>
<paper-input label="Input label"> Some text </paper-inputa>
<aza> Some text </az>
<az> Some text </az>
<az> Some text </aza>

I want to match the lines where the opening and closing tags agree, i.e.:

<paper-input label="Input label"> Some text </paper-input>
<az> Some text </az>

My test regex is:

%s,<\([^ >]\+\).*<\/\1>,,gn

But this matches lines 1, 3 and 4. Same thing with sed:

$ sed -ne 's,<\([^ >]\+\).*<\/\1>,\0,p' file
<paper-input label="Input label"> Some text </paper-input>
<aza> Some text </az>
<az> Some text </az>

The group <\([^ >]\+\) should be greedy, and when I match without \1 at the end, all the groups are captured correctly. But when I add \1, it seems that <\([^ >]\+\) stops being greedy and the engine forces a match on the 3rd line. Can someone explain why it matches the 3rd line:

<aza> Some text </az>

NOTE: This is not about the regex itself (there is probably another way to do it) but about the behavior of that regex.

Recommended answer

To understand why your regex behaves the way it does, you need to understand what a backtracking regex engine does.

The engine matches greedily and consumes as many characters as it can. But if the rest of the pattern then fails to match, it goes back, gives up some of those characters, and tries to find a different match that still satisfies the pattern.
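As a minimal illustration of that greedy-then-give-back behavior, here is a sketch using Python's `re` module, which is also a backtracking engine. The pattern and input string are made up for this example and are not part of the question.

```python
import re

# With nothing after it, the greedy group takes every "a" it can.
greedy = re.match(r"(a+)", "aaab").group(1)

# Here the group first takes "aaa" as well, but then "ab" cannot match,
# so the engine backs off one character and retries with "aa".
backtracked = re.match(r"(a+)ab", "aaab").group(1)

print(greedy)       # aaa
print(backtracked)  # aa
```

The quantifier is still greedy in the second case; it simply ends up with the longest capture that lets the whole pattern succeed.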

%s,<\([^ >]\+\).*<\/\1>,,gn

For the third line, <aza> Some text </az>:

The regex engine first captures aza in the group, so \1 = aza, and checks whether .*<\/aza> matches the rest of the string. It doesn't, so the engine backtracks and chooses a shorter capture. The next attempt is \1 = az, and it checks whether .*<\/az> matches the rest of the string, which it does (the .* absorbs the leftover "a> Some text "). So the line matches.
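The walk-through above can be sketched in Python. The helper `first_working_capture` is hypothetical and written only for this illustration: it mimics the simplified description (try the longest capture first, then shorter ones) and only considers a match starting at the first `<`. The pattern is the question's regex translated to Python syntax, where groups are written `(...)` and `+` needs no backslash.

```python
import re

LINE = "<aza> Some text </az>"


def first_working_capture(line):
    """Try captures for the group from longest to shortest."""
    # Longest run of non-space, non-">" characters right after "<".
    longest = re.match(r"<([^ >]+)", line).group(1)
    tried = []
    for end in range(len(longest), 0, -1):
        candidate = longest[:end]
        tried.append(candidate)
        rest = line[1 + end:]  # text after "<" + candidate
        # Does the remainder satisfy .*</candidate> ?
        if re.match(".*</" + re.escape(candidate) + ">", rest):
            return candidate, tried
    return None, tried


capture, tried = first_working_capture(LINE)
print(tried)    # ['aza', 'az']
print(capture)  # az

# The real engine ends up with the same capture.
print(re.search(r"<([^ >]+).*</\1>", LINE).group(1))  # az
```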

(This is a simplified version. I skipped over the fact that .* can potentially do a lot of backtracking itself.)

Solving it is as easy as adding an anchor to the regex that stops the engine from searching for other values that could satisfy \1. In this case, requiring a space or > immediately after the group is sufficient.
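A sketch of that fix, again using Python's `re` module as a stand-in for Vim or sed. It assumes the anchor is a character class `[ >]` placed right after the group; in Vim/sed syntax this would presumably read <\([^ >]\+\)[ >].*<\/\1>. The names `LOOSE`, `ANCHORED` and `matching_line_numbers` are made up for the example.

```python
import re

LINES = [
    '<paper-input label="Input label"> Some text </paper-input>',
    '<paper-input label="Input label"> Some text </paper-inputa>',
    "<aza> Some text </az>",
    "<az> Some text </az>",
    "<az> Some text </aza>",
]

# Original pattern: the group is free to shrink while backtracking.
LOOSE = re.compile(r"<([^ >]+).*</\1>")

# Anchored pattern: the group must be followed by a space or ">",
# so it can only ever capture the complete tag name.
ANCHORED = re.compile(r"<([^ >]+)[ >].*</\1>")


def matching_line_numbers(pattern, lines):
    """Return the 1-based numbers of the lines the pattern matches."""
    return [n for n, text in enumerate(lines, start=1) if pattern.search(text)]


print(matching_line_numbers(LOOSE, LINES))     # [1, 3, 4]
print(matching_line_numbers(ANCHORED, LINES))  # [1, 4]
```

With the anchor in place, the shorter capture az on the third line is rejected because it is followed by "a" rather than a space or ">".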
