This article covers how ANTLR v4 handles non-greedy rules.

Problem description

I'm reading The Definitive ANTLR 4 Reference and have a question about one of its examples (p. 76):

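```antlr
STRING : '"' (ESC|.)*? '"' ;   // non-greedy loop, then the closing quote
fragment
ESC : '\\"' | '\\\\' ;         // escaped quote \" or escaped backslash \\
```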

The rule matches a typical C++ string literal: a character sequence enclosed in double quotes, which may also contain escaped quotes (\") and escaped backslashes (\\).
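
As a rough illustration of what the STRING rule accepts, the sketch below translates it into a Python regular expression. The translation is an assumption made for illustration only: Python's re module is a backtracking regex engine, not ANTLR, but like the rule it tries the alternatives \", \\ and . in order inside a lazy *? loop, so for this particular rule it extracts the same strings. STRING_RE and find_strings are hypothetical names.

```python
import re

# Hypothetical Python translation of:
#   STRING : '"' (ESC|.)*? '"' ;
#   fragment ESC : '\\"' | '\\\\' ;
# The alternatives are tried left to right: escaped quote, escaped backslash,
# then any character. re.DOTALL lets '.' match newlines too, like ANTLR's '.'.
STRING_RE = re.compile(r'"(?:\\"|\\\\|.)*?"', re.DOTALL)


def find_strings(text: str) -> list[str]:
    """Return every substring the STRING-like pattern extracts, left to right."""
    return STRING_RE.findall(text)


source = r'''puts("say \"hi\""); path = "C:\\";'''
for s in find_strings(source):
    print(s)
```

This prints "say \"hi\"" and "C:\\" on separate lines: each \" inside the first string is consumed as an escape rather than ending the string, and the \\ in the second string is consumed before its closing quote.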

I expected the rule STRING to match the shortest possible string because of the non-greedy construct. So when it sees \", I expected it to match \ with . and " with the closing '"' of the rule, since that would produce the shortest string. Instead, \" is matched by ESC. I don't understand why, since this is not what I expected.

What exactly happens here? Is it that a separate DFA first matches (ESC|.), and another DFA then matches STRING using the strings already matched by the (ESC|.) construct? I admit I haven't finished the book yet.

Recommended answer

ANTLR 4 lexers normally operate with longest-match-wins behavior, without any regard for the order in which alternatives appear in the grammar. If two lexer rules match the same longest input sequence, only then is the relative order of those rules compared to determine how the token type is assigned.
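
The rule-selection policy described above (longest match first, earlier rule only on ties) can be sketched as a toy lexer. The rules IF, ID and WS and the functions next_token and tokenize are hypothetical, and each rule is approximated by a Python regular expression matched at the current position, standing in for ANTLR's own ATN-based matching.

```python
import re

# Hypothetical lexer rules in grammar order (earlier = wins on equal length).
# These patterns are simple enough that re.match returns each one's longest match.
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z]+")),
    ("WS", re.compile(r"[ \t]+")),
]


def next_token(text: str, pos: int) -> tuple[str, str]:
    """Pick the rule with the longest match at pos; break ties by grammar order."""
    best_type, best_text = None, ""
    for token_type, pattern in RULES:
        m = pattern.match(text, pos)
        # Only a strictly longer match replaces the current best, so a later
        # rule never overrides an earlier one that matched the same length.
        if m and len(m.group()) > len(best_text):
            best_type, best_text = token_type, m.group()
    if best_type is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best_type, best_text


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (token_type, lexeme) pairs using next_token."""
    tokens, pos = [], 0
    while pos < len(text):
        token_type, lexeme = next_token(text, pos)
        tokens.append((token_type, lexeme))
        pos += len(lexeme)
    return tokens


print(tokenize("if iffy"))
```

For "if iffy" this yields IF for "if" (IF and ID both match two characters, and IF comes first) and ID for "iffy" (ID's four-character match beats IF's two).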

The behavior within a rule changes as soon as the lexer reaches a non-greedy optional or closure. From that point to the end of the rule, all alternatives within the rule are treated as ordered, and the path with the lowest alternative wins. This seemingly strange behavior is in fact what implements non-greedy matching, because of the way we order alternatives in the underlying ATN representation. When the lexer is in this mode and reaches the block (ESC|.), the ordering constraint requires it to use ESC whenever possible. Because ESC is the first alternative of the block, \" is consumed as an escape before the quote it contains can be treated as the closing '"' of STRING.
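
To make the ordered-alternatives behavior concrete, the sketch below hand-codes the STRING rule as a small backtracking matcher that explores paths in priority order and returns the first one that reaches the end of the rule. It is a simplified model under stated assumptions, not ANTLR's ATN simulator: the non-greedy *? is modeled as "try to leave the loop first", and the (ESC|.) block tries its alternatives in the order given by the hypothetical alt_order parameter. With the grammar's order, \" is consumed by ESC and the string continues; swapping the order lets . take the backslash, which produces the shorter match the question expected.

```python
from typing import Optional

# Simplified model of:  STRING : '"' (ESC|.)*? '"' ;  fragment ESC : '\\"' | '\\\\' ;
# Paths are explored in priority order and the first path that reaches the end
# of the rule wins, mimicking "lowest alternative wins". Not ANTLR's real code.

ESC_ALTS = ['\\"', '\\\\']  # Python literals for the two escapes \" and \\


def match_esc(text: str, pos: int) -> Optional[int]:
    """Return the end position if an ESC alternative matches at pos, else None."""
    for alt in ESC_ALTS:
        if text.startswith(alt, pos):
            return pos + len(alt)
    return None


def match_any(text: str, pos: int) -> Optional[int]:
    """Model of '.': consume any single character."""
    return pos + 1 if pos < len(text) else None


def match_loop(text: str, pos: int, alt_order) -> Optional[int]:
    # Non-greedy loop: first try to exit by matching the closing quote.
    if pos < len(text) and text[pos] == '"':
        return pos + 1
    # Otherwise run one more iteration, trying the block's alternatives in order
    # and backtracking into the next alternative only if a path fails later.
    for alt in alt_order:
        nxt = alt(text, pos)
        if nxt is not None:
            end = match_loop(text, nxt, alt_order)
            if end is not None:
                return end
    return None


def match_string(text: str, alt_order=(match_esc, match_any)) -> Optional[str]:
    """Return the STRING token at the start of text, or None if there is none."""
    if not text.startswith('"'):
        return None
    end = match_loop(text, 1, alt_order)
    return text[:end] if end is not None else None


src = r'"a\"b" rest'
print(match_string(src))                                    # ESC first: "a\"b"
print(match_string(src, alt_order=(match_any, match_esc)))  # '.' first: "a\"
```

The recursion depth grows with the length of the string, which is acceptable for a small illustration but not for a production lexer.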

08-28 17:23