

I have seen many ANTLR grammars that use whitespace handling like this:

WS: [ \n\t\r]+ -> skip;
// or
WS: [ \n\t\r]+ -> channel(HIDDEN);


So the whitespace is either thrown away or sent to the hidden channel, respectively. Take this grammar as an example:


grammar Not;

start:      expression;
expression: NOT expression
          | (TRUE | FALSE);

NOT:    'not';
TRUE:   'true';
FALSE:  'false';
WS: [ \n\t\r]+ -> skip;

Valid inputs are 'not true' and 'not false', but also 'nottrue', which is not the desired result. Changing the grammar to:

grammar Not;

start:      expression;

expression: NOT WS+ expression
          | (TRUE | FALSE);

NOT:    'not';

TRUE:   'true';
FALSE:  'false';

WS: [ \n\t\r];


fixes the problem, but I do not want to handle whitespace manually in every rule.

In general I want whitespace between every pair of tokens, with some exceptions (e.g. '!true' does not need whitespace in between).


Is there a simple way of doing this?


Add an IDENTIFIER lexer rule to handle words which are not keywords.
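A minimal sketch of the question's `Not` grammar with such a rule added. The identifier pattern `[a-zA-Z_] [a-zA-Z_0-9]*` is an assumption (the answer does not specify one), and the `EOF` at the end of `start` is also an addition, so that leftover input is reported as an error instead of being silently ignored.

```
grammar Not;

start:      expression EOF;   // EOF is an addition: reject any trailing input
expression: NOT expression
          | (TRUE | FALSE);

// Keyword rules come first.
NOT:    'not';
TRUE:   'true';
FALSE:  'false';

// Hypothetical identifier pattern; adjust it to your language.
// Defined after the keywords so that 'not' is still lexed as NOT.
IDENTIFIER: [a-zA-Z_] [a-zA-Z_0-9]*;

WS: [ \n\t\r]+ -> skip;
```

Whitespace can stay skipped here: the lexer prefers the longest match, so 'nottrue' is consumed as one IDENTIFIER rather than being split into NOT and TRUE.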


Now the text 'nottrue' is lexed as a single IDENTIFIER token, which your parser will not accept in place of the two separate keywords in 'not true'.

Make sure IDENTIFIER is defined after your keyword rules. For the text 'not', both NOT and IDENTIFIER match the same three characters, and when two rules match input of equal length, the lexer assigns the token type of the rule that appears first in the grammar.
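The same reasoning covers the '!true' exception from the question. In the sketch below, '!' is assumed to be an alternative spelling of NOT (an illustration, not part of the original grammar), and the identifier pattern is again a hypothetical choice. Because '!' is not in IDENTIFIER's character set, the NOT token ends right after the '!', so no whitespace is needed before 'true', while 'nottrue' still becomes a single IDENTIFIER.

```
grammar Not;

start:      expression EOF;
expression: NOT expression
          | (TRUE | FALSE);

NOT:        'not' | '!';                // '!' is an assumed alternative spelling
TRUE:       'true';
FALSE:      'false';
IDENTIFIER: [a-zA-Z_] [a-zA-Z_0-9]*;    // hypothetical; '!' is not in this set
WS:         [ \n\t\r]+ -> skip;

// Expected lexing (longest match wins; the earlier rule wins ties):
//   "not true" -> NOT TRUE
//   "!true"    -> NOT TRUE     (no whitespace needed after '!')
//   "nottrue"  -> IDENTIFIER   (rejected by the parser)
```

Only characters that can continue an identifier force a separator; any operator whose characters fall outside the IDENTIFIER set can be written directly next to a keyword.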


08-03 17:54