ANTLR4: How to hide and use tokens in a grammar


Problem description

I'm parsing a scripting language that defines two types of statements: control statements and non-control statements. Non-control statements always end with ';', while control statements may end with ';' or EOL ('\n'). Part of the grammar looks like this:

script
    :   statement* EOF
    ;

statement
    :   control_statement
    |   no_control_statement
    ;

control_statement
    :   if_then_control_statement
    ;

if_then_control_statement
    :   IF expression THEN end_control_statment
        ( statement ) *
        ( ELSEIF expression THEN end_control_statment ( statement )* )*
        ( ELSE end_control_statment ( statement )* )?
        END IF end_control_statment
    ;

no_control_statement
    :   sleep_statement
    ;

sleep_statement
    :   SLEEP expression END_STATEMENT
    ;

end_control_statment
    :   END_STATEMENT
    |   EOL
    ;

END_STATEMENT
    :   ';'
    ;

ANY_SPACE
    :   ( LINE_SPACE | EOL )    ->  channel(HIDDEN)
    ;

EOL
    :   [\n\r]+
    ;

LINE_SPACE
    :   [ \t]+
    ;

In all other aspects of the scripting language, I never care about EOL, so I use the normal lexer rules to hide whitespace.

This works fine in all cases except those where I need an EOL to detect the end of a control statement. With the grammar above, every EOL is hidden and never reaches the control statement rules.

Is there a way to change my grammar so that I can skip all EOLs except the ones needed to terminate parts of my control statements?

Recommended answer

I found a way to handle this.

The idea is to divert EOL into one hidden channel and everything else I don't want to see (such as spaces and comments) into another hidden channel. Then, where an EOL is supposed to show up, I use some code to walk back through the tokens and examine the channels of the preceding ones (since they have already been consumed). If I find something on the EOL channel before I run into something on the ordinary channel, then it is OK.
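To make the backward scan concrete, here is a minimal, self-contained Java sketch that does not depend on the ANTLR runtime. The `ChannelScan` class, the `Tok` type, and the `eolBefore` and `sample` methods are hypothetical names used only for illustration; the channel numbers mirror the ones used in this answer.

```java
import java.util.Arrays;
import java.util.List;

class ChannelScan {
    static final int DEFAULT_CHANNEL = 0; // tokens the parser sees
    static final int EOL_CHANNEL = 1;     // carries only EOL tokens
    static final int OTHER_CHANNEL = 2;   // spaces, comments, ...

    /** Hypothetical stand-in for an ANTLR token: text plus channel. */
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    /**
     * Scans backwards from idx (exclusive), skipping OTHER_CHANNEL tokens.
     * Returns true if the first token that is not on OTHER_CHANNEL
     * is on EOL_CHANNEL.
     */
    static boolean eolBefore(List<Tok> tokens, int idx) {
        for (int i = idx - 1; i >= 0; i--) {
            int ch = tokens.get(i).channel;
            if (ch == OTHER_CHANNEL) {
                continue; // whitespace or comment: keep looking
            }
            return ch == EOL_CHANNEL;
        }
        return false; // ran out of tokens without finding an EOL
    }

    /** Token sequence for the input "THEN \n  SLEEP". */
    static List<Tok> sample() {
        return Arrays.asList(
            new Tok("THEN", DEFAULT_CHANNEL),
            new Tok(" ", OTHER_CHANNEL),
            new Tok("\n", EOL_CHANNEL),
            new Tok("  ", OTHER_CHANNEL),
            new Tok("SLEEP", DEFAULT_CHANNEL));
    }

    public static void main(String[] args) {
        List<Tok> tokens = sample();
        // Before SLEEP there is an EOL, so THEN is properly terminated.
        System.out.println(eolBefore(tokens, 4));
        // Directly after THEN nothing has been scanned yet.
        System.out.println(eolBefore(tokens, 1));
    }
}
```

In the real parser the same check would run against the parser's token stream, starting from the index of the current token.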

It looks like this:

Change the lexer rules:

@lexer::members {
    public static int EOL_CHANNEL = 1;
    public static int OTHER_CHANNEL = 2;
}

...

EOL
  : '\r'? '\n'  ->  channel(EOL_CHANNEL)
  ;

LINE_SPACE
  : [ \t]+  ->  channel(OTHER_CHANNEL)
  ;

I also diverted everything else that was on the HIDDEN channel (comments) to OTHER_CHANNEL. Then I changed the rule end_control_statment:

end_control_statment
  : END_STATEMENT
  | { isEOLPrevious() }?
  ;

And added:

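@parser::members {
  // Must match the channel numbers declared in @lexer::members.
  public static int EOL_CHANNEL = 1;
  public static int OTHER_CHANNEL = 2;

  // True if an EOL lies between the previous default-channel token
  // and the current token (whitespace and comments are skipped).
  boolean isEOLPrevious()
  {
        int idx = getCurrentToken().getTokenIndex();
        int ch = 0; // default channel; result is false if nothing precedes

        // Walk backwards past tokens on OTHER_CHANNEL, stopping at index 0.
        while (idx > 0)
        {
            ch = getTokenStream().get(--idx).getChannel();
            if (ch != OTHER_CHANNEL) break;
        }

        // Channel 1 is only carrying EOL, no need to check token itself
        return (ch == EOL_CHANNEL);
     }
}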

One could stick to the ordinary hidden channel, but then the backward scan has to check both the channel and the token type, so this is maybe a bit easier...
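For comparison, a rough sketch of that single-channel alternative might look like the following. It is plain Java with no ANTLR dependency; the `HiddenChannelScan` class, the `Tok` type, the token type constants, and the `eolBefore` and `sample` methods are all made up for illustration. The scan now has to look at the token type of every hidden token instead of relying on the channel alone.

```java
import java.util.Arrays;
import java.util.List;

class HiddenChannelScan {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1; // EOL, spaces and comments all land here

    // Hypothetical token types, standing in for the generated constants.
    static final int WORD = 1;
    static final int LINE_SPACE = 2;
    static final int EOL = 3;

    /** Hypothetical stand-in for an ANTLR token: type plus channel. */
    static final class Tok {
        final int type;
        final int channel;

        Tok(int type, int channel) {
            this.type = type;
            this.channel = channel;
        }
    }

    /**
     * Scans backwards from idx (exclusive) over hidden tokens only.
     * Returns true if one of them is an EOL; stops at the first
     * token that is not on the hidden channel.
     */
    static boolean eolBefore(List<Tok> tokens, int idx) {
        for (int i = idx - 1; i >= 0; i--) {
            Tok t = tokens.get(i);
            if (t.channel != HIDDEN_CHANNEL) {
                return false; // reached a visible token without seeing EOL
            }
            if (t.type == EOL) {
                return true;
            }
        }
        return false;
    }

    /** Token sequence: WORD, space, EOL, space, WORD. */
    static List<Tok> sample() {
        return Arrays.asList(
            new Tok(WORD, DEFAULT_CHANNEL),
            new Tok(LINE_SPACE, HIDDEN_CHANNEL),
            new Tok(EOL, HIDDEN_CHANNEL),
            new Tok(LINE_SPACE, HIDDEN_CHANNEL),
            new Tok(WORD, DEFAULT_CHANNEL));
    }

    public static void main(String[] args) {
        System.out.println(eolBefore(sample(), 4));
        System.out.println(eolBefore(sample(), 2));
    }
}
```

The extra type check is the cost mentioned above: with a dedicated EOL channel the channel number alone answers the question.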

Hope this helps someone else dealing with this kind of issue...
