Problem description
I'm parsing a script language that defines two types of statements: control statements and non-control statements. Non-control statements always end with ';', while control statements may end with either ';' or EOL ('\n'). Part of the grammar looks like this:
script
    : statement* EOF
    ;
statement
    : control_statement
    | no_control_statement
    ;
control_statement
    : if_then_control_statement
    ;
if_then_control_statement
    : IF expression THEN end_control_statment
      ( statement )*
      ( ELSEIF expression THEN end_control_statment ( statement )* )*
      ( ELSE end_control_statment ( statement )* )?
      END IF end_control_statment
    ;
no_control_statement
    : sleep_statement
    ;
sleep_statement
    : SLEEP expression END_STATEMENT
    ;
end_control_statment
    : END_STATEMENT
    | EOL
    ;
END_STATEMENT
    : ';'
    ;
ANY_SPACE
    : ( LINE_SPACE | EOL ) -> channel(HIDDEN)
    ;
EOL
    : [\n\r]+
    ;
LINE_SPACE
    : [ \t]+
    ;
In all other aspects of the script language I never care about EOL, so I use the normal lexer rules to hide whitespace.
This works fine in every case except where I need an EOL to detect the end of a control statement. With the grammar above, every EOL is hidden and never reaches the control statement rules.
Is there a way to change my grammar so that I can skip all EOLs except the ones needed to terminate parts of my control statements?
Recommended answer
I found a way to handle this.
The idea is to divert EOL into one hidden channel and everything else I don't want to see (such as spaces and comments) into another hidden channel. Then, at the point where an EOL is supposed to appear, I use some code to walk backwards through the tokens and examine the channels of the preceding tokens (they have already been consumed by then). If I find something on the EOL channel before I run into something from the ordinary channel, the control statement is properly terminated.
It looks like this:
Change the lexer rules:
@lexer::members {
    public static int EOL_CHANNEL = 1;
    public static int OTHER_CHANNEL = 2;
}
...
EOL
    : '\r'? '\n' -> channel(EOL_CHANNEL)
    ;
LINE_SPACE
    : [ \t]+ -> channel(OTHER_CHANNEL)
    ;
I also diverted everything else that was on the HIDDEN channel (comments) to OTHER_CHANNEL. Then I changed the rule end_control_statment:
end_control_statment
    : END_STATEMENT
    | { isEOLPrevious() }?
    ;
and added:
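@parser::members {
    public static int EOL_CHANNEL = 1;
    public static int OTHER_CHANNEL = 2;

    // True if an EOL sits between the previous visible token and the current one.
    boolean isEOLPrevious()
    {
        int idx = getCurrentToken().getTokenIndex();
        int ch = OTHER_CHANNEL;
        // Walk backwards, skipping tokens on OTHER_CHANNEL (spaces, comments).
        // The idx > 0 guard avoids reading before the start of the token stream.
        while (idx > 0 && ch == OTHER_CHANNEL)
        {
            ch = getTokenStream().get(--idx).getChannel();
        }
        // Channel 1 is only carrying EOL, no need to check token itself
        return (ch == EOL_CHANNEL);
    }
}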
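To see the backward scan in isolation, here is a minimal sketch that runs without the ANTLR runtime. It assumes each token is represented only by its channel number; the class name `EolLookback`, the list-based token stream, and the sample inputs are made up for illustration and are not part of the grammar above.

```java
import java.util.List;

class EolLookback {
    static final int DEFAULT_CHANNEL = 0; // tokens the parser sees
    static final int EOL_CHANNEL = 1;     // same numbering as in the answer
    static final int OTHER_CHANNEL = 2;   // spaces and comments

    // Scans backwards from the token just before currentIndex, skipping
    // OTHER_CHANNEL tokens. Returns true only if the first token that is
    // not on OTHER_CHANNEL is on EOL_CHANNEL.
    public static boolean isEolPrevious(List<Integer> channels, int currentIndex) {
        int idx = currentIndex;
        while (idx > 0) {
            int ch = channels.get(--idx);
            if (ch != OTHER_CHANNEL) {
                return ch == EOL_CHANNEL;
            }
        }
        return false; // reached the start of the stream without finding an EOL
    }

    public static void main(String[] args) {
        // Channels for: IF ' ' x ' ' THEN ' ' '\n' SLEEP
        List<Integer> withEol = List.of(0, 2, 0, 2, 0, 2, 1, 0);
        // Channels for: IF ' ' x ' ' THEN ' ' SLEEP
        List<Integer> withoutEol = List.of(0, 2, 0, 2, 0, 2, 0);

        System.out.println(isEolPrevious(withEol, 7));    // true
        System.out.println(isEolPrevious(withoutEol, 6)); // false
    }
}
```

In both sample streams the index passed in is that of the SLEEP token, which plays the role of the parser's current token when the predicate runs.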
One could stick to the ordinary hidden channel, but then the backtracking code has to check both the channel and the token type, so using two channels is probably a bit easier.
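For comparison, the single-channel variant might look roughly like the following sketch. It is not taken from the answer: the `Tok` class, the `TYPE_*` numbers, and the class name `HiddenChannelLookback` are hypothetical stand-ins for ANTLR tokens and generated token types, chosen so the example runs on its own.

```java
import java.util.List;

class HiddenChannelLookback {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    // Hypothetical token type numbers, for illustration only.
    static final int TYPE_EOL = 1;
    static final int TYPE_SPACE = 2;
    static final int TYPE_WORD = 3;

    // Minimal stand-in for a token: just a channel and a type.
    static final class Tok {
        final int channel;
        final int type;
        Tok(int channel, int type) {
            this.channel = channel;
            this.type = type;
        }
    }

    // Scans backwards over hidden tokens. Because EOL shares the hidden
    // channel with spaces and comments, the token type must be checked too.
    public static boolean isEolPrevious(List<Tok> tokens, int currentIndex) {
        for (int idx = currentIndex - 1; idx >= 0; idx--) {
            Tok t = tokens.get(idx);
            if (t.channel != HIDDEN_CHANNEL) {
                return false; // hit a visible token before any EOL
            }
            if (t.type == TYPE_EOL) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // THEN ' ' '\n' SLEEP
        List<Tok> tokens = List.of(
            new Tok(DEFAULT_CHANNEL, TYPE_WORD),
            new Tok(HIDDEN_CHANNEL, TYPE_SPACE),
            new Tok(HIDDEN_CHANNEL, TYPE_EOL),
            new Tok(DEFAULT_CHANNEL, TYPE_WORD));
        System.out.println(isEolPrevious(tokens, 3)); // true
    }
}
```

The extra type check in the loop is the bookkeeping that the two-channel approach avoids.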
Hope this helps someone else dealing with this kind of issue.