Parsing newlines and EOF as statement-end tokens with ANTLR3

Question

My question concerns running the following grammar in ANTLRWorks:

INT :('0'..'9')+;
SEMICOLON: ';';
NEWLINE: ('\r\n'|'\n'|'\r');
STMTEND: (SEMICOLON (NEWLINE)*|NEWLINE+);

statement
    : STMTEND
    | INT STMTEND
    ;

program: statement+;

With program as the start rule, I get the following results, regardless of which newline NL (CR, LF, or CRLF) or which integer I choose:

"; NL" or "32; NL" parses without error.
";" or "45;" (without newlines) results in EarlyExitException.
"NL" by itself parses without error.
"456 NL", without the semicolon, results in MismatchedTokenException.

What I want is for a statement to be terminated by a newline, a semicolon, or a semicolon followed by a newline, and I want the parser to eat as many contiguous newlines as it can at a termination, so that "; NL NL NL NL" is just one termination, not four or five. I would also like end-of-file to be a valid termination, but I don't know how to do that yet.

So what's wrong with this, and how can I make it terminate nicely at EOF? I'm completely new to parsing, ANTLR, and EBNF, and I haven't found much material to read at a level somewhere between the simple calculator example and the reference. (I have The Definitive ANTLR Reference, but it really is a reference, with a quick start at the front that I haven't yet got to run outside of ANTLRWorks.) Any reading suggestions (besides Wirth's 1977 ACM paper) would be helpful too. Thanks!

Recommended answer

For input like ";" or "45;", the token STMTEND will never be created.

";" will create a single token, SEMICOLON, and "45;" will produce INT SEMICOLON.

What you (probably) want is for SEMICOLON and NEWLINE never to become real tokens themselves, but always to be part of a STMTEND. You can do that by making them so-called "fragment" rules:

program: statement+;

statement
 : STMTEND
 | INT STMTEND
 ;

INT     : '0'..'9'+;
STMTEND : SEMICOLON NEWLINE* | NEWLINE+;

fragment SEMICOLON : ';';
fragment NEWLINE   : '\r' '\n' | '\n' | '\r';

Fragment rules can only be used from other lexer rules, so they never show up in parser (production) rules. To emphasize: the grammar above will only ever create INT or STMTEND tokens.
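To see what that token stream looks like, here is a minimal Python sketch that mimics the two lexer rules above with a regular expression. It is only an analogue for illustration, not ANTLR-generated code: the names NEWLINE, TOKEN_RE, tokenize, and token_types are hypothetical, and a real ANTLR lexer decides between rules with its own lookahead rather than a regex.

```python
import re

# NEWLINE and SEMICOLON are inlined into STMTEND, much as fragment rules
# can only be used from inside other lexer rules.
NEWLINE = r"(?:\r\n|\n|\r)"
TOKEN_RE = re.compile(
    rf"(?P<INT>[0-9]+)|(?P<STMTEND>;{NEWLINE}*|{NEWLINE}+)"
)


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs; raise on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at offset {pos}"
            )
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def token_types(text):
    """Return only the token type names, in order."""
    return [kind for kind, _ in tokenize(text)]


if __name__ == "__main__":
    for sample in [";", "45;", ";\n\n\n\n", "456\r\n"]:
        print(repr(sample), token_types(sample))
```

In this sketch a semicolon followed by any number of newlines collapses into a single STMTEND, and neither a bare ";" nor a bare newline is ever reported as a token of its own, which is the behaviour the fragment rules are meant to give.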
