Parsing newline and EOF as statement terminators with ANTLR3

Question

My question concerns running the following grammar in ANTLRWorks:

INT :('0'..'9')+;
SEMICOLON: ';';
NEWLINE: ('\r\n'|'\n'|'\r');
STMTEND: (SEMICOLON (NEWLINE)*|NEWLINE+);

statement
    : STMTEND
    | INT STMTEND
    ;

program: statement+;

With program as the start rule, I get the following results for the inputs below, regardless of which newline NL (CR, LF, or CRLF) or which integer I choose:

"; NL" or "32; NL" parses without error.
";" or "45;" (without a newline) results in an EarlyExitException.
"NL" by itself parses without error.
"456 NL", without the semicolon, results in a MismatchedTokenException.

What I want is for a statement to be terminated by a newline, a semicolon, or a semicolon followed by a newline. I also want the parser to consume as many contiguous newlines as it can at a termination, so "; NL NL NL NL" is just one termination, not four or five. Finally, I would like end-of-file to be a valid termination as well, but I don't know how to do that yet.

So what is wrong with this grammar, and how can I make it terminate cleanly at EOF? I am completely new to parsing, ANTLR, and EBNF, and I have not found much reading material at a level between the simple calculator example and the reference. I have The Definitive ANTLR Reference, but it really is a reference, with a quick start at the front that I have not yet managed to run outside of ANTLRWorks. Any reading suggestions (besides Wirth's 1977 ACM paper) would be helpful too. Thanks!

Recommended answer

For input like ";" or "45;", the token STMTEND is never created.

";" produces a single token, SEMICOLON, and "45;" produces INT SEMICOLON.

What you (probably) want is for SEMICOLON and NEWLINE never to become tokens in their own right, so that they always end up as part of a STMTEND. You can do that by making them so-called "fragment" rules:

program: statement+;

statement
 : STMTEND
 | INT STMTEND
 ;

INT     : '0'..'9'+;
STMTEND : SEMICOLON NEWLINE* | NEWLINE+;

fragment SEMICOLON : ';';
fragment NEWLINE   : '\r' '\n' | '\n' | '\r';

Fragment rules can only be used from other lexer rules, so they never show up in parser (production) rules. To emphasize: the grammar above only ever creates INT or STMTEND tokens.
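To make the effect of the fragment rules concrete, here is a minimal Python sketch that mimics the token rules of the revised grammar with a regular expression. It is only a stand-in for illustration, not ANTLR's lexer, and the names tokenize, TOKEN_RE, and NEWLINE are hypothetical helpers that do not come from the original answer. The point it shows is that only INT and STMTEND are ever emitted, and that a semicolon followed by any run of newlines collapses into a single STMTEND.

```python
import re

# Assumption: a regex approximation of the revised lexer rules, not ANTLR itself.
# NEWLINE and the ';' literal are inlined into STMTEND, the same way fragment
# rules serve only as building blocks for other lexer rules.
NEWLINE = r"(?:\r\n|\n|\r)"
TOKEN_RE = re.compile(
    rf"(?P<INT>[0-9]+)|(?P<STMTEND>;{NEWLINE}*|{NEWLINE}+)"
)


def tokenize(text):
    """Return the list of token type names for text (hypothetical helper)."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        # lastgroup is the name of the alternative that matched: INT or STMTEND.
        tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for sample in [";", "45;", ";\n\r\n\n\r", "456\n", "45"]:
        print(repr(sample), tokenize(sample))
```

In this sketch ";" and "45;" now yield STMTEND and INT STMTEND, which is what the statement rule expects. An input such as "45" with no terminator still yields only INT, so the end-of-file case raised in the question is not covered by these token rules alone.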
