问题描述
我正在尝试为我正在编写的更简单的语言编写一个简单的解析器.它由后缀表达式组成.到目前为止,我遇到了解析器的问题.当我在输入 2 2 * test >>
上运行它时,我得到一个 MismatchedTokenException.
另外,我将如何实现递归后缀解析器?
这是我的代码:
语法之星;选项 {语言=Python;输出=AST;ASTLabelType=普通树;}代币{DECL;}//开始//: 声明 ;//声明//: 类型 ID ->^(DECL 类型 ID)//;程序:(身体)+;正文:(嵌套 WS)*|(var WS)*|(获得 WS)*;无功: 嵌套 ID '>>';得到: ID '<
解决方案
有几点不正确:
1
您已将
WS
标记放在 HIDDEN
通道上,这使得它们对解析器规则不可用.因此,body
规则中的所有 WS
标记都是不正确的.
2
_(抱歉,您的其他问题 有一个左递归规则(
expr
),所以我会在这里留下这个信息)_
ANTLR 是一个 LL 解析器-generator,因此您可以创建左递归语法.以下是左递归:
expr: 术语术语运算符;学期: INT|ID|表达式;
因为
expr
规则中的第一个 term
可能匹配 expr
规则本身.与任何 LL 解析器一样,ANTLR 生成的解析器无法处理左递归.
3
如果您修复了
WS
问题,您的 body
规则将产生以下错误消息:
(1/7) Decision 可以匹配输入,例如INT",使用多个选择
这意味着解析器无法看到"
INT
令牌属于哪个规则.这是因为您的所有 body
替代项都可以重复零次或多次,并且 expr
和 nested
也会重复.并且所有这些都可以匹配 INT
,这正是 ANTLR 所抱怨的.如果您像这样删除 *
:
正文: 嵌套|无功|得到;//...表达式:term(术语运算符);嵌套: expr (expr 运算符);
错误会消失(尽管这仍然不会导致您的输入被正确解析!).
我意识到这听起来可能仍然含糊不清,但解释(或理解,如果您不熟悉这一切)并非易事.
4
要正确考虑
expr
中的递归expr
,您需要像我在 sorry, your other question has a left recursive rule (expr
), so I'll leave this info in here)_
ANTLR is an LL parser-generator, so you can'r create left-recursive grammars. The following is left recursive:
expr
: term term operator
;
term
: INT
| ID
| expr
;
because the first
term
inside the expr
rule could possible match an expr
rule itself. Like any LL parser, ANTLR generated parser cannot cope with left recursion.
3
If you fix the
WS
issue, your body
rule will produce the following error message:
(1/7) Decision can match input such as "INT" using multiple alternatives
This means that the parser cannot "see" to which rule the
INT
token belongs. This is due to the fact that all your body
alternative can be repeated zero or more times and expr
and nested
are also repeated. And all of them can match an INT
, which is what ANTLR is complaining about. If you remove the *
's like this:
body
: nested
| var
| get
;
// ...
expr
: term (term operator)
;
nested
: expr (expr operator)
;
the errors would disappear (although that would still not cause your input to be parsed properly!).
I realize that this might still sound vague, but it's not trivial to explain (or comprehend if you're new to all this).
4
To properly account for recursive
expr
inside expr
, you'll need to stay clear of left recursion as I explained in #2. You can do that like this:
expr
: term (expr operator | term operator)*
;
which is still ambiguous, but that is in case of describing a postfix expression using an LL grammar, unavoidable AFAIK. To resolve this, you could enable global backtracking inside the
options { ... }
section of the grammar:
options {
language=Python;
output=AST;
backtrack=true;
}
Demo
A little demo of how to parse recursive expressions could look like:
grammar star;
options {
language=Python;
output=AST;
backtrack=true;
}
parse
: expr EOF -> expr
;
expr
: (term -> term) ( expr2 operator -> ^(operator $expr expr2)
| term operator -> ^(operator term term)
)*
;
expr2
: expr
;
term
: INT
| ID
;
operator
: ('*' | '+' | '/' | '%' | '-')
;
ID
: ('a'..'z' | 'A'..'Z') ('a..z' | '0'..'9' | 'A'..'Z')*
;
INT
: '0'..'9'+
;
WS
: (' ' | '\n' | '\t' | '\r') {$channel=HIDDEN;}
;
The test script:
#!/usr/bin/env python
import antlr3
from antlr3 import *
from antlr3.tree import *
from starLexer import *
from starParser import *
def print_level_order(tree, indent):
print '{0}{1}'.format(' '*indent, tree.text)
for child in tree.getChildren():
print_level_order(child, indent+1)
input = "5 1 2 + 4 * + 3 -"
char_stream = antlr3.ANTLRStringStream(input)
lexer = starLexer(char_stream)
tokens = antlr3.CommonTokenStream(lexer)
parser = starParser(tokens)
tree = parser.parse().tree
print_level_order(tree, 0)
produces the following output:
-
+
5
*
+
1
2
4
3
which corresponds to the following AST:
这篇关于ANTLR 解析 MismatchedTokenException的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!