问题描述
这是我语法的一部分:
expr_address
: expr_address_category expr_opt { $$ = new ExprAddress($1,*$2);}
| axis axis_data { $$ = new ExprAddress($1,*$2);}
;
axis_data
: expr_opt { $$ = $1;}
| sign { if($1 == MINUS)
$$ = new IntergerExpr(-1000000000);
else if($1 == PLUS)
$$ = new IntergerExpr(+1000000000);}
;
expr_opt
: { $$ = new IntergerExpr(0);}
| expr { $$ = $1;}
;
expr_address_category
: I { $$ = NCAddress_I;}
| J { $$ = NCAddress_J;}
| K { $$ = NCAddress_K;}
;
axis
: X { $$ = NCAddress_X;}
| Y { $$ = NCAddress_Y;}
| Z { $$ = NCAddress_Z;}
| U { $$ = NCAddress_U;}
| V { $$ = NCAddress_V;}
| W { $$ = NCAddress_W;}
;
expr
: '[' expr ']' {$$ = $2;}
| COS parenthesized_expr {$$ = new BuiltinMethodCallExpr(COS,*$2);}
| SIN parenthesized_expr {$$ = new BuiltinMethodCallExpr(SIN,*$2);}
| ATAN parenthesized_expr {$$ = new BuiltinMethodCallExpr(ATAN,*$2);}
| SQRT parenthesized_expr {$$ = new BuiltinMethodCallExpr(SQRT,*$2);}
| ROUND parenthesized_expr {$$ = new BuiltinMethodCallExpr(ROUND,*$2);}
| variable {$$ = $1;}
| literal
| expr '+' expr {$$ = new BinaryOperatorExpr(*$1,PLUS,*$3);}
| expr '-' expr {$$ = new BinaryOperatorExpr(*$1,MINUS,*$3);}
| expr '*' expr {$$ = new BinaryOperatorExpr(*$1,MUL,*$3);}
| expr '/' expr {$$ = new BinaryOperatorExpr(*$1,DIV,*$3);}
| sign expr %prec UMINUS {$$ = new UnaryOperatorExpr($1,*$2);}
| expr EQ expr {$$ = new BinaryOperatorExpr(*$1,EQ,*$3);}
| expr NE expr {$$ = new BinaryOperatorExpr(*$1,NE,*$3);}
| expr GT expr {$$ = new BinaryOperatorExpr(*$1,GT,*$3);}
| expr GE expr {$$ = new BinaryOperatorExpr(*$1,GE,*$3);}
| expr LT expr {$$ = new BinaryOperatorExpr(*$1,LT,*$3);}
| expr LE expr {$$ = new BinaryOperatorExpr(*$1,LE,*$3);}
;
variable
: d_h_address {$$ = new AddressExpr(*$1);}
;
d_h_address
: D INTEGER_LITERAL { $$ = new IntAddress(NCAddress_D,$2);}
| H INTEGER_LITERAL { $$ = new IntAddress(NCAddress_H,$2);}
;
我希望我的语法支持以下内容:
I hope my grammar support that like:
H100=20;
X;
X+0;
X+;
X+H100; //means H100 variable ref
前两个与X0相同;顺便签名-> +/-;
The top two are same with X0; By the way,sign -> +/-;
但是bison报告冲突,bison.output的关键部分:
But bison report conflicts,the key part of bison.output:
State 108
11 expr: sign . expr
64 axis_data: sign .
INTEGER_LITERAL shift, and go to state 93
REAL_LITERAL shift, and go to state 94
'+' shift, and go to state 74
'-' shift, and go to state 75
COS shift, and go to state 95
SIN shift, and go to state 96
ATAN shift, and go to state 97
SQRT shift, and go to state 98
ROUND shift, and go to state 99
D shift, and go to state 35
H shift, and go to state 36
'[' shift, and go to state 100
D [reduce using rule 64 (axis_data)]
H [reduce using rule 64 (axis_data)]
$default reduce using rule 64 (axis_data)
State 69
62 expr_address: axis . axis_data
INTEGER_LITERAL shift, and go to state 93
REAL_LITERAL shift, and go to state 94
'+' shift, and go to state 74
'-' shift, and go to state 75
COS shift, and go to state 95
SIN shift, and go to state 96
ATAN shift, and go to state 97
SQRT shift, and go to state 98
ROUND shift, and go to state 99
D shift, and go to state 35
H shift, and go to state 36
'[' shift, and go to state 100
D [reduce using rule 65 (expr_opt)]
H [reduce using rule 65 (expr_opt)]
$default reduce using rule 65 (expr_opt)
State 68
61 expr_address: expr_address_category . expr_opt
INTEGER_LITERAL shift, and go to state 93
REAL_LITERAL shift, and go to state 94
'+' shift, and go to state 74
'-' shift, and go to state 75
COS shift, and go to state 95
SIN shift, and go to state 96
ATAN shift, and go to state 97
SQRT shift, and go to state 98
ROUND shift, and go to state 99
D shift, and go to state 35
H shift, and go to state 36
'[' shift, and go to state 100
D [reduce using rule 65 (expr_opt)]
H [reduce using rule 65 (expr_opt)]
$default reduce using rule 65 (expr_opt)
我不知道该如何处理,谢谢.
I don't know how to deal with this,thanks advance.
我做一个最小的语法:
EDIT:I make a minimal grammar:
%{
#include <stdio.h>
extern "C" int yylex();
void yyerror(const char *s) { printf("ERROR: %s/n", s); }
%}
%token PLUS '+' MINUS '-'
%token D H I J K X Y Z INT
/*%type sign expr var expr_address_category expr_opt
%type axis */
%start word_list
%%
/*Above grammar lost this rule,it makes ambiguous*/
word_list
: word
| word_list word
;
sign
: PLUS
| MINUS
;
expr
: var
| sign expr
| '[' expr ']'
;
var
: D INT
| H INT
;
word
: expr_address
| var '=' expr
;
expr_address
: expr_address_category expr_opt
/*| '(' axis sign ')'*/
| axis sign
;
expr_opt
: /* empty */
| expr
;
expr_address_category
: I
| J
| K
| axis
;
axis
: X
| Y
| Z
;
%%
我希望它能支持:
X;
X0;
X+0; //the top three are same with X0
X+;
X+H100; //this means X's data is ref +H100;
X+H100=10; //two word on a block,X+ and H100=10;
XH100=10; //two word on a block,X and H100=10;
上面的EDIT丢失了此规则.
The above EDIT lost this rule.
block
: word_list ';'
| ';'
;
因为我必须允许这样的语法:
Because I have to allow such grammar:
H000 = 100 H001 = 200 H002 = 300;
推荐答案
这本质上是经典的LR(2)语法,但在您的情况下是LR(3),因为您的变量包含两个标记[注1] :
This is essentially the classic LR(2) grammar, except that in your case it is LR(3) because your variables consist of two tokens [Note 1]:
var : D INT | H INT
基本问题是没有分隔符的单词的串联:
The basic problem is the concatenation of words without separators:
word_list : word | word_list word
结合了word
的一个选项以可选的var
结尾的事实:
combined with the fact that one of the options for word
ends with an optional var
:
word: expr_address
expr_address: expr_address_category expr_opt
,而另一个以var
开头:
word: var '=' expr
使这一点变得明确,因为expr
中的任何内容都不能包含该符号.但是在需要做出决定的时候,不可见,因为前瞻是var
的第一个标记-H
或D
- -并且等号仍在两个标记之外.
The makes this unambiguous, since nothing in an expr
can contain that symbol. But at the point where a decision needs to be made, the is not visible, because the lookahead is the first token of a var
-- either an H
or a D
-- and the equals sign is still two tokens away.
此LR(2)语法与yacc/bison本身使用的语法非常相似,这一事实我总是觉得很具有讽刺意味,因为yacc的语法不需要在生产之间使用 :
This LR(2) grammar is very similar to the grammar used by yacc/bison itself, a fact which I always find to be ironic, because the grammar for yacc does not require between productions:
production: SYMBOL ':' | production SYMBOL /* Lots of detail omitted */
与您的语法一样,由于无法清晰显示,因此无法知道SYMBOL
是应移位还是触发归约.
As with your grammar, this makes it impossible to know whether a SYMBOL
should be shifted or trigger a reduce because the disambiguating is still not visible.
由于(我认为)语法是明确的,并且bison现在可以生成GLR解析器,所以这将是最简单的解决方案:只需添加
Since the grammar is (I assume) unambiguous, and bison can now generate GLR parsers, that will be the simplest solution: just add
%glr-parser
到您的野牛序幕(但请阅读《野牛手册》中有关GLR解析器的部分,以了解权衡取舍).
to your bison prologue (but read the section of the bison manual on GLR parsers to understand the trade-off).
请注意,shift-reduce冲突仍将被报告为警告;由于不可能可靠地确定语法是否模棱两可,因此bison不会尝试这样做,如果存在歧义,则会在运行时报告歧义.
Note that the shift-reduce conflicts will still be reported as warnings; since it is impossible to reliably decide whether a grammar is ambiguous, bison doesn't attempt to do so and ambiguities will be reported at run-time if they exist.
您还应该解决 @ChrisDodd的答案中提到的与expr_address
重构有关的问题(尽管使用了GLR解析器,这不是绝对必要的.)
You should also fix the issue mentioned in @ChrisDodd's answer regarding the refactoring of expr_address
(although with a GLR parser it is not strictly necessary).
如果出于某种原因,您认为GLR解析器无法满足您的需求,则可以在yacc的大多数实现中使用该解决方案(包括bison),这是词法扫描程序中的一个hack.基本思想是在词法分析器中标记符号后是否带有冒号,以便可以将上面的结果重写为:
If, for whatever reason, you feel that a GLR parser will not meet your needs, you could use the solution in most implementations of yacc (including bison), which is a hack in the lexical scanner. The basic idea is to mark whether a symbol is followed by a colon or not in the lexer, so that the above production could be rewritten as:
production: SYMBOL_COLON | production SYMBOL
如果您愿意将字母和数字组合成一个令牌,则此解决方案将为您服务:
This solution would work for you if you were willing to combine the letter and the number into a single token:
word: expr_address expr_opt
| VARIABLE_EQUALS expr
// ...
expr: VARIABLE
我的喜好是在词法分析器的包装器中进行此转换,该转换器保留待处理令牌的(一个元素)队列:
My preference is to do this transformation in a wrapper around the lexer, which keeps a (one-element) queue of pending tokens:
/* The use of static variables makes this yylex wrapper unreliable
* if it is reused after a syntax error.
*/
int yylex_wrapper() {
static int saved_token = -1;
static YYSTYPE saved_yylval = {0};
int token = saved_token;
saved_token = -1;
yylval = saved_yylval;
// Read a new token only if we don't have one in the queue.
if (token < 0) token = yylex();
// If the current token is IDENTIFIER, check the next token
if (token == IDENTIFIER) {
// Read the next token into the queue (saved_token / saved_yylval)
YYSTYPE temp_val = yylval;
saved_token = yylex();
saved_yylval = yylval;
yylval = temp_val;
// If the second token is '=', then modify the current token
// and delete the '=' from the queue
if (saved_token == '=') {
saved_token = -1;
token = IDENTIFIER_EQUALS;
}
}
return token;
}
注释
-
我个人首先将一个
var
做成一个令牌(您是否真的想让人们写:
Personally, I would start by making a
var
a single token (do you really want to allow people to write:
H /* Some comment in the middle of the variable name */ 100
但这不会解决任何问题;它只是将语法的超前要求从LR(3)减少到LR(2).
but that's not going to solve any problems; it merely reduces the grammar's lookahead requirement from LR(3) to LR(2).
这篇关于弹性与野牛转移/减少冲突的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!