问题描述
我有一个表示表达式的语法.为了简单起见,它是:
S -> E
E -> T + E | T
T -> P * T | P
P -> a | (E)
a
,+
,*
,(
和)
是我字母表中的字母.
上述规则可以使用正确的运算顺序和关联性来生成包含括号,乘法和加法的有效算术表达式.
我的目标是接受每个字符串,其中包含0个或多个我的字母.这是我的限制条件:
- 语法必须接受"所有包含0个或多个我的字母的字符串.
- 可以通过算法将新的终端引入并插入到输入中. (由于某种原因,我发现明确提供
EOF
终端有助于检测超出有效表达式的其他令牌). - 可能会引入新的生产规则.新规则将是错误标志(即,如果使用1解析字符串,则尽管该字符串被接受,但从语义上来说仍被视为错误).
- 可以修改生产规则,以便插入新引入的终端.
- 语法至少应为Lex/Yacc类工具可处理的LALR(1)(如果我记得悬而未决的其他问题,尽管与Lex/Yacc兼容,但仍要求非LALR(1)).
此外,我希望额外的规则对应于不同类型的错误(缺少二进制运算的参数,缺少括号-左或右-超出本来可以接受的表达式的额外标记等).我之所以这么说,是因为可能存在一些琐碎的方法来回答我的问题,以接受"所有输入,否则将对错误报告无益.
我发现这些规则很有用,尽管我不知道它们是否违反了我的约束条件:
P -> @ [error]
P -> (E [error]
S -> E $ [instead of S -> E]
S -> E X $ [error]
X -> X Y [error]
X -> Y [error]
Y -> a [error]
Y -> ( [error]
Y -> ) [error]
Y -> * [error]
Y -> + [error]
其中,$
是显式的EOF
令牌,而@
是空字符串.
如果我的问题不清楚,我该如何在约束范围内修改语法,以实现接受所有输入的目标,最好是规则与错误类型的对应关系更好?我的规则符合我的目标吗?
这个问题已经存在了一段时间,涵盖了该主题的初学者经常访问的一个主题.人们常常发现,那些已经完成了本科学历的编译器课程的人知道,这是那些没有简单答案或单一答案的问题之一.您可能已经注意到,您有关于同一主题的两个问题,但都没有已经回答了. 别人发布的另一个问题得到了回答指向解释为什么如此困难的文献的指针.
这个问题已经存在了50多年了.如果人们随着时间的流逝,从早期的会议论文,课程教科书,博士学位论文和(今天)SO
中审查文献,我们可以看到经常引用这是一个错误问题的事实! (或者说是解决问题的方法错误).
只需从多年来的课程文本中取样(从我的书架中随机选择):
Gries,D.(1970)错误恢复与纠正-文献介绍,由Bauer,F.L.编辑的《 编译器构造,高级课程》 . & Eickel,J.,Springer Verlag,第627-638页.
Gries,D.(1971)数字计算机的编译器构造,Wiley,第320-326页.
Aho,A.V.,Ullman,J.D.(1977)编译器设计原理,Addison Wesley,第397-405页.
Bornat,R.(1979)理解和编写编译器,Macmillan,第251-252页.
Hanson,D.(1995)可重定位的 C 编译器:设计与实现,Addison-Wesley,第140-146页.Grune,D.,Bal,H.E.,Jacobs,C.J.H. &兰根多恩(K.G.) (2000) Modern Compiler Design ,Wiley,第175-184页.
Aho,A.V.,Lam,M.S.,Sethi,R.,Ullman,J.D.(2007) Compilers:Principles,Techniques,&工具,皮尔逊(Pearson),艾迪生·韦斯利(Addison-Wesley),第283-296页.
所有这些(超过40年)都同意您的问题是关于错误地使用错误的工具或方向错误的.我想我想说的是你不能从这里到那里".您应该从其他地方开始.
如果您想要更深入的研究,请阅读完整的博士学位论文:
查尔斯(1991)一种构造有效LALR的实用方法)纽约大学具有自动错误恢复功能的解析器
希望对以后再次访问此问题的人来说,有一个占位符来表示答案.
我从对另一个问题的评论中注意到,您正在使用从CPPG派生的MPPG.并非所有人都会使用这些工具,因此我提供了这些工具的几个链接:
托管的Babel系统必备内容
花园点解析器生成器
Irony .NET编译器构建套件
编写第一个Visual Studio语言服务
I have a grammar that represents expressions. Let's say for simplicity it's:
S -> E
E -> T + E | T
T -> P * T | P
P -> a | (E)
With a
, +
, *
, (
and )
being the letters in my alphabet.
The above rules can generate valid arithmetic expressions containing parenthesis, multiplication and addition using proper order of operations and associativity.
My goal is to accept every string, containing 0 or more of the letters of my alphabet. Here are my constraints:
- The grammar must "accept" all strings contained 0 or more letters of my alphabet.
- New terminals may be introduced and inserted into the input algorithmically. (I found that explicitly providing an
EOF
terminal helped when detecting extra tokens beyond an otherwise valid expression, for some reason.) - New production rules may be introduced. The new rules will be error flags (i.e. if the string is parsed using one, then although the string is accepted, it is considered to semantically be an error).
- The production rules may be modified so that newly-introduced terminals are inserted.
- The grammar should be LALR(1) at least handle-able by Lex/Yacc-like tools (If I recall the dangling else problem requires non-LALR(1), in spite of being compatible with Lex/Yacc).
Additionally I would like the extra rules to correspond to different kinds of errors (missing arguments to a binary operation, missing parenthesis - left or right - extra token(s) beyond an otherwise accept-able expression, etc.). I say that because there may be some trivial way to answer my question to "accept" all inputs that otherwise wouldn't be beneficial for error reporting.
I've found these rules to be useful, although I do not know if they violate my constraints, or not:
P -> @ [error]
P -> (E [error]
S -> E $ [instead of S -> E]
S -> E X $ [error]
X -> X Y [error]
X -> Y [error]
Y -> a [error]
Y -> ( [error]
Y -> ) [error]
Y -> * [error]
Y -> + [error]
where $
is the explicit EOF
token and @
is the empty string.
In case my question wasn't clear: How can I modify my grammar within my constraints to achieve my goal of accepting all inputs, preferably with a nice correspondence of rules to types of errors? Do my rules meet my goal?
This question has been around for a while and covers a topic that is is often visited by beginners in the subject. One often find that those who have done a compilers course in their undergraduate degree know that this is one of those questions that has no easy or single answer. You might have noticed that you have two questions on the same topic, neither of which has been answered. Another question someone else posted was answered with pointers to the literature that explains why this is hard.
It is a question that has remained extant for over 50 years. If one examines the literature over time, from early conference papers, course textbooks, doctoral thesis and (today) SO
, we can see regular references to the fact that this is the wrong question! (Or rather the wrong approach to the problem).
Just taking a sample from course texts over the years (random selections from my bookshelf):
Gries, D. (1970) Error Recovery and Correction - An introduction to the Literature, In Compiler Construction, An advanced Course edited by Bauer, F.L. & Eickel, J., Springer Verlag, pp.627-638.
Gries, D. (1971) Compiler Construction for Digital Computers, Wiley, pp.320-326.
Aho, A.V., Ullman, J.D. (1977) Principles of Compiler Design, Addison Wesley, pp.397-405.
Bornat, R. (1979) Understanding and Writing Compilers, Macmillan, pp.251-252.
Hanson, D. (1995) A retargetable C compiler: Design and Implementation,Addison-Wesley, pp.140-146.
Grune, D., Bal, H.E., Jacobs, C.J.H. & Langendoen, K.G. (2000) Modern Compiler Design, Wiley, pp.175-184.
Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D. (2007) Compilers: Principles, Techniques, & Tools, Pearson, Addison-Wesley, pp.283-296.
All of these (over a 40 year span) concur that your question is about using the wrong tools wrongly or going in the wrong direction. I think I'm trying to say "You can't there from here". You should start from somewhere else.
If you want something deeper, there is a whole Phd thesis:
Hopefully, for those who visit this question again in the future, there is a placeholder for the answers.
I note from comments on your other question that you are using MPPG, derived from CPPG. Not everyone will have used these, so I'm including a couple of links to these tools:
Managed Babel Systems Essentials
Garden Points Parser Generator
Irony .NET compiler Construction Kit
Writing your first Visual Studio Language Service
这篇关于通过生产规则向LALR(1)语法添加错误检查以处理所有输入的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!