algorithm - 学习野牛: What are context-free grammars and LALR(1)?

我正在阅读bison简介。

我有两个问题，如果有人可以帮助我理解，那就太好了:

术语context free grammar是什么意思？

从上面的链接:并非所有上下文无关的语言都可以由Bison处理，只有LALR(1)可以处理。简而言之，这意味着必须有可能仅使用一个预读标记就可以知道如何解析输入字符串的任何部分。

是什么意思？

最佳答案

上下文无关文法是对一组字符串的描述，该字符串严格比正则表达式更强大，但仍很容易由机器处理。更正式地说，无上下文语法由四部分组成:

一组终端符号，它们是要生成的字符串的元素。对于野牛解析器，这通常是扫描仪生成的 token 集。对于自然语言(如英语)的语法，这可能是所有英语单词的集合。

一组非终结符。从直觉上讲，非终结符表示类似词性的内容，例如“名词”或“动词”。

一组产品。 每个产品都说明如何用其他一组终端和非终端替换非终端符号。例如，生产Variable -> Type Name表示如果看到非终结符Variable，则可以将其替换为字符串Type Name。

一个起始符号，它是派生自其开始的非终结符。

例如，请考虑针对C样式函数声明的以下简单的无上下文语法:

Function -> Type ident(Arguments)
Type -> int
Type -> Type *
Arguments -> e   (the empty string)
Arguments -> ArgList
ArgList -> Type ident
ArgList -> Type ident, ArgList

在这里，开始符号是Function。给定这种语法，我们可以通过反复选择一个非终结符并将其替换为匹配产生的右侧之一来产生C样式的函数声明。到目前为止，在每个步骤中，我们构建的字符串都称为句子形式。例如，以下是可以从上述语法派生的一些不同的函数声明:

Sentential Form                       Production
-------------------------------------------------------------------
Function                              Function -> Type ident(Arguments)
Type ident(Arguments)                 Type -> int
int ident(Arguments)                  Arguments -> e
int ident()

Sentential Form                       Production
-------------------------------------------------------------------
Function                              Function -> Type ident(Arguments)
Type ident(Arguments)                 Type -> Type*
Type* ident(Arguments)                Type -> int
int* ident(Arguments)                 Arguments -> ArgList
int* ident(ArgList)                   ArgList -> Type ident, ArgList
int* ident(Type ident, ArgList)       ArgList -> Type ident
int* ident(Type ident, Type ident)    Type -> Type*
int* ident(Type* ident, Type ident)   Type -> Type*
int* ident(Type** ident, Type ident)  Type -> int
int* ident(int** ident, Type ident)   Type -> int
int* ident(int** ident, int ident)

大多数编程语言都具有可以由上下文无关的语法描述的结构。解析器的工作是获取程序和语法，并确定语法如何生成该程序。

至于LALR(1)，不幸的是，LALR(1)的正式定义并不简单。我刚刚完成了编译器构建类(class)的教学，在首先花了两堂讨论相关解析技术的讲座之后，我们才只能谈论LALR(1)。如果您想对 Material 进行正式介绍，可以从 at the course website 获得我的自下而上解析的幻灯片。

LALR(1)是一种称为自底向上解析算法的解析算法，这意味着它试图以相反的顺序应用语法的产生，以将程序还原为起始符号。例如，让我们考虑这个字符串，它是由上面的语法生成的:

int** ident(int* ident)

在自下而上的解析中，我们将通过一次在程序中查看一个 token 来解析此字符串。每当我们发现某些可以逆转为非终结点的东西时，我们都会这样做。 (更确切地说，LALR(1)仅在满足其他条件时也进行这些归约，以便算法具有更多上下文，但是对于本示例，我们不必为此担心)。解析中的每个步骤都可以说是转换或 reduce 。移位意味着我们再看一次输入的 token ，以获得有关要应用哪些减少量的更多信息。减少意味着我们采用一定数量的 token 和非终结符，并反转生产以返回到一些非终结符。

这是字符串自底向上解析的痕迹:

Workspace           Input                     Action
-----------------------------------------------------------------
             int** ident(int* ident)   Shift
int             ** ident(int* ident)   Reduce Type -> int
Type            ** ident(int* ident)   Shift
Type*            * ident(int* ident)   Reduce Type -> Type*
Type             * ident(int* ident)   Shift
Type*              ident(int* ident)   Reduce Type -> Type*
Type               ident(int* ident)   Shift
Type ident              (int* ident)   Shift
Type ident(              int* ident)   Shift
Type ident(int              * ident)   Reduce Type -> int
Type ident(Type             * ident)   Shift
Type ident(Type*              ident)   Reduce Type -> Type*
Type ident(Type               ident)   Shift
Type ident(Type ident              )   Reduce ArgList -> Type ident
Type ident(ArgList                 )   Reduce Arguments -> ArgList
Type ident(Arguments               )   Shift
Type ident(Arguments)                  Reduce Function -> Type ident(Arguments)
Function                               ACCEPT

了解移位和归约非常重要，因为在使用野牛时，您将始终遇到移位/减少冲突和减少/减少冲突。这些错误意味着解析器生成器确定解析器可能进入一种状态，在该状态下它无法判断是移位还是缩小，或者它应该执行两次缩小中的哪一个。有关如何处理此问题的更多详细信息，请查阅野牛手册。

如果您想进一步了解上下文无关的语法和解析算法，那么有一本非常出色的书，由Grune和Jacobs撰写，标题为 Parsing Techniques: A Practical Guide, Second Edition ，到目前为止，该书对我见过的 Material 进行了最好的处理。它涵盖了各种各样的解析算法，包括许多比LALR(1)更强大的技术，这些技术开始得到更广泛的使用(例如，GLR和Earley)。我强烈推荐这本书-这是我对语法的理解如此之好的主要原因!

希望这可以帮助!

关于algorithm - 学习野牛: What are context-free grammars and LALR(1)?，我们在Stack Overflow上找到一个类似的问题：https://stackoverflow.com/questions/7179256/