问题描述
我正在使用Megaparsec开发一个小型解析器,并尝试解析算术.
I'm working on a small parser using Megaparsec and trying to parse arithmetic.
-- Arithmetic expressions
data Aexp = N Num
| V Var
| Mult Aexp Aexp
| Add Aexp Aexp
| Sub Aexp Aexp
deriving (Show, Eq, Read)
arithParser :: Parser Aexp
arithParser = V <$> strParser
<|> N <$> numParser
<|> Mult <$> arithParser <* tok "*" <*> arithParser
--boolParser :: Parser Bexp
strParser :: Parser Var
strParser = tok "\"" *> some (noneOf ("\n\r\"=[]{},:")) <* tok "\""
numParser :: Parser Num
numParser = (some (oneOf ['0' .. '9']) >>= return . read) <* whitespace
如果我运行命令Parse arithParser "5*5" "5*5"
,它将仅返回Right (N 5)
,而应返回Mult(N 5) (N 5)
.由于arithParser中的优先级.但是,如果我更改顺序,则似乎会陷入无限循环并崩溃.
If I run the command Parse arithParser "5*5" "5*5"
it just returns Right (N 5)
, where it should return Mult(N 5) (N 5)
. Because of the precedence in the arithParser. But if I change the order then it seems to go into an infinite loop and crash.
不确定我在这里做错了什么,我们将不胜感激.
Not sure what I'm doing wrong here, any help would be appreciated.
推荐答案
Parsec尝试使用<|>
的左替代项,然后再尝试使用右替代项.如果左选择成功,那么它将不会理会右选择.因此,在这种情况下,当输入5*5
作为输入时,Parsec的过程如下所示:
Parsec tries the left alternative of <|>
before it tries the right one. If the left alternative succeeds then it won't bother with the right one. So in this instance, when fed the input 5*5
, Parsec's process looks like this:
- 尝试
V <$> strParser
.strParser
以tok "\""
开头,但是输入字符串不是以'"'
开头,因此strParser
失败. - 尝试
N <$> numParser
.numParser
成功解析了数字5,因此Parsec仅返回N 5
. - 完成!无需尝试第三种选择.
- Try
V <$> strParser
.strParser
begins withtok "\""
, but the input string doesn't begin with'"'
sostrParser
fails. - Try
N <$> numParser
.numParser
successfully parses the number 5, so Parsec just returnsN 5
. - Done! No need to try the third alternative.
因此,我们可以尝试通过将Mult
选项上移到顶部并包裹在try
中来修补此解析器,以便它可以回溯并尝试numParser
或strParser
(如果输入结果不是)成为乘法.
So we can attempt to patch this parser up by moving the Mult
option up to the top, wrapped in a try
so that it can backtrack and try numParser
or strParser
if the input turns out not to be a multiplication.
arithParser :: Parser Aexp
arithParser = try (Mult <$> arithParser <* tok "*" <*> arithParser)
<|> N <$> numParser
<|> V <$> strParser
此解析器还有另一个更微妙的问题.让我们按照上述步骤进行操作.
This parser has another, more subtle problem. Let's walk through the steps, as above.
- 尝试
try (Mult <$> arithParser <* tok "*" <*> arithParser)
.此解析器以arithParser
开头,因此递归调用arithParser
. - 尝试
try (Mult <$> arithParser <* tok "*" <*> arithParser)
.此解析器以arithParser
开头,因此递归调用arithParser
. - 尝试
try (Mult <$> arithParser <* tok "*" <*> arithParser)
.此解析器以arithParser
开头,因此递归调用arithParser
. - ...
- Try
try (Mult <$> arithParser <* tok "*" <*> arithParser)
. This parser begins witharithParser
, so recursively callarithParser
. - Try
try (Mult <$> arithParser <* tok "*" <*> arithParser)
. This parser begins witharithParser
, so recursively callarithParser
. - Try
try (Mult <$> arithParser <* tok "*" <*> arithParser)
. This parser begins witharithParser
, so recursively callarithParser
. - ...
这是一个无限循环. Parsec无法处理左递归语法.您必须设计解析器,以使其在递归调用之前至少消耗一个令牌.这样做的一种常见方法是修整"语法:
It's an infinite loop. Parsec can't handle left-recursive grammars. You have to design your parser so that it consumes at least one token before a recursive call. One common way of doing this is to "flatten out" your grammar:
expr, term :: Parser AExp
expr = do
n <- term
rest <- optional $ tok "*" *> expr
return $ maybe n (Mult n) rest
term = N <$> numParser
<|> V <$> strParser
<|> parenthesised expr
parenthesised = between (char '(') (char ')')
在这里,我将解析器拆分为一个解析器,该解析器解析任意expr
-一个term
(可选),后跟一个乘法符号和一个被乘数expr
-以及一个解析单个term
s-数字的解析器,字符串和带括号的表达式.现在可以递归调用expr
了-expr
中的一个仅在您解析了term
(总是消耗输入)之后才发生,而term
中的一个仅在您解析了开头之后才发生括号.
Here I've split up the parser into one which parses an arbitrary expr
- a term
optionally followed by a multiplication symbol and a multiplicand expr
- and one which parses single term
s - numbers, strings, and parenthesised expressions. The recursive calls to expr
are OK now - the one inside expr
happens only after you've parsed a term
(which always consumes input) and the one inside term
happens only after you've parsed an opening parenthesis.
请注意,expr
具有类似列表的结构:它解析单个内容,然后可能包含许多内容.通常,您应该考虑解析器消耗了输入令牌的线性输入流,因此列表形解析器往往比树形解析器更有效也就不足为奇了.
Note that expr
has a list-like structure: it parses a single thing possibly followed by many things. In general you should think of parsers consuming a linear input stream of input tokens, so it's not surprising that list-shaped parsers tend to be more effective than tree-shaped ones.
Text.Megaparsec.Expr
模块包含用于打包此模式并使用任意优先级和固定性规则解析表达式的函数.
The Text.Megaparsec.Expr
module contains functions which package up this pattern and parse expressions with arbitrary precedence and fixity rules.
expr = makeExprParser term [[InfixR $ tok "*" $> Mult]]
这篇关于Megaparsec:无法解析递归算术字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!