Problem description
I wrote a quick attoparsec parser to walk an aspx file and drop all the style attributes. It works fine except for one piece: I can't figure out how to make it succeed on matching > without consuming it.
Here's what I have:
anyTill = manyTill anyChar
anyBetween start end = start *> anyTill end
styleWithQuotes = anyBetween (stringCI "style=\"") (stringCI "\"")
styleWithoutQuotes = anyBetween (stringCI "style=") (stringCI " " <|> ">")
everythingButStyles = manyTill anyChar (styleWithQuotes <|> styleWithoutQuotes) <|> many1 anyChar
I understand this is partly because of how I'm using manyTill in everythingButStyles; that is how I actively drop all the style content. In styleWithoutQuotes, however, I need it to match ">" as an end without consuming it. In parsec I would have just used lookAhead ">", but I can't do that in attoparsec.
Meanwhile, the lookAhead combinator was added to attoparsec, so now one can just use lookAhead (char '>') or lookAhead (string ">") to achieve the goal.
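As a rough sketch of how that could slot into the parser from the question: the version below wraps only the ">" alternative in lookAhead, so a trailing space is still consumed while ">" is left in the input. The helper afterStyle is a hypothetical name added here purely to show what remains after the match, and the code assumes an attoparsec version recent enough to export lookAhead from Data.Attoparsec.Combinator.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
  (Parser, anyChar, manyTill, parseOnly, stringCI, takeByteString)
import Data.Attoparsec.Combinator (lookAhead)
import Data.ByteString (ByteString)

-- Run `start`, then collect characters until `end` succeeds.
-- Whatever `end` consumes is dropped from the input as well.
anyBetween :: Parser a -> Parser b -> Parser String
anyBetween start end = start *> manyTill anyChar end

-- An unquoted style attribute ends at a space (consumed)
-- or at '>' (only peeked at, thanks to lookAhead).
styleWithoutQuotes :: Parser String
styleWithoutQuotes =
  anyBetween (stringCI "style=") (stringCI " " <|> lookAhead (stringCI ">"))

-- Hypothetical helper for illustration: returns the attribute value
-- that was matched together with the input left over after it.
afterStyle :: ByteString -> Either String (String, ByteString)
afterStyle = parseOnly ((,) <$> styleWithoutQuotes <*> takeByteString)
```

With this, afterStyle "style=color:red>rest" should leave ">rest" unconsumed, whereas an attribute terminated by a space loses that space.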
Below is a workaround from the times before its introduction.
You can build your non-consuming parser using peekWord8, which just looks at the next byte (if any). Since ByteString has a Monoid instance, Parser ByteString is a MonadPlus, and you can use
lookGreater = do
    mbw <- peekWord8
    case mbw of
      Just 62 -> return ">"
      _ -> mzero
(62 is the code point of '>') to either find a '>' without consuming it or fail.
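To make the behaviour concrete, here is a small self-contained sketch of the same workaround with type signatures and imports filled in. The helper peekThenRest is a hypothetical name introduced only for this illustration; it runs lookGreater and then returns all remaining input, so you can see that the '>' is still there.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (mzero)
import Data.Attoparsec.ByteString
  (Parser, parseOnly, peekWord8, takeByteString)
import Data.ByteString (ByteString)

-- Succeed with ">" if the next byte is '>' (byte value 62),
-- without consuming it; fail on any other byte or at end of input.
lookGreater :: Parser ByteString
lookGreater = do
  mbw <- peekWord8
  case mbw of
    Just 62 -> return ">"
    _       -> mzero

-- Hypothetical helper for illustration: run lookGreater, then
-- return everything that is still left in the input.
peekThenRest :: ByteString -> Either String ByteString
peekThenRest = parseOnly (lookGreater *> takeByteString)
```

In the question's parser, lookGreater could then take the place of the plain ">" alternative in styleWithoutQuotes, for example stringCI " " <|> lookGreater.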