Improving the readability of a FParsec parser


Question

I have a hand-written CSS parser in C# that is getting unmanageable, and I am trying to rewrite it in FParsec to make it more maintainable. Here's a snippet that parses a CSS selector element using regexes:

var tagRegex = @"(?<Tag>(?:[a-zA-Z][_\-0-9a-zA-Z]*|\*))";
var idRegex = @"(?:#(?<Id>[a-zA-Z][_\-0-9a-zA-Z]*))";
var classesRegex = @"(?<Classes>(?:\.[a-zA-Z][_\-0-9a-zA-Z]*)+)";
var pseudoClassRegex = @"(?::(?<PseudoClass>link|visited|hover|active|before|after|first-line|first-letter))";
var selectorRegex = new Regex("(?:(?:" + tagRegex + "?" + idRegex + ")|" +
                                 "(?:" + tagRegex + "?" + classesRegex + ")|" +
                                  tagRegex + ")" +
                               pseudoClassRegex + "?");

var m = selectorRegex.Match(str);

if (m.Length != str.Length) {
    cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
    return null;
}

string tagName = m.Groups["Tag"].Value;

string pseudoClassString = m.Groups["PseudoClass"].Value;
CssPseudoClass pseudoClass;
if (pseudoClassString.IsEmpty()) {
    pseudoClass = CssPseudoClass.None;
} else {
    switch (pseudoClassString.ToLower()) {
        case "link":
            pseudoClass = CssPseudoClass.Link;
            break;
        case "visited":
            pseudoClass = CssPseudoClass.Visited;
            break;
        case "hover":
            pseudoClass = CssPseudoClass.Hover;
            break;
        case "active":
            pseudoClass = CssPseudoClass.Active;
            break;
        case "before":
            pseudoClass = CssPseudoClass.Before;
            break;
        case "after":
            pseudoClass = CssPseudoClass.After;
            break;
        case "first-line":
            pseudoClass = CssPseudoClass.FirstLine;
            break;
        case "first-letter":
            pseudoClass = CssPseudoClass.FirstLetter;
            break;
        default:
            cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
            return null;
    }
}

string cssClassesString = m.Groups["Classes"].Value;
string[] cssClasses = cssClassesString.IsEmpty() ? EmptyArray<string>.Instance : cssClassesString.Substring(1).Split('.');
allCssClasses.AddRange(cssClasses);

return new CssSelectorElement(
    tagName.ToLower(),
    cssClasses,
    m.Groups["Id"].Value,
    pseudoClass);

My first attempt yielded this:

type CssPseudoClass =
    | None = 0
    | Link = 1
    | Visited = 2
    | Hover = 3
    | Active = 4
    | Before = 5
    | After = 6
    | FirstLine = 7
    | FirstLetter = 8

type CssSelectorElement =
    { Tag : string
      Id : string
      Classes : string list
      PseudoClass : CssPseudoClass }
with
    static member Default =
        { Tag = "";
          Id = "";
          Classes = [];
          PseudoClass = CssPseudoClass.None; }

open FParsec

let ws = spaces
let str = skipString
let strWithResult str result = skipString str >>. preturn result

let identifier =
    let isIdentifierFirstChar c = isLetter c || c = '-'
    let isIdentifierChar c = isLetter c || isDigit c || c = '_' || c = '-'
    optional (str "-") >>. many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier"

let stringFromOptional strOption =
    match strOption with
    | Some(str) -> str
    | None -> ""

let pseudoClassFromOptional pseudoClassOption =
    match pseudoClassOption with
    | Some(pseudoClassOption) -> pseudoClassOption
    | None -> CssPseudoClass.None

let parseCssSelectorElement =
    let tag = identifier <?> "tagName"
    let id = str "#" >>. identifier <?> "#id"
    let classes = many1 (str "." >>. identifier) <?> ".className"
    let parseCssPseudoClass =
        choiceL [ strWithResult "link" CssPseudoClass.Link;
                  strWithResult "visited" CssPseudoClass.Visited;
                  strWithResult "hover" CssPseudoClass.Hover;
                  strWithResult "active" CssPseudoClass.Active;
                  strWithResult "before" CssPseudoClass.Before;
                  strWithResult "after" CssPseudoClass.After;
                  strWithResult "first-line" CssPseudoClass.FirstLine;
                  strWithResult "first-letter" CssPseudoClass.FirstLetter]
                 "pseudo-class"
    // (tag?id|tag?classes|tag)pseudoClass?
    pipe2 ((pipe2 (opt tag)
                  id
                  (fun tag id ->
                      { CssSelectorElement.Default with
                          Tag = stringFromOptional tag;
                          Id = id })) |> attempt
           <|>
           (pipe2 (opt tag)
                  classes
                  (fun tag classes ->
                      { CssSelectorElement.Default with
                          Tag = stringFromOptional tag;
                          Classes = classes })) |> attempt
           <|>
           (tag |>> (fun tag -> { CssSelectorElement.Default with Tag = tag })))
           (opt (str ":" >>. parseCssPseudoClass) |> attempt)
           (fun selectorElem pseudoClass -> { selectorElem with PseudoClass = pseudoClassFromOptional pseudoClass })

But I don't really like how it is shaping up. I was expecting to come up with something easier to understand, but the part that parses (tag?id|tag?classes|tag)pseudoClass? with several pipe2 and attempt calls is really bad.

Can someone with more experience in FParsec show me better ways to accomplish this? I'm thinking of trying FSLex/Yacc or Boost.Spirit instead of FParsec to see if I can come up with nicer code with them.

Recommended answer

As Mauricio said, if you find yourself repeating code in an FParsec parser, you can always factor out the common parts into a variable or custom combinator. This is one of the great advantages of combinator libraries.
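As a rough illustration of factoring out a custom combinator, the sketch below replaces the `opt` plus `stringFromOptional` pattern from the question with a single helper. The names `orDefault`, `tagOrEmpty`, and `parseTagAndId` are hypothetical and not from the original post, the identifier rule is simplified, and the `#r "nuget: FParsec"` line assumes the code runs as a script under `dotnet fsi`.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical helper: run p, or yield defaultValue when p fails
// without consuming input (the same fallback behaviour as opt).
let orDefault defaultValue (p: Parser<'a, unit>) : Parser<'a, unit> =
    p <|>% defaultValue

// Simplified identifier rule (an assumption): a letter, then letters,
// digits, '_' or '-'.
let identifier : Parser<string, unit> =
    let isRest c = isLetter c || isDigit c || c = '_' || c = '-'
    many1Satisfy2L isLetter isRest "identifier"

// An optional tag that collapses to "" instead of None.
let tagOrEmpty = identifier |> orDefault ""

// "#" followed by an identifier.
let idSel = skipString "#" >>. identifier

// tag? id, returned as a (tag, id) pair.
let tagAndId = tagOrEmpty .>>. idSel

// Small wrapper so the result is easy to inspect.
let parseTagAndId input =
    match run (tagAndId .>> eof) input with
    | Success (result, _, _) -> Some result
    | Failure (_, _, _) -> None
```

With this helper the record-building lambdas no longer need to unwrap an option, which removes the two `...FromOptional` functions from the question.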

However, in this case you could also simplify and optimize the parser by reorganizing the grammar a bit. You could, for example, replace the lower half of the parseCssSelectorElement parser with

let defSel = CssSelectorElement.Default

let pIdSelector = id |>> (fun str -> {defSel with Id = str})
let pClassesSelector = classes |>> (fun strs -> {defSel with Classes = strs})

let pSelectorMain =
     choice [pIdSelector
             pClassesSelector
             pipe2 tag (pIdSelector <|> pClassesSelector <|>% defSel)
                   (fun tagStr sel -> {sel with Tag = tagStr})]

pipe2 pSelectorMain (opt (str ":" >>. parseCssPseudoClass))
      (fun sel optPseudo ->
           match optPseudo with
           | None -> sel
           | Some pseudo -> {sel with PseudoClass = pseudo})
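The reorganized grammar needs no `attempt` because each alternative begins with a different first character (`#`, `.`, or a letter), so `choice` never has to backtrack over consumed input. The following is a minimal, self-contained sketch of that idea, meant to be run as a script under `dotnet fsi`. The `Selector` record, the simplified identifier rule, and `parseSelector` are stand-ins introduced here for illustration, and the pseudo-class part is left out.

```fsharp
#r "nuget: FParsec"
open FParsec

// Simplified stand-in for CssSelectorElement (no pseudo-class).
type Selector = { Tag: string; Id: string; Classes: string list }

let defSel = { Tag = ""; Id = ""; Classes = [] }

// Simplified identifier rule (an assumption).
let identifier : Parser<string, unit> =
    let isRest c = isLetter c || isDigit c || c = '_' || c = '-'
    many1Satisfy2L isLetter isRest "identifier"

let pId = skipString "#" >>. identifier
let pClasses = many1 (skipString "." >>. identifier)

let pIdSelector = pId |>> (fun s -> { defSel with Id = s })
let pClassesSelector = pClasses |>> (fun cs -> { defSel with Classes = cs })

// Every branch starts with a distinct character, so a failing branch
// has consumed nothing and the next one can be tried without attempt.
let pSelectorMain : Parser<Selector, unit> =
    let afterTag = pIdSelector <|> pClassesSelector <|>% defSel
    choice [ pIdSelector
             pClassesSelector
             pipe2 identifier afterTag (fun tag sel -> { sel with Tag = tag }) ]

// Hypothetical wrapper: require the whole input to be a selector.
let parseSelector input =
    match run (pSelectorMain .>> eof) input with
    | Success (sel, _, _) -> Some sel
    | Failure (_, _, _) -> None
```

Calling `parseSelector` on inputs such as `"div#main"`, `"p.a.b"`, `".note"`, or `"span"` exercises each branch of the `choice`.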

By the way, if you want to parse a large number of string constants, it is more efficient to use a dictionary-based parser, like

let pCssPseudoClass : Parser<CssPseudoClass,unit> =
    let pseudoDict = dict ["link", CssPseudoClass.Link
                           "visited", CssPseudoClass.Visited
                           "hover", CssPseudoClass.Hover
                           "active", CssPseudoClass.Active
                           "before", CssPseudoClass.Before
                           "after", CssPseudoClass.After
                           "first-line", CssPseudoClass.FirstLine
                           "first-letter", CssPseudoClass.FirstLetter]
    fun stream ->
        let reply = identifier stream
        if reply.Status <> Ok then Reply(reply.Status, reply.Error)
        else
            let mutable pseudo = CssPseudoClass.None
            if pseudoDict.TryGetValue(reply.Result, &pseudo) then Reply(pseudo)
            else // skip to beginning of invalid pseudo class
                stream.Skip(-reply.Result.Length)
                Reply(Error, messageError "unknown pseudo class")
