parsing - GOLD Parsing System - What can it be used for in programming?

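I have read the GOLD homepage (http://www.devincook.com/goldparser/) documentation, the FAQ, and Wikipedia to find out what practical applications GOLD might have. I was thinking along the lines of making a programming language (easily) available to my systems, such as ABAP on SAP or X++ on Axapta, but that doesn't look feasible to me, at least not easily, even with GOLD.

The final use of the parse result that GOLD produces escapes me: what do you do with the result of the parse?

EDIT: A practical example (description) would be great.

Best answer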

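Parsing really consists of two phases. The first is "lexing", which converts the raw string of characters into something the program can more readily work with (commonly called tokens).

As a simple example, the lexer would convert:

if (a + b > 2) then

into: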

IF_TOKEN LEFT_PAREN IDENTIFIER(a) PLUS_SIGN IDENTIFIER(b) GREATER_THAN NUMBER(2) RIGHT_PAREN THEN_TOKEN
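
To make the lexing step concrete, here is a minimal hand-written sketch in Java. The class name SketchLexer and its tokenize method are hypothetical and exist only for illustration; a tool such as GOLD would derive this step from a grammar definition rather than have you write it by hand. It only knows the handful of tokens used in the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer, for illustration only.
class SketchLexer {
    // Turns raw source text into a list of token names.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (Character.isLetter(c)) {
                // A run of letters/digits is either a keyword or an identifier.
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                if (word.equals("if")) tokens.add("IF_TOKEN");
                else if (word.equals("then")) tokens.add("THEN_TOKEN");
                else tokens.add("IDENTIFIER(" + word + ")");
            } else if (Character.isDigit(c)) {
                // A run of digits is a number.
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add("NUMBER(" + src.substring(start, i) + ")");
            } else {
                switch (c) {
                    case '(': tokens.add("LEFT_PAREN"); break;
                    case ')': tokens.add("RIGHT_PAREN"); break;
                    case '+': tokens.add("PLUS_SIGN"); break;
                    case '>': tokens.add("GREATER_THAN"); break;
                    default:
                        throw new IllegalArgumentException(
                            "Unexpected character '" + c + "' at position " + i);
                }
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", tokenize("if (a + b > 2) then")));
    }
}
```

Running main on the example input prints the same token sequence shown above.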

The parser takes that stream of tokens and attempts to make yet more sense out of it. In this case, it would try to match those tokens to an IF_STATEMENT. To the parser, the IF_STATEMENT may well look like this:

 IF ( BOOLEAN_EXPRESSION ) THEN

Where the result of the lexing phase is a token stream, the result of the parsing phase is a Parse Tree.

So, a parser could convert the above into:

    if_statement
        |
        v
    boolean_expression.operator = GREATER_THAN
       |          |
       |          v
       v       numeric_constant.string = "2"
    expression.operator = PLUS_SIGN
     |     |
     |     v
     v   identifier.string = "b"
   identifier.string = "a"

Here you see we have an IF_STATEMENT. An IF_STATEMENT has a single argument, which is a BOOLEAN_EXPRESSION. This was explained in some manner to the parser (with a tool like GOLD, through the grammar you write). When the parser is converting the token stream, it "knows" what an IF looks like, and knows what a BOOLEAN_EXPRESSION looks like, so it can make the proper assignments when it sees the code.

For example, if you have just:

if (a + b) then

The parser could know that it's not a boolean expression (because the + is arithmetic, not a boolean operator), and the parser could throw an error at this point.

Next, we see that a BOOLEAN_EXPRESSION has 3 components, the operator (GREATER_THAN), and two sides, the left side and the right side.

On the left side, it points to yet another expression, the "a + b", while on the right it points to a NUMERIC_CONSTANT, in this case the string "2". Again, the parser "knows" this is a NUMERIC_CONSTANT because we told it about strings of digits. If it weren't digits, it would be an IDENTIFIER (like "a" and "b" are).

Note that if we had something like:

if (a + b > "XYZ") then

That "parses" just fine (expression on the left, string constant on the right). We don't know from looking at this whether it is a valid expression or not. We don't know if "a" or "b" reference Strings or Numbers at this point. So, this is something the parser can't decide for us and can't flag as an error, as it simply doesn't know. That will happen when we evaluate (either execute or try to compile into code) the IF statement.

If we did:

if [a > b ) then

The parser can readily see that syntax error as a problem, and will throw an error. That string of tokens doesn't look like anything it knows about.
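
The three cases above can be sketched with a small hand-written recursive descent parser. This is only an illustration under stated assumptions: TreeNode, SketchParser, the STRING(...) and LEFT_BRACKET token names, and the grammar in the comments are all hypothetical, and GOLD itself generates table-driven (LALR) parsers from a grammar rather than code like this. The input is a line of token names such as a lexer would produce.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parse tree node: a kind, optional text, and up to two children.
class TreeNode {
    final String kind;  // e.g. "if_statement", "boolean_expression"
    final String text;  // operator name, or identifier/constant text
    final TreeNode left, right;

    TreeNode(String kind, String text, TreeNode left, TreeNode right) {
        this.kind = kind; this.text = text; this.left = left; this.right = right;
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder("(").append(kind);
        if (text != null) sb.append(' ').append(text);
        if (left != null) sb.append(' ').append(left);
        if (right != null) sb.append(' ').append(right);
        return sb.append(')').toString();
    }
}

// Hypothetical recursive descent parser over whitespace-separated token names.
class SketchParser {
    private final List<String> tokens;
    private int pos = 0;

    private SketchParser(List<String> tokens) { this.tokens = tokens; }

    static TreeNode parse(String tokenLine) {
        SketchParser p = new SketchParser(Arrays.asList(tokenLine.trim().split("\\s+")));
        TreeNode tree = p.ifStatement();
        if (p.pos != p.tokens.size()) throw new IllegalStateException("Unexpected token " + p.peek());
        return tree;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "END_OF_INPUT"; }

    private void expect(String token) {
        if (!peek().equals(token)) {
            throw new IllegalStateException("Expected " + token + " but found " + peek());
        }
        pos++;
    }

    // if_statement := IF_TOKEN LEFT_PAREN boolean_expression RIGHT_PAREN THEN_TOKEN
    private TreeNode ifStatement() {
        expect("IF_TOKEN");
        expect("LEFT_PAREN");
        TreeNode condition = booleanExpression();
        expect("RIGHT_PAREN");
        expect("THEN_TOKEN");
        return new TreeNode("if_statement", null, condition, null);
    }

    // boolean_expression := expression GREATER_THAN expression
    private TreeNode booleanExpression() {
        TreeNode left = expression();
        expect("GREATER_THAN");
        return new TreeNode("boolean_expression", "GREATER_THAN", left, expression());
    }

    // expression := operand (PLUS_SIGN operand)*
    private TreeNode expression() {
        TreeNode result = operand();
        while (peek().equals("PLUS_SIGN")) {
            pos++;
            result = new TreeNode("expression", "PLUS_SIGN", result, operand());
        }
        return result;
    }

    // operand := IDENTIFIER(name) | NUMBER(digits) | STRING(text)
    private TreeNode operand() {
        String t = peek();
        String[][] kinds = {{"IDENTIFIER(", "identifier"}, {"NUMBER(", "numeric_constant"},
                            {"STRING(", "string_constant"}};
        for (String[] k : kinds) {
            if (t.startsWith(k[0]) && t.endsWith(")")) {
                pos++;
                return new TreeNode(k[1], t.substring(k[0].length(), t.length() - 1), null, null);
            }
        }
        throw new IllegalStateException("Expected an operand but found " + t);
    }
}
```

With this sketch, the tokens for if (a + b > 2) then and for if (a + b > "XYZ") then both produce a tree, because the grammar says nothing about types. The tokens for if (a + b) then fail where GREATER_THAN is expected, and a LEFT_BRACKET token in place of LEFT_PAREN fails immediately after IF_TOKEN.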

So, the point being that when you get a complete parse tree, you have some assurance that at first cut the "code looks good". Now during execution, other errors may well come up.

To evaluate the parse tree, you just walk the tree. You'll have some code associated with the major nodes of the parse tree during the compile or evaluation part. Let's assume that we have an interpreter.

public void execute_if_statement(ParseTreeNode node) {
    // We already know we have an IF_STATEMENT node
    Value value = evaluate_expression(node.getBooleanExpression());
    if (value.getBooleanResult() == true) {
        // we do the "then" part of the code
    }
}

public Value evaluate_expression(ParseTreeNode node) {
    Value result = null;
    if (node.isConstant()) {
        result = evaluate_constant(node);
        return result;
    }
    if (node.isIdentifier()) {
        result = lookupIdentifier(node);
        return result;
    }
    Value leftSide = evaluate_expression(node.getLeftSide());
    Value rightSide = evaluate_expression(node.getRightSide());
    if (node.getOperator() == '+') {
        if (!leftSide.isNumber() || !rightSide.isNumber()) {
            throw new RuntimeError("Must have numbers for adding");
        }
        int l = leftSide.getIntValue();
        int r = rightSide.getIntValue();
        int sum = l + r;
        return new Value(sum);
    }
    if (node.getOperator() == '>') {
        if (leftSide.getType() != rightSide.getType()) {
            throw new RuntimeError("You can only compare values of the same type");
        }
        if (leftSide.isNumber()) {
            int l = leftSide.getIntValue();
            int r = rightSide.getIntValue();
            boolean greater = l > r;
            return new Value(greater);
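        } else {
            // do string compare instead (assumes Value has a getStringValue() accessor)
            boolean greater = leftSide.getStringValue().compareTo(rightSide.getStringValue()) > 0;
            return new Value(greater);
        }
    }
    // Every path must return or throw, otherwise Java rejects the method
    throw new RuntimeError("Unknown operator: " + node.getOperator());
}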

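So, you can see that we have a recursive evaluator here. You can see how we check the runtime types and perform the basic evaluations.

What happens is that execute_if_statement evaluates its main expression. Even though we wanted only a BOOLEAN_EXPRESSION in the parse, all expressions are mostly the same for our purposes. So, execute_if_statement calls evaluate_expression.

In our system, all expressions have an operator and a left and a right side. Each side of an expression is ALSO an expression, so you can see how we immediately try to evaluate those as well to get their real values. The one thing to note is that if the expression consists of a CONSTANT, we simply return the constant's value; if it's an identifier, we look it up as a variable (and that would be a good place to throw an "I can't find the variable 'a'" message); otherwise we're back to the left side/right side case.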
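
For readers who want something they can compile and run, here is a self-contained sketch of the same recursive evaluation. The names Expr and SketchEvaluator, the use of plain Integer/String/Boolean objects as values, and the variable map are all assumptions made for this illustration; they are not the ParseTreeNode and Value types from the code above, and not anything GOLD provides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical expression node: a leaf (constant or identifier) or an operator with two sides.
class Expr {
    final String kind;   // "constant", "identifier", or "operator"
    final Object value;  // constant value, identifier name, or operator symbol
    final Expr left, right;

    private Expr(String kind, Object value, Expr left, Expr right) {
        this.kind = kind; this.value = value; this.left = left; this.right = right;
    }
    static Expr constant(Object v) { return new Expr("constant", v, null, null); }
    static Expr identifier(String name) { return new Expr("identifier", name, null, null); }
    static Expr operator(String op, Expr l, Expr r) { return new Expr("operator", op, l, r); }
}

class SketchEvaluator {
    private final Map<String, Object> variables;

    SketchEvaluator(Map<String, Object> variables) { this.variables = variables; }

    Object evaluate(Expr node) {
        // Leaves first: constants return their value, identifiers are looked up.
        if (node.kind.equals("constant")) return node.value;
        if (node.kind.equals("identifier")) {
            String name = (String) node.value;
            if (!variables.containsKey(name)) {
                throw new IllegalStateException("I can't find the variable '" + name + "'");
            }
            return variables.get(name);
        }
        // Otherwise both sides are expressions too, so evaluate them recursively.
        Object l = evaluate(node.left);
        Object r = evaluate(node.right);
        if (node.value.equals("+")) {
            if (!(l instanceof Integer) || !(r instanceof Integer)) {
                throw new IllegalStateException("Must have numbers for adding");
            }
            int sum = (Integer) l + (Integer) r;
            return sum;
        }
        if (node.value.equals(">")) {
            if (l instanceof Integer && r instanceof Integer) {
                boolean greater = (Integer) l > (Integer) r;
                return greater;
            }
            if (l instanceof String && r instanceof String) {
                return ((String) l).compareTo((String) r) > 0;
            }
            throw new IllegalStateException("You can only compare values of the same type");
        }
        throw new IllegalStateException("Unknown operator: " + node.value);
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("a", 1);
        vars.put("b", 2);
        Expr sum = Expr.operator("+", Expr.identifier("a"), Expr.identifier("b"));
        // Evaluates a + b > 2 with a = 1 and b = 2.
        System.out.println(new SketchEvaluator(vars).evaluate(Expr.operator(">", sum, Expr.constant(2))));
    }
}
```

With a = 1 and b = 2, main prints true. Swapping the constant 2 for the string "XYZ" still builds a perfectly good tree, but evaluation throws the "same type" error, which is the runtime check the parser could not make.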

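I hope you can see how a simple evaluator can work once you have a parse tree from a parser. Note how, during evaluation, the major elements of the language are already in place; otherwise we would have had a syntax error and never reached this phase. We can simply expect to "know" that when we have, for example, a PLUS operator, we are going to have two expressions, the left and the right side. Or, when we execute an IF statement, that we already have a boolean expression to evaluate. The parse is what does that heavy lifting for us.

Getting started with a new language can be a challenge, but you'll find that once you get rolling, the rest becomes pretty straightforward, and it's almost "magic" that it all works in the end.

Note: pardon the formatting; underscores were messing things up. I hope it's still clear.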