PEG for Python-style indentation

Question

How would you write a Parsing Expression Grammar in any of the following parser generators (PEG.js, Citrus, Treetop) that can handle Python/Haskell/CoffeeScript-style indentation?

Examples in a not-yet-existing programming language:

square x =
    x * x


cube x =
    x * square x


fib n =
  if n <= 1
    0
  else
    fib(n - 2) + fib(n - 1) # some cheating allowed here with brackets

Update: Don't try to write an interpreter for the examples above. I'm only interested in the indentation problem. Another example might be parsing the following:

foo
  bar = 1
  baz = 2
tap
  zap = 3

# should yield (ruby style hashmap):
# {:foo => { :bar => 1, :baz => 2}, :tap => { :zap => 3 } }

Recommended answer

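Pure PEG cannot parse indentation, because it has no way to remember the current indentation depth from one line to the next.

But peg.js can, since its grammars may embed JavaScript code that keeps such state.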

I did a quick-and-dirty experiment (inspired by Ira Baxter's comment about cheating) and wrote a simple tokenizer.

For a more complete solution (a complete parser), see this question: Parse indentation level with PEG.js

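/* Initializations */
{
  // Flatten the parsed lines into one token list:
  // each line contributes its indentation tokens, then its text.
  function start(first, tail) {
    var done = [first[1]];
    for (var i = 0; i < tail.length; i++) {
      done = done.concat(tail[i][1][0]);
      done.push(tail[i][1][1]);
    }
    return done;
  }

  // Stack of open indentation widths; depths[0] is the innermost level.
  var depths = [0];

  // Compare a line's leading spaces with the stack and return the
  // indentation tokens needed to reach that depth.
  function indent(s) {
    var depth = s.length;

    if (depth == depths[0]) return [];

    if (depth > depths[0]) {
      depths.unshift(depth);
      return ["INDENT"];
    }

    // Shallower line: close every level deeper than it.
    var dents = [];
    while (depth < depths[0]) {
      depths.shift();
      dents.push("DEDENT");
    }

    // The line landed between two known levels.
    if (depth != depths[0]) dents.push("BADDENT");

    return dents;
  }
}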

/* The real grammar */
start   = first:line tail:(newline line)* newline? { return start(first, tail) }
line    = depth:indent s:text                      { return [depth, s] }
indent  = s:" "*                                   { return indent(s) }
text    = c:[^\n]*                                 { return c.join("") }
newline = "\n"                                     {}

depths is a stack of indentation widths. indent() returns an array of indentation tokens, and start() unwraps that array to make the parser behave somewhat like a stream.
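To see the depth stack in isolation, here is a minimal sketch of the same idea in plain JavaScript, without peg.js. The function name tokenizeIndentation is hypothetical and not part of the answer above; the sketch assumes indentation consists of spaces only, and unlike the grammar it also emits indentation tokens for the first line.

```javascript
// Hypothetical helper (not from the original answer): the same depth-stack
// logic as indent(), applied line by line to a plain string.
function tokenizeIndentation(source) {
  const depths = [0];   // stack of open indentation widths, innermost first
  const tokens = [];

  for (const line of source.split("\n")) {
    // Count leading spaces only, as the grammar's indent rule does.
    const depth = /^ */.exec(line)[0].length;

    if (depth > depths[0]) {
      // Deeper than the current level: open a new one.
      depths.unshift(depth);
      tokens.push("INDENT");
    } else {
      // Shallower or equal: close every level deeper than this line.
      while (depth < depths[0]) {
        depths.shift();
        tokens.push("DEDENT");
      }
      // The line landed between two known levels.
      if (depth !== depths[0]) tokens.push("BADDENT");
    }

    tokens.push(line.slice(depth));
  }

  return tokens;
}

console.log(tokenizeIndentation("foo\n  bar = 1\ntap"));
// prints [ 'foo', 'INDENT', 'bar = 1', 'DEDENT', 'tap' ]
```

Because the stack bottoms out at 0 and a line can never have fewer than zero leading spaces, the dedent loop always terminates.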

For the following text:

alpha
  beta
  gamma
    delta
epsilon
    zeta
  eta
theta
  iota

peg.js produces these results:

[
   "alpha",
   "INDENT",
   "beta",
   "gamma",
   "INDENT",
   "delta",
   "DEDENT",
   "DEDENT",
   "epsilon",
   "INDENT",
   "zeta",
   "DEDENT",
   "BADDENT",
   "eta",
   "theta",
   "INDENT",
   "iota",
   "DEDENT",
   "",
   ""
]

This tokenizer even catches bad indents: the BADDENT token in the output marks eta, whose indentation matches no enclosing level.
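As a rough illustration of how such a token stream could be consumed, the sketch below folds it into a nested tree, which is close to the nested structure asked for in the question. The function name buildTree and the { text, children } node shape are assumptions made for this example, not part of the answer. It also assumes no line of text is literally "INDENT", "DEDENT" or "BADDENT", since the flat token list cannot tell those apart from the markers.

```javascript
// Hypothetical consumer (not from the original answer): turns the flat
// token list into nested { text, children } nodes.
function buildTree(tokens) {
  const root = { text: null, children: [] };
  const stack = [root];   // the last entry is the node receiving new lines
  let last = root;        // the most recently added line node

  for (const token of tokens) {
    if (token === "INDENT") {
      // The lines that follow belong to the previous line.
      stack.push(last);
    } else if (token === "DEDENT") {
      if (stack.length === 1) throw new Error("unbalanced DEDENT");
      stack.pop();
    } else if (token === "BADDENT") {
      throw new Error("inconsistent indentation");
    } else {
      last = { text: token, children: [] };
      stack[stack.length - 1].children.push(last);
    }
  }

  return root.children;
}

const tree = buildTree([
  "foo", "INDENT", "bar = 1", "baz = 2", "DEDENT",
  "tap", "INDENT", "zap = 3", "DEDENT"
]);
console.log(JSON.stringify(tree, null, 2));
```

Throwing on BADDENT is one possible policy; a more forgiving consumer could instead attach the offending line to the nearest enclosing level.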
