Question
How would you write a Parsing Expression Grammar in any of the following parser generators (PEG.js, Citrus, Treetop) that can handle Python/Haskell/CoffeeScript-style indentation?
Examples of a not-yet-existing programming language:
square x =
    x * x

cube x =
    x * square x

fib n =
  if n <= 1
    0
  else
    fib(n - 2) + fib(n - 1) # some cheating allowed here with brackets
Update: Don't try to write an interpreter for the examples above. I'm only interested in the indentation problem. Another example might be parsing the following:
foo
  bar = 1
  baz = 2
tap
  zap = 3

# should yield (ruby style hashmap):
# {:foo => { :bar => 1, :baz => 2}, :tap => { :zap => 3 } }
Recommended answer
Pure PEG cannot parse indentation.
But peg.js can.
I did a quick-and-dirty experiment (being inspired by Ira Baxter's comment about cheating) and wrote a simple tokenizer.
For a more complete solution (a complete parser) please see this question: Parse indentation level with PEG.js
/* Initializations */
{
  // Flatten the [indentTokens, text] pairs into one token stream.
  function start(first, tail) {
    var done = [first[1]];
    for (var i = 0; i < tail.length; i++) {
      done = done.concat(tail[i][1][0]);
      done.push(tail[i][1][1]);
    }
    return done;
  }

  // Stack of open indentation widths; depths[0] is the current one.
  var depths = [0];

  // Compare a line's leading spaces with the stack and emit tokens.
  function indent(s) {
    var depth = s.length;

    if (depth == depths[0]) return [];

    if (depth > depths[0]) {
      depths.unshift(depth);
      return ["INDENT"];
    }

    var dents = [];
    while (depth < depths[0]) {
      depths.shift();
      dents.push("DEDENT");
    }

    // Dedented to a width that was never opened.
    if (depth != depths[0]) dents.push("BADDENT");

    return dents;
  }
}

/* The real grammar */
start   = first:line tail:(newline line)* newline? { return start(first, tail) }
line    = depth:indent s:text                      { return [depth, s] }
indent  = s:" "*                                   { return indent(s) }
text    = c:[^\n]*                                 { return c.join("") }
newline = "\n"                                     {}
depths is a stack of indentations. indent() gives back an array of indentation tokens and start() unwraps the array to make the parser behave somewhat like a stream.
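To see the stack logic in isolation, here is a minimal plain-JavaScript sketch of the same idea that does not depend on peg.js. The function name tokenizeIndentation is hypothetical, and the sketch assumes indentation is made of spaces only and that lines are separated by "\n".

```javascript
// Sketch only: the depths-stack idea without a parser generator.
// tokenizeIndentation is a hypothetical name, not part of peg.js.
function tokenizeIndentation(source) {
  const depths = [0];   // stack of open indentation widths, innermost first
  const tokens = [];

  for (const line of source.split("\n")) {
    // Assumption: only leading spaces count as indentation (no tabs).
    const depth = line.match(/^ */)[0].length;

    if (depth > depths[0]) {
      depths.unshift(depth);
      tokens.push("INDENT");
    } else {
      // Close every level that is deeper than this line.
      while (depth < depths[0]) {
        depths.shift();
        tokens.push("DEDENT");
      }
      // The line landed on a width that was never opened.
      if (depth !== depths[0]) tokens.push("BADDENT");
    }

    tokens.push(line.slice(depth));
  }
  return tokens;
}

console.log(tokenizeIndentation("a\n  b\n  c\nd"));
```

Like the grammar, this keeps the indentation state outside the matching rules, which is the part a pure PEG cannot express.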
Given this input text:
alpha
  beta
  gamma
    delta
epsilon
    zeta
  eta
theta
  iota
peg.js produces these results:
[
  "alpha",
  "INDENT",
  "beta",
  "gamma",
  "INDENT",
  "delta",
  "DEDENT",
  "DEDENT",
  "epsilon",
  "INDENT",
  "zeta",
  "DEDENT",
  "BADDENT",
  "eta",
  "theta",
  "INDENT",
  "iota",
  "DEDENT",
  "",
  ""
]
This tokenizer even catches bad indents.
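A token stream like this still has to be folded into a nested structure, as in the foo/tap example from the question. The following is a rough sketch of one way to do that, not part of the original answer. The name buildTree and the { text, children } node shape are hypothetical, and the sketch assumes no line of text is literally "INDENT", "DEDENT" or "BADDENT".

```javascript
// Sketch only: fold a flat INDENT/DEDENT token stream into a tree.
// buildTree and the { text, children } shape are hypothetical.
function buildTree(tokens) {
  const root = { text: null, children: [] };
  const stack = [root];   // last element is the current parent
  let last = root;        // most recently added node

  for (const tok of tokens) {
    if (tok === "INDENT") {
      // Lines after an INDENT belong to the previous line.
      stack.push(last);
    } else if (tok === "DEDENT") {
      if (stack.length > 1) stack.pop();
    } else if (tok === "BADDENT") {
      throw new Error("inconsistent indentation");
    } else if (tok !== "") {
      // Empty strings come from blank lines and are skipped.
      last = { text: tok, children: [] };
      stack[stack.length - 1].children.push(last);
    }
  }
  return root.children;
}

const tree = buildTree([
  "foo", "INDENT", "bar = 1", "baz = 2", "DEDENT",
  "tap", "INDENT", "zap = 3", "DEDENT", ""
]);
console.log(JSON.stringify(tree, null, 2));
```

Rejecting BADDENT outright is a design choice made for the sketch; a real parser might instead report the offending line and continue.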