本文介绍了互动式蚂蚁的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用antlr编写一种简单的交互式(使用System.in作为源代码)语言,但我遇到了一些问题.我在网上找到的示例都是按行周期使用的,例如:

I'm trying to write a simple interactive (using System.in as source) language using antlr, and I have a few problems with it. The examples I've found on the web are all using a per line cycle, e.g.:

while(readline)
  result = parse(line)
  doStuff(result)

但是,如果我写的是pascal/smtp/etc之类的东西,并且带有第一行",看起来像X需求,该怎么办?我知道可以在doStuff中检查它,但是从逻辑上讲,它是语法的一部分.

But what if I'm writing something like pascal/smtp/etc, with a "first line" looks like X requirment? I know it can be checked in doStuff, but I think logically it is part of the syntax.

或者如果命令分成多行怎么办?我可以尝试

Or what if a command is split into multiple lines? I can try

while(readline)
  lines.add(line)
  try
    result = parse(lines)
    lines = []
    doStuff(result)
  catch
    nop

但是与此同时,我也隐藏了真正的错误.

But with this I'm also hiding real errors.

或者我可以每次都重新分析所有行,但是:

Or I could reparse all lines everytime, but:

  1. 那会很慢
  2. 有些指令我不想重复运行

这可以使用ANTLR来完成吗?如果不能,可以通过其他方式来完成吗?

Can this be done with ANTLR, or if not, with something else?

推荐答案

或者我可以每次都重新分析所有行,但是:

Or I could reparse all lines everytime, but:

会很慢 有些指令我不想运行两次 可以使用ANTLR来完成此操作吗?如果不能,可以使用其他方式完成此操作吗?

it will be slow there are instructions I don't want to run twice Can this be done with ANTLR, or if not, with something else?

是的,ANTLR可以做到这一点.也许不是开箱即用,但是有了一些自定义代码,这肯定是可能的.您也不需要为此重新解析整个令牌流.

Yes, ANTLR can do this. Perhaps not out of the box, but with a bit of custom code, it sure is possible. You also don't need to re-parse the entire token stream for it.

假设您要逐行解析一种非常简单的语言,其中每行要么是program声明,或者是uses声明,或者是statement.

Let's say you want to parse a very simple language line by line that where each line is either a program declaration, or a uses declaration, or a statement.

它应始终以program声明开头,后跟零个或多个uses声明,后跟零个或多个statement. uses声明不能在statement之后,并且不能超过一个program声明.

It should always start with a program declaration, followed by zero or more uses declarations followed by zero or more statements. uses declarations cannot come after statements and there can't be more than one program declaration.

为简单起见,statement只是一个简单的赋值:a = 4b = a.

For simplicity, a statement is just a simple assignment: a = 4 or b = a.

这种语言的ANTLR语法可能看起来像这样:

An ANTLR grammar for such a language could look like this:

grammar REPL;

parse
  :  programDeclaration EOF
  |  usesDeclaration EOF
  |  statement EOF
  ;

programDeclaration
  :  PROGRAM ID
  ;

usesDeclaration
  :  USES idList
  ;

statement
  :  ID '=' (INT | ID)
  ;

idList
  :  ID (',' ID)*
  ;

PROGRAM : 'program';
USES    : 'uses';
ID      : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT     : '0'..'9'+;
SPACE   : (' ' | '\t' | '\r' | '\n') {skip();};

但是,我们当然需要添加一些检查.另外,默认情况下,解析器会在其构造函数中使用令牌流,但是由于我们计划逐行在解析器中滴入令牌,因此我们需要在解析器中创建一个新的构造函数.您可以通过将自定义成员分别放在@parser::members { ... }@lexer::members { ... }部分中来将它们添加到词法分析器或解析器类中.我们还将添加几个布尔标志,以跟踪program声明是否已经发生以及是否允许uses声明.最后,我们将添加一个process(String source)方法,该方法为每行新代码创建一个词法分析器,该词法分析器将被馈送到解析器.

But, we'll need to add a couple of checks of course. Also, by default, a parser takes a token stream in its constructor, but since we're planning to trickle tokens in the parser line-by-line, we'll need to create a new constructor in our parser. You can add custom members in your lexer or parser classes by putting them in a @parser::members { ... } or @lexer::members { ... } section respectively. We'll also add a couple of boolean flags to keep track whether the program declaration has happened already and if uses declarations are allowed. Finally, we'll add a process(String source) method which, for each new line, creates a lexer which gets fed to the parser.

所有这些看起来像:

@parser::members {

  boolean programDeclDone;
  boolean usesDeclAllowed;

  public REPLParser() {
    super(null);
    programDeclDone = false;
    usesDeclAllowed = true;
  }

  public void process(String source) throws Exception {
    ANTLRStringStream in = new ANTLRStringStream(source);
    REPLLexer lexer = new REPLLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    super.setTokenStream(tokens);
    this.parse(); // the entry point of our parser
  } 
}

现在在语法中,我们将检查几个 .并且在解析了某个声明或语句之后,我们希望从那时开始翻转某些布尔标志以允许或禁止声明.这些布尔标志的翻转是通过每个规则的@after { ... }部分执行的(毫不奇怪),在匹配该解析器规则的标记后 .

Now inside our grammar, we're going to check through a couple of gated semantic predicates if we're parsing declarations in the correct order. And after parsing a certain declaration, or statement, we'll want to flip certain boolean flags to allow- or disallow declaration from then on. The flipping of these boolean flags is done through each rule's @after { ... } section that gets executed (not surprisingly) after the tokens from that parser rule are matched.

您现在的最终语法文件如下所示(包括一些用于调试目的的System.out.println):

Your final grammar file now looks like this (including some System.out.println's for debugging purposes):

grammar REPL;

@parser::members {

  boolean programDeclDone;
  boolean usesDeclAllowed;

  public REPLParser() {
    super(null);
    programDeclDone = false;
    usesDeclAllowed = true;
  }

  public void process(String source) throws Exception {
    ANTLRStringStream in = new ANTLRStringStream(source);
    REPLLexer lexer = new REPLLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    super.setTokenStream(tokens);
    this.parse();
  } 
}

parse
  :  programDeclaration EOF
  |  {programDeclDone}? (usesDeclaration | statement) EOF
  ;

programDeclaration
@after{
  programDeclDone = true;
}
  :  {!programDeclDone}? PROGRAM ID {System.out.println("\t\t\t program <- " + $ID.text);}
  ;

usesDeclaration
  :  {usesDeclAllowed}? USES idList {System.out.println("\t\t\t uses <- " + $idList.text);}
  ;

statement
@after{
  usesDeclAllowed = false; 
}
  :  left=ID '=' right=(INT | ID) {System.out.println("\t\t\t " + $left.text + " <- " + $right.text);}
  ;

idList
  :  ID (',' ID)*
  ;

PROGRAM : 'program';
USES    : 'uses';
ID      : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT     : '0'..'9'+;
SPACE   : (' ' | '\t' | '\r' | '\n') {skip();};

可以通过以下课程进行测试:

which can be tested wit the following class:

import org.antlr.runtime.*;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) throws Exception {
        Scanner keyboard = new Scanner(System.in);
        REPLParser parser = new REPLParser();
        while(true) {
            System.out.print("\n> ");
            String input = keyboard.nextLine();
            if(input.equals("quit")) {
                break;
            }
            parser.process(input);
        }
        System.out.println("\nBye!");
    }
}

要运行此测试类,请执行以下操作:

To run this test class, do the following:

# generate a lexer and parser:
java -cp antlr-3.2.jar org.antlr.Tool REPL.g

# compile all .java source files:
javac -cp antlr-3.2.jar *.java

# run the main class on Windows:
java -cp .;antlr-3.2.jar Main 
# or on Linux/Mac:
java -cp .:antlr-3.2.jar Main


如您所见,您只能声明一次program:

> program A
                         program <- A

> program B
line 1:0 rule programDeclaration failed predicate: {!programDeclDone}?


uses不能在statement之后出现:


uses cannot come after statements:

> program X
                         program <- X

> uses a,b,c
                         uses <- a,b,c

> a = 666
                         a <- 666

> uses d,e
line 1:0 rule usesDeclaration failed predicate: {usesDeclAllowed}?


,您必须以program声明开头:


and you must start with a program declaration:

> uses foo
line 1:0 rule parse failed predicate: {programDeclDone}?

这篇关于互动式蚂蚁的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-28 21:03