Using ANTLR to get identifiers and function names

Question

I'm trying to use and understand ANTLR, which is new to me. My goal is to read a source file written in C and extract its identifiers (variable and function names).

My C grammar (file C.g4) contains:

identifierList
    :   Identifier
    |   identifierList Comma Identifier
    ;
Identifier
    :   IdentifierNondigit
        (   IdentifierNondigit
        |   Digit
        )*
    ;

After generating the parser and listener, I created my own listener for identifierList.

Note that the MyCListener class extends CBaseListener:

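import java.util.List;
import org.antlr.v4.runtime.tree.ParseTree;

public class MyCListener extends CBaseListener {

    // Print the text of every direct child of each identifierList node.
    @Override
    public void enterIdentifierList(CParser.IdentifierListContext ctx) {
        List<ParseTree> children = ctx.children;
        for (ParseTree parseTree : children) {
            System.out.println(parseTree.getText());
        }
    }
}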

Then I have this in the main class:

 String fileurl = "C:/example.c";

 CLexer lexer;
 try {
       lexer = new CLexer(new ANTLRFileStream(fileurl));
       CommonTokenStream tokens = new CommonTokenStream(lexer);
       CParser parser = new CParser(tokens);

       CParser.IdentifierListContext identifierContext = parser.identifierList();
       ParseTreeWalker walker = new ParseTreeWalker();
       MyCListener listener = new MyCListener();
       walker.walk(listener, identifierContext);

 } catch (IOException ex) {
       Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
 }

where example.c is:

int main() {

// this is C

 int i=0; // i is int
 /* double j=0.0;
    C
 */
}

What am I doing wrong? Maybe I didn't write MyCListener properly, or identifierList is not the rule I need to listen to. I really don't know. I don't even understand my output: why is there a lexical error?

line 3:4 mismatched input '(' expecting {<EOF>, ','}
main
(
)
{
int
i
=
0
;
}

As you can see, I'm very confused about this. Can somebody help me, please?

Recommended answer

With this line:

CParser.IdentifierListContext identifierContext = parser.identifierList();

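you're trying to parse your entire input as an identifierList. But your input isn't just that. This is why the parser reports mismatched input '(': after an identifier, the only things it accepts are a comma or the end of the input.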
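To make the mismatch concrete, here is a small hand-written sketch of what a rule shaped like identifierList accepts. It is not ANTLR-generated code and does not reproduce ANTLR's error recovery. The class IdentifierListSketch, its helper isIdentifier, and the pre-split token lists are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written stand-in for the identifierList rule; not ANTLR-generated code.
final class IdentifierListSketch {

    // Hypothetical helper: true if the token looks like a C identifier.
    // It does not exclude keywords such as "int"; a real lexer would.
    static boolean isIdentifier(String token) {
        return token.matches("[A-Za-z_][A-Za-z_0-9]*");
    }

    // Accepts: Identifier (',' Identifier)* followed by the end of the input.
    // Returns the identifiers, or throws at the first token that does not fit.
    static List<String> parse(List<String> tokens) {
        List<String> names = new ArrayList<>();
        int pos = 0;
        if (pos >= tokens.size() || !isIdentifier(tokens.get(pos))) {
            throw new IllegalStateException("expecting Identifier at token " + pos);
        }
        names.add(tokens.get(pos++));
        while (pos < tokens.size()) {
            if (!tokens.get(pos).equals(",")) {
                throw new IllegalStateException(
                        "mismatched input '" + tokens.get(pos) + "' expecting {<EOF>, ','}");
            }
            pos++; // skip the comma
            if (pos >= tokens.size() || !isIdentifier(tokens.get(pos))) {
                throw new IllegalStateException("expecting Identifier at token " + pos);
            }
            names.add(tokens.get(pos++));
        }
        return names;
    }

    public static void main(String[] args) {
        // A plain comma-separated list is accepted.
        System.out.println(parse(Arrays.asList("a", ",", "b")));
        // The start of a function definition is rejected at '('.
        try {
            parse(Arrays.asList("main", "(", ")", "{", "}"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second call fails on '(' because a function definition is simply not an identifier list, whichever parser is used.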

Assuming you're using the C.g4 from the ANTLR4 Github repository, try to let the parser start at the entry point of the grammar (which is the rule compilationUnit):

MyCListener listener = new MyCListener();
ParseTreeWalker.DEFAULT.walk(listener, parser.compilationUnit());
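Starting at the root matters because a walker visits every node below the tree it is given, so a listener callback for a nested rule fires wherever that rule occurs. The toy model below illustrates the idea without ANTLR. WalkerSketch, Node, Listener, and the simplified tree shape are all hypothetical and do not match the real tree produced by C.g4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a parse tree and a depth-first walker; all names are hypothetical.
final class WalkerSketch {

    // A tree node: a rule name, optional text, and child nodes.
    static final class Node {
        final String rule;
        final String text;
        final List<Node> children;

        Node(String rule, String text, Node... children) {
            this.rule = rule;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // Callback invoked once for every node, before its children are visited.
    interface Listener {
        void enter(Node node);
    }

    // Depth-first walk: notify the listener, then descend into the children.
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
    }

    // Collects the text of every directDeclarator node at or below the root.
    static List<String> declaredNames(Node root) {
        List<String> names = new ArrayList<>();
        walk(node -> {
            if (node.rule.equals("directDeclarator")) {
                names.add(node.text);
            }
        }, root);
        return names;
    }

    public static void main(String[] args) {
        // Simplified stand-in for the tree of: int main() { int i=0; }
        Node tree = new Node("compilationUnit", null,
                new Node("functionDefinition", null,
                        new Node("directDeclarator", "main"),
                        new Node("declaration", null,
                                new Node("directDeclarator", "i"))));
        System.out.println(declaredNames(tree)); // prints [main, i]
    }
}
```

The listener never needs to know how deep a declarator sits; it only reacts to the rule name, which is the same division of labor the generated listener classes provide.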

EDIT

Here's a quick demo:

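import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.misc.NotNull; // present in older ANTLR 4 runtimes; drop the annotation if yours lacks it
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class Main {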

    public static void main(String[] args) throws Exception {

        final List<String> identifiers = new ArrayList<String>();

        String source = "int main() {\n" +
                "\n" +
                "// this is C\n" +
                "\n" +
                " int i=0; // i is int\n" +
                " /* double j=0.0;\n" +
                "    C\n" +
                " */\n" +
                "}";

        CLexer lexer = new CLexer(new ANTLRInputStream(source));
        CParser parser = new CParser(new CommonTokenStream(lexer));

        ParseTreeWalker.DEFAULT.walk(new CBaseListener(){

            @Override
            public void enterDirectDeclarator(@NotNull CParser.DirectDeclaratorContext ctx) {
                if (ctx.Identifier() != null) {
                    identifiers.add(ctx.Identifier().getText());
                }
            }

            // Perhaps override other rules that use `Identifier`

        }, parser.compilationUnit());

        System.out.println("identifiers -> " + identifiers);
    }
}

This will print:

identifiers -> [main, i]

