Question
I'm trying to use and understand ANTLR, which is new to me. My goal is to read a source file written in C and extract its identifiers (variable and function names).
In my C grammar (file C.g4), consider:
identifierList
    :   Identifier
    |   identifierList Comma Identifier
    ;

Identifier
    :   IdentifierNondigit
        (   IdentifierNondigit
        |   Digit
        )*
    ;
After generating the parser and listener, I created my own listener for identifierList.
Note that the MyCListener class extends CBaseListener:
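import java.util.List;
import org.antlr.v4.runtime.tree.ParseTree;

public class MyCListener extends CBaseListener {

    // Called each time the walker enters an identifierList node.
    @Override
    public void enterIdentifierList(CParser.IdentifierListContext ctx) {
        List<ParseTree> children = ctx.children;
        for (ParseTree parseTree : children) {
            System.out.println(parseTree.getText());
        }
    }
}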
Then I have this in my main class:
String fileurl = "C:/example.c";
CLexer lexer;
try {
    lexer = new CLexer(new ANTLRFileStream(fileurl));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    CParser parser = new CParser(tokens);
    CParser.IdentifierListContext identifierContext = parser.identifierList();
    ParseTreeWalker walker = new ParseTreeWalker();
    MyCListener listener = new MyCListener();
    walker.walk(listener, identifierContext);
} catch (IOException ex) {
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
}
Where example.c is:
int main() {
    // this is C
    int i=0; // i is int
    /* double j=0.0;
       C
    */
}
What am I doing wrong? Maybe I didn't write MyCListener properly, or identifierList is not the rule I need to listen to. I really don't know. I'm sorry, but I don't even understand my output. Why is there a lexical error?
line 3:4 mismatched input '(' expecting {<EOF>, ','}
main
(
)
{
int
i
=
0
;
}
As you can see, I'm very confused about this. Can somebody help me, please?
Answer
With this line:
CParser.IdentifierListContext identifierContext = parser.identifierList();
you're trying to parse your entire input as an identifierList. But your input isn't just that.
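To make the error message less mysterious, here is a minimal hand-written sketch of what the identifierList rule accepts: one Identifier, then any number of `, Identifier` pairs, then end of input. It is not ANTLR-generated code. The class name IdentifierListSketch, the regex-based tokenizer, and the wording of the two "expected Identifier" messages are assumptions made for illustration, and unlike a real C lexer the sketch does not treat keywords such as int specially and does not skip comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated parser; not part of ANTLR.
class IdentifierListSketch {

    // Simplified lexer: an identifier, or any single non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|\\S");
    private static final Pattern IDENT =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    private static boolean isIdentifierAt(List<String> tokens, int pos) {
        return pos < tokens.size() && IDENT.matcher(tokens.get(pos)).matches();
    }

    // Returns null when the input is Identifier (',' Identifier)* followed by
    // end of input; otherwise returns a description of the first mismatch.
    static String check(String input) {
        List<String> tokens = tokenize(input);
        int pos = 0;
        if (!isIdentifierAt(tokens, pos)) {
            return "expected Identifier at token " + pos;
        }
        pos++;
        while (pos < tokens.size()) {
            // After an identifier, only ',' or end of input is allowed.
            if (!tokens.get(pos).equals(",")) {
                return "mismatched input '" + tokens.get(pos)
                        + "' expecting {<EOF>, ','}";
            }
            pos++;
            if (!isIdentifierAt(tokens, pos)) {
                return "expected Identifier after ','";
            }
            pos++;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check("a, b, c"));    // a valid identifier list
        System.out.println(check("main() { }")); // stops at '('
    }
}
```

In this sketch, `main` is accepted as the first identifier, and the following `(` is rejected because only a comma or the end of the input may come next. That is the same shape as the message in the question.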
Assuming you're using the C.g4 from the ANTLR4 GitHub repository, try to let the parser start at the entry point of the grammar (which is the rule compilationUnit):
MyCListener listener = new MyCListener();
ParseTreeWalker.DEFAULT.walk(listener, parser.compilationUnit());
EDIT
Here's a quick demo:
import java.util.ArrayList;
import java.util.List;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class Main {
    public static void main(String[] args) throws Exception {
        final List<String> identifiers = new ArrayList<String>();
        String source = "int main() {\n" +
                "\n" +
                "// this is C\n" +
                "\n" +
                " int i=0; // i is int\n" +
                " /* double j=0.0;\n" +
                " C\n" +
                " */\n" +
                "}";
        CLexer lexer = new CLexer(new ANTLRInputStream(source));
        CParser parser = new CParser(new CommonTokenStream(lexer));
        // Walk the whole tree, starting from the grammar's entry rule.
        ParseTreeWalker.DEFAULT.walk(new CBaseListener() {
            @Override
            public void enterDirectDeclarator(CParser.DirectDeclaratorContext ctx) {
                if (ctx.Identifier() != null) {
                    identifiers.add(ctx.Identifier().getText());
                }
            }
            // Perhaps override other rules that use `Identifier`
        }, parser.compilationUnit());
        System.out.println("identifiers -> " + identifiers);
    }
}
This will print:
identifiers -> [main, i]