How do I distinguish reserved words from variables using ANTLR?

Problem description

I'm using ANTLR to tokenize a simple grammar, and need to differentiate between an ID:

ID              : LETTER (LETTER | DIGIT)* ;

fragment DIGIT  : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;

and a RESERVED_WORD:

RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;

Say I run the lexer on the input:

class abc

I receive two ID tokens for "class" and "abc", while I want "class" to be recognized as a RESERVED_WORD. How can I accomplish this?

Solution

Whenever two (or more) lexer rules match the same number of characters, the one defined first "wins". So, if you define RESERVED_WORD before ID, like this:

RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;

ID              : LETTER (LETTER | DIGIT)* ;

fragment DIGIT  : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;

The input "class" will be tokenized as a RESERVED_WORD.
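
To make the tie-breaking rule concrete, here is a small Python simulation of it. This is only a sketch of the rule as stated above (longest match wins, ties go to the rule defined first), not ANTLR's actual implementation; the `tokenize` function and the regex versions of the two rules are assumptions made for illustration.

```python
import re

def tokenize(text, rules):
    """Split text into (rule_name, lexeme) pairs.

    The longest match wins; on a tie, the rule listed first wins.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly greater: an earlier rule keeps the win on a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Regex stand-ins for the two grammar rules (illustrative only).
ID = ("ID", r"[a-zA-Z][a-zA-Z0-9]*")
RESERVED_WORD = (
    "RESERVED_WORD",
    r"class|public|static|extends|void|int|boolean|if|else|while"
    r"|return|null|true|false|this|new|String",
)

# ID defined first: "class" ties at 5 characters and ID wins.
print(tokenize("class abc", [ID, RESERVED_WORD]))
# RESERVED_WORD defined first: the same tie now goes to RESERVED_WORD.
print(tokenize("class abc", [RESERVED_WORD, ID]))
```

Swapping the order of the two entries in the rule list is the only difference between the two calls, which mirrors moving RESERVED_WORD above ID in the grammar.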

Note that a single token type that matches every reserved word is rarely useful, because the parser usually needs to tell the keywords apart. The usual approach is one lexer rule per keyword:

// ...

NULL  : 'null';
TRUE  : 'true';
FALSE : 'false';

// ...

ID              : LETTER (LETTER | DIGIT)* ;

fragment DIGIT  : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;

Now "false" will become a FALSE token, while "falser" will become an ID, because the ID rule matches one more character than FALSE does.
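
The per-keyword layout can be checked with a similar Python sketch. Again, this only simulates the matching rule described in this answer and is not how ANTLR is implemented; `RULES` and `next_token` are hypothetical names, and the table covers just the three keywords shown above.

```python
import re

# Hypothetical rule table mirroring the grammar: keywords first, ID last.
RULES = [
    ("NULL", r"null"),
    ("TRUE", r"true"),
    ("FALSE", r"false"),
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
]

def next_token(text, pos=0):
    """Return (rule_name, lexeme) for the token starting at pos, or None."""
    candidates = []
    for index, (name, pattern) in enumerate(RULES):
        m = re.compile(pattern).match(text, pos)
        if m:
            # Sort key: longer match first, then earlier rule (smaller index).
            candidates.append((len(m.group()), -index, name))
    if not candidates:
        return None
    length, _, name = max(candidates)
    return name, text[pos:pos + length]

print(next_token("false"))   # FALSE and ID both match 5 characters
print(next_token("falser"))  # ID matches 6 characters, FALSE only 5
```

Because the length is compared before the rule position, an identifier that merely starts with a keyword is never cut short, while an exact keyword still resolves to its own token type.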

