Problem description
I'm using ANTLR to tokenize a simple grammar, and need to differentiate between an ID:
ID : LETTER (LETTER | DIGIT)* ;
fragment DIGIT : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;
and a RESERVED_WORD:
RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;
Say I run the lexer on the input:
class abc
I receive two ID tokens for "class" and "abc", while I want "class" to be recognized as a RESERVED_WORD. How can I accomplish this?
Whenever two (or more) lexer rules match the same number of characters, the rule defined first wins. So, if you define RESERVED_WORD before ID, like this:
RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;
ID : LETTER (LETTER | DIGIT)* ;
fragment DIGIT : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;
The input "class" will be tokenized as a RESERVED_WORD.
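To make the tie-breaking rule concrete, here is a small Python model of it. This is only a sketch of the selection logic described above, not ANTLR's actual implementation: the names `tokenize`, `match_reserved_word`, and `match_id` are hypothetical, and skipping whitespace is an assumption, since the grammar shown has no whitespace rule.

```python
import re

# Keywords taken from the RESERVED_WORD rule in the question.
KEYWORDS = ["class", "public", "static", "extends", "void", "int", "boolean",
            "if", "else", "while", "return", "null", "true", "false", "this",
            "new", "String"]

ID_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def match_reserved_word(text, pos):
    """Return the length of the longest keyword starting at pos, or 0."""
    return max((len(k) for k in KEYWORDS if text.startswith(k, pos)), default=0)


def match_id(text, pos):
    """Return the length of the identifier starting at pos, or 0."""
    m = ID_PATTERN.match(text, pos)
    return m.end() - pos if m else 0


def tokenize(text, rules):
    """Pick the longest match at each position; on a tie, the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace is simply skipped
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, matcher in rules:
            length = matcher(text, pos)
            if length > best_len:  # strict '>' keeps the earlier rule on ties
                best_name, best_len = name, length
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# The same two rules, listed in the two possible definition orders.
RESERVED_FIRST = [("RESERVED_WORD", match_reserved_word), ("ID", match_id)]
ID_FIRST = [("ID", match_id), ("RESERVED_WORD", match_reserved_word)]

print(tokenize("class abc", RESERVED_FIRST))
# [('RESERVED_WORD', 'class'), ('ID', 'abc')]
print(tokenize("class abc", ID_FIRST))
# [('ID', 'class'), ('ID', 'abc')]
```

With ID listed first, both rules match all five characters of "class" and the earlier rule (ID) is chosen, which reproduces the behaviour from the question.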
Note that it doesn't make much sense to create a single token type that matches any reserved word. Usually each keyword gets its own rule, like this:
// ...
NULL : 'null';
TRUE : 'true';
FALSE : 'false';
// ...
ID : LETTER (LETTER | DIGIT)* ;
fragment DIGIT : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;
Now "false" will become a FALSE token, and "falser" an ID, because for "falser" the ID rule matches more characters than the FALSE rule does.
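The "false" versus "falser" case can be checked with a short Python model of the per-keyword rules. As before, this is a hedged sketch of the "longest match, then first-defined rule" behaviour rather than ANTLR's own code. `RULES` and `next_token` are hypothetical names, and only the three keywords shown in the snippet above are included.

```python
import re

# Rules in definition order: one token type per keyword, then ID.
RULES = [
    ("NULL", re.compile(r"null")),
    ("TRUE", re.compile(r"true")),
    ("FALSE", re.compile(r"false")),
    ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
]


def next_token(text, pos=0):
    """Return (rule name, matched text) for the token starting at pos.

    The longest match wins; when two rules match the same number of
    characters, the rule listed first is kept.
    """
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m and (best is None or m.end() > best[1].end()):
            best = (name, m)
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best[0], best[1].group()


print(next_token("false"))   # ('FALSE', 'false')
print(next_token("falser"))  # ('ID', 'falser')
```

For "false", FALSE and ID both match five characters, so the earlier rule is kept; for "falser", ID matches six characters and takes precedence regardless of order.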