Problem description
Given the input "term >1", the number (1) and the comparison operator (>) should generate separate nodes in an AST. How can this be achieved?
In my tests, matching only occurred if the comparison operator and "1" were separated by a space, like so: "term < 1".
Current grammar:
startExpression : orEx;
expressionLevel4
: LPARENTHESIS! orEx RPARENTHESIS! | atomicExpression;
expressionLevel3
: (fieldExpression) | expressionLevel4 ;
expressionLevel2
: (nearExpression) | expressionLevel3 ;
expressionLevel1
: (countExpression) | expressionLevel2 ;
notEx : (NOT^)? expressionLevel1;
andEx : (notEx -> notEx)
(AND? a=notEx -> ^(ANDNODE $andEx $a))*;
orEx : andEx (OR^ andEx)*;
countExpression : COUNT LPARENTHESIS WORD RPARENTHESIS RELATION NUMBERS -> ^(COUNT WORD RELATION NUMBERS);
nearExpression : NEAR LPARENTHESIS (WORD|PHRASE) MULTIPLESEPERATOR (WORD|PHRASE) MULTIPLESEPERATOR NUMBERS RPARENTHESIS -> ^(NEAR WORD* PHRASE* ^(NEARDISTANCE NUMBERS));
fieldExpression : WORD PROPERTYSEPERATOR WORD -> ^(FIELDSEARCH ^(TARGETFIELD WORD) WORD );
atomicExpression
: WORD
| PHRASE
;
fragment NUMBER : ('0'..'9');
fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9'|'*'|'?');
fragment QUOTE : ('"');
fragment LESSTHEN : '<';
fragment MORETHEN: '>';
fragment EQUAL: '=';
fragment SPACE : ('\u0009'|'\u0020'|'\u000C'|'\u00A0');
fragment UNICODENOSPACES: ('\u0021'..'\u0027'|'\u0030'..'\u0039'|'\u003B'..'\u007E'|'\u00A1'..'\uFFFF');
//fragment UNICODENOSPACES : ('\u0021'..'\u0039'|'\u003B'..'\u007E'|'\u00A1'..'\uFFFF');
LPARENTHESIS : '(';
RPARENTHESIS : ')';
AND : ('A'|'a')('N'|'n')('D'|'d');
OR : ('O'|'o')('R'|'r');
ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
NOT : ('N'|'n')('O'|'o')('T'|'t');
COUNT:('C'|'c')('O'|'o')('U'|'u')('N'|'n')('T'|'t');
NEAR:('N'|'n')('E'|'e')('A'|'a')('R'|'r');
PROPERTYSEPERATOR : ':';
MULTIPLESEPERATOR : ',';
WS : (SPACE) { $channel=HIDDEN; };
RELATION : LESSTHEN? MORETHEN? EQUAL?;
NUMBERS : (NUMBER)+;
PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
WORD : (UNICODENOSPACES)+;
Recommended answer
That is because your WORD rule matches too much: it also matches ">", so when ">1" is written together, these 2 characters are tokenized as a single WORD token.
Whenever I'm unsure what my lexer is doing, I simply let the parser match zero or more tokens of any type, and print the type and text of all tokens:
parse
: (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
;
When you let the rule above match your input "term > 1", the following gets printed:
WORD 'term'
RELATION '>'
WORD '1'
And with the input "term >1":
WORD 'term'
WORD '>1'
There's no way around this: when the lexer can match 2 (or more) characters (the WORD rule), it will choose that path over a rule defined before it that matches only a single character (the RELATION rule).
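One way to act on this, as a sketch rather than a definitive fix, is to stop WORD from matching the relation characters at all. The fragment below assumes that '<', '=' and '>' (\u003C to \u003E) never need to appear inside a WORD; it splits the '\u003B'..'\u007E' range of the original UNICODENOSPACES so that those three code points are left out. All other rules stay as in the question.

```antlr
// Assumption: '<' (\u003C), '=' (\u003D) and '>' (\u003E) are never part of a WORD.
// Same ranges as the original fragment, minus those three characters.
fragment UNICODENOSPACES
  : ( '\u0021'..'\u0027'
    | '\u0030'..'\u0039'
    | '\u003B'              // ';' only
    | '\u003F'..'\u007E'    // '?' up to '~'
    | '\u00A1'..'\uFFFF'
    )
  ;
```

With this change ">1" can no longer be consumed as one WORD, since '>' is not a legal WORD character. Whether the remaining "1" is reported as NUMBERS or WORD is worth checking with the token-printing parse rule above, because both rules can match digits.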
Also note that your RELATION rule:
RELATION : LESSTHEN? MORETHEN? EQUAL?;
potentially matches the empty string. Make sure every lexer rule matches at least one character; otherwise your lexer might get into an infinite loop.
It would be better to write it like this:
RELATION
: (LESSTHEN | MORETHEN)? EQUAL // '<=', '>=', or '='
| (LESSTHEN | MORETHEN) // '<' or '>'
;