Float literal and range parameter in ANTLR

Problem description

I'm working on a parser for the language D and I ran into trouble when I tried to add the "slice" operator rule. You can find the ANTLR grammar for it here. Basically the problem is that if the lexer encounters a string like "1..2", it gets completely lost and the whole thing ends up as a single float value. As a result, the postfixExpression rule for a string like "a[10..11]" ends up producing an ExpArrIndex object with an ExpLiteralReal argument. Can somebody explain what exactly is wrong with the numeric literals? (As far as I understand, it fails somewhere around these tokens.)

Recommended answer

You can do that by emitting two tokens (an Int and Range token) when you encounter a ".." inside a float rule. You need to override two methods in your lexer to accomplish this.

A demo with a small part of your Dee grammar:

grammar Dee;

@lexer::members {

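  // Tokens waiting to be handed to the parser; one lexer rule may queue several.
  java.util.Queue<Token> tokens = new java.util.LinkedList<Token>();

  // Convenience helper: build a token of the given type and text and queue it.
  public void offer(int ttype, String ttext) {
    this.emit(new CommonToken(ttype, ttext));
  }

  // Queue every emitted token instead of keeping only the most recent one.
  @Override
  public void emit(Token t) {
    state.token = t;
    tokens.offer(t);
  }

  // Let the lexer match one rule, then return the oldest queued token.
  @Override
  public Token nextToken() {
    super.nextToken();
    return tokens.isEmpty() ? Token.EOF_TOKEN : tokens.poll();
  }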
}

parse
 : (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
 ;

Range
 : '..'
 ;

IntegerLiteral
 : Integer IntSuffix?
 ;

FloatLiteral
 : Float ImaginarySuffix?
 ;

// skipping
Space
 : ' ' {skip();}
 ;

// fragments
fragment Float
 : d=DecimalDigits ( options {greedy = true; } : FloatTypeSuffix
                   | '..' {offer(IntegerLiteral, $d.text); offer(Range, "..");}
                   | '.' DecimalDigits DecimalExponent?
                   )
 | '.' DecimalDigits DecimalExponent?
 ;

fragment DecimalExponent : ('e' | 'E') ('+' | '-')? DecimalDigits;
fragment DecimalDigits   : ('0'..'9'|'_')+ ;
fragment FloatTypeSuffix : 'f' | 'F' | 'L';
fragment ImaginarySuffix : 'i';
fragment IntSuffix       : 'L'|'u'|'U'|'Lu'|'LU'|'uL'|'UL' ;
fragment Integer         : Decimal| Binary| Octal| Hexadecimal ;
fragment Decimal         : '0' | '1'..'9' (DecimalDigit | '_')* ;
fragment Binary          : ('0b' | '0B') ('0' | '1' | '_')+ ;
fragment Octal           : '0' (OctalDigit | '_')+ ;
fragment Hexadecimal     : ('0x' | '0X') (HexDigit | '_')+;
fragment DecimalDigit    : '0'..'9' ;
fragment OctalDigit      : '0'..'7' ;
fragment HexDigit        : ('0'..'9'|'a'..'f'|'A'..'F') ;

Test the grammar with the class:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    DeeLexer lexer = new DeeLexer(new ANTLRStringStream("1..2 .. 33.33 ..21.0"));
    DeeParser parser = new DeeParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

And when you run Main, the following output is produced:

IntegerLiteral  '1'
Range           '..'
IntegerLiteral  '2'
Range           '..'
FloatLiteral    '33.33'
Range           '..'
FloatLiteral    '21.0'
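
For readers who want to see the queue idea without generating a lexer, here is a minimal stand-alone sketch in plain Java. It is not ANTLR code: the class QueueLexerSketch and its methods are hypothetical names made up for illustration, tokens are represented as plain strings, and only plain digits, '.', ".." and spaces are handled. It only shows the mechanism: one scan step may queue two tokens, and nextToken() drains the queue before scanning again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical hand-written lexer (not ANTLR-generated) illustrating the token queue.
class QueueLexerSketch {
    private final String input;
    private int pos = 0;
    // Tokens produced by a scan step; a single step may add more than one.
    private final Queue<String> pending = new ArrayDeque<>();

    QueueLexerSketch(String input) {
        this.input = input;
    }

    private void emit(String type, String text) {
        pending.offer(type + " '" + text + "'");
    }

    private boolean digitAt(int i) {
        return i < input.length() && Character.isDigit(input.charAt(i));
    }

    // Returns the next token, or null once the input is exhausted.
    String nextToken() {
        if (pending.isEmpty()) {
            scan();
        }
        return pending.poll();
    }

    // Matches one "rule" at the current position and queues its token(s).
    private void scan() {
        while (pos < input.length() && input.charAt(pos) == ' ') {
            pos++; // skip spaces
        }
        if (pos >= input.length()) {
            return;
        }
        if (input.startsWith("..", pos)) {
            pos += 2;
            emit("Range", "..");
            return;
        }
        int start = pos;
        while (digitAt(pos)) {
            pos++;
        }
        String digits = input.substring(start, pos);
        if (!digits.isEmpty() && input.startsWith("..", pos)) {
            // Digits followed by "..": queue two tokens from a single step.
            pos += 2;
            emit("IntegerLiteral", digits);
            emit("Range", "..");
        } else if (pos < input.length() && input.charAt(pos) == '.' && digitAt(pos + 1)) {
            pos++; // consume the '.'
            while (digitAt(pos)) {
                pos++;
            }
            emit("FloatLiteral", input.substring(start, pos));
        } else if (!digits.isEmpty()) {
            emit("IntegerLiteral", digits);
        } else {
            throw new IllegalArgumentException("Unexpected character at index " + pos);
        }
    }

    static List<String> tokenize(String text) {
        QueueLexerSketch lexer = new QueueLexerSketch(text);
        List<String> result = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String t : tokenize("1..2 .. 33.33 ..21.0")) {
            System.out.println(t);
        }
    }
}
```

Run on the same input as the demo, this sketch yields the same sequence of token types and texts as listed above. Suffixes, exponents and underscores from the real grammar are deliberately left out.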

EDIT

Yeah, as you indicated in the comments, a lexer rule can only emit a single token. But, as you yourself already tried, syntactic predicates (the "(...)=>" syntax) can indeed be used to force the lexer to look ahead in the char-stream and check whether an IntegerLiteral is actually followed by a "..", before it tries to match a FloatLiteral.

The following grammar would produce the same tokens as the first demo.

grammar Dee;

parse
 : (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
 ;

Range
 : '..'
 ;

Number
 : (IntegerLiteral Range)=> IntegerLiteral {$type=IntegerLiteral;}
 | (FloatLiteral)=>         FloatLiteral   {$type=FloatLiteral;}
 |                          IntegerLiteral {$type=IntegerLiteral;}
 ;

// skipping
Space
 : ' ' {skip();}
 ;

// fragments
fragment DecimalExponent : ('e' | 'E') ('+' | '-')? DecimalDigits;
fragment DecimalDigits   : ('0'..'9'|'_')+ ;
fragment FloatLiteral    : Float ImaginarySuffix?;
fragment IntegerLiteral  : Integer IntSuffix?;
fragment FloatTypeSuffix : 'f' | 'F' | 'L';
fragment ImaginarySuffix : 'i';
fragment IntSuffix       : 'L'|'u'|'U'|'Lu'|'LU'|'uL'|'UL' ;
fragment Integer         : Decimal| Binary| Octal| Hexadecimal ;
fragment Decimal         : '0' | '1'..'9' (DecimalDigit | '_')* ;
fragment Binary          : ('0b' | '0B') ('0' | '1' | '_')+ ;
fragment Octal           : '0' (OctalDigit | '_')+ ;
fragment Hexadecimal     : ('0x' | '0X') (HexDigit | '_')+;
fragment DecimalDigit    : '0'..'9' ;
fragment OctalDigit      : '0'..'7' ;
fragment HexDigit        : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment Float
 : d=DecimalDigits ( options {greedy = true; } : FloatTypeSuffix
                   | '.' DecimalDigits DecimalExponent?
                   )
 | '.' DecimalDigits DecimalExponent?
 ;
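
The lookahead decision made by the Number rule can also be sketched in plain Java. Again this is not ANTLR code: LookaheadLexerSketch is a hypothetical name, tokens are plain strings, and only plain digits, '.', ".." and spaces are handled. Each step produces exactly one token, and it peeks past the digits before deciding which kind of literal to produce, in the same order as the three alternatives of Number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer (not ANTLR-generated) that decides by lookahead.
class LookaheadLexerSketch {

    private static boolean digitAt(String s, int i) {
        return i < s.length() && Character.isDigit(s.charAt(i));
    }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == ' ') {
                pos++; // skip spaces
                continue;
            }
            if (input.startsWith("..", pos)) {
                out.add("Range '..'");
                pos += 2;
                continue;
            }
            // Find where the leading digits end without committing to a token type yet.
            int end = pos;
            while (digitAt(input, end)) {
                end++;
            }
            boolean hasDigits = end > pos;
            // Lookahead 1: digits directly followed by ".." form an integer.
            boolean rangeFollows = hasDigits && input.startsWith("..", end);
            // Lookahead 2: a '.' followed by a digit starts a fraction.
            boolean fractionFollows = end < input.length()
                    && input.charAt(end) == '.' && digitAt(input, end + 1);

            if (rangeFollows) {
                out.add("IntegerLiteral '" + input.substring(pos, end) + "'");
            } else if (fractionFollows) {
                end++; // consume the '.'
                while (digitAt(input, end)) {
                    end++;
                }
                out.add("FloatLiteral '" + input.substring(pos, end) + "'");
            } else if (hasDigits) {
                out.add("IntegerLiteral '" + input.substring(pos, end) + "'");
            } else {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            pos = end;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokenize("1..2 .. 33.33 ..21.0")) {
            System.out.println(t);
        }
    }
}
```

Compared with queueing two tokens, this variant leaves the ".." in the input, so the Range token is produced by its own step on the next iteration.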
