ANTLR解决非LL(*)问题和语法谓词

本文介绍了ANTLR解决非LL(*)问题和语法谓词的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

考虑解析器中的以下规则:

consider following rules in the parser:

expression
    :   IDENTIFIER
    |   (...)
    |   procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

procedure_call
    :   '(' expression expression* ')'
    ;

macro_use
    :   '(' IDENTIFIER datum* ')'
    ;

和

// Note that any string that parses as an <expression> will also parse as a <datum>.
datum
    : simple_datum
    | compound_datum
    ;

simple_datum
    :   BOOLEAN
    |   NUMBER
    |   CHARACTER
    |   STRING
    |   IDENTIFIER
    ;

compound_datum
    : list
    | vector
    ;

list
    :   '(' (datum+ ( '.' datum)?)? ')'
    |   ABBREV_PREFIX datum
    ;

fragment ABBREV_PREFIX
    :   ('\''   | '`' | ',' | ',@')
    ;

vector
    : '#(' datum* ')'
    ;

表达式规则中的procedure_call和macro_rule替代方案生成非LL(*)结构错误.我可以看到问题了，因为(IDENTIFIER)将同时解析为两者.但是即使我用+而不是*都定义了，即使上面的示例不再需要解析，它也会产生错误.
我想出了句法谓词的用法，但是我不知道如何在这里使用它们来完成技巧.
像

the procedure_call and macro_rule alternative in the expression rule generate an non-LL(*) structure error. I can see the problem, since (IDENTIFIER) will parse as both. but even when i define both with + instead of *, it generates the error, even though above example shouldn't be parsing anymore.
i came up with the usage of syntactic predicates, but i can't figure out how to use them to do the trick here.
something like

expression
    :   IDENTIFIER
    |   (...)
    |   (procedure_call)=>procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

或

expression
    :   IDENTIFIER
    |   (...)
    |   ('(' IDENTIFIER expression)=>procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

也不起作用，因为除第一个规则外，任何规则都不匹配.有解决这个问题的适当方法吗?

doesnt work either, since none but the first rule will match anything. is there a proper way to solve that?

推荐答案

我发现了 R5RS的JavaCC语法我曾经(很快！)写过ANTLR等效文件:

I found a JavaCC grammar of R5RS which I used to (quickly!) write an ANTLR equivalent:

/*
 * Copyright (C) 2011 by Bart Kiers, based on the work done by Håkan L. Younes'
 * JavaCC R5RS grammar, available at: http://mindprod.com/javacc/R5RS.jj
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */
grammar R5RS;

parse
  :  commandOrDefinition* EOF
  ;

commandOrDefinition
  :  (syntaxDefinition)=>              syntaxDefinition
  |  (definition)=>                    definition
  |  ('(' BEGIN commandOrDefinition)=> '(' BEGIN commandOrDefinition+ ')'
  |                                    command
  ;

syntaxDefinition
  :  '(' DEFINE_SYNTAX keyword transformerSpec ')'
  ;

definition
  :  '(' DEFINE ( variable expression ')'
                | '(' variable defFormals ')' body ')'
                )
  |  '(' BEGIN definition* ')'
  ;

defFormals
  :  variable* ('.' variable)?
  ;

keyword
  :  identifier
  ;

transformerSpec
  :  '(' SYNTAX_RULES '(' identifier* ')' syntaxRule* ')'
  ;

syntaxRule
  :  '(' pattern template ')'
  ;

pattern
  :  patternIdentifier
  |  '(' (pattern+ ('.' pattern | ELLIPSIS)?)?  ')'
  |  '#(' (pattern+ ELLIPSIS? )? ')'
  |  patternDatum
  ;

patternIdentifier
  :  syntacticKeyword
  |  VARIABLE
  ;

patternDatum
  :  STRING
  |  CHARACTER
  |  bool
  |  number
  ;

template
  :  patternIdentifier
  |  '(' (templateElement+ ('.' templateElement)?)? ')'
  |  '#(' templateElement* ')'
  |  templateDatum
  ;

templateElement
  :  template ELLIPSIS?
  ;

templateDatum
  :  patternDatum
  ;

command
  :  expression
  ;

identifier
  :  syntacticKeyword
  |  variable
  ;

syntacticKeyword
  :  expressionKeyword
  |  ELSE
  |  ARROW
  |  DEFINE
  |  UNQUOTE
  |  UNQUOTE_SPLICING
  ;

expressionKeyword
  :  QUOTE
  |  LAMBDA
  |  IF
  |  SET
  |  BEGIN
  |  COND
  |  AND
  |  OR
  |  CASE
  |  LET
  |  LETSTAR
  |  LETREC
  |  DO
  |  DELAY
  |  QUASIQUOTE
  ;

expression
  :  (variable)=>          variable
  |  (literal)=>           literal
  |  (lambdaExpression)=>  lambdaExpression
  |  (conditional)=>       conditional
  |  (assignment)=>        assignment
  |  (derivedExpression)=> derivedExpression
  |  (procedureCall)=>     procedureCall
  |  (macroUse)=>          macroUse
  |                        macroBlock
  ;

variable
  :  VARIABLE
  |  ELLIPSIS
  ;

literal
  :  quotation
  |  selfEvaluating
  ;

quotation
  :  '\'' datum
  |  '(' QUOTE datum ')'
  ;

selfEvaluating
  :  bool
  |  number
  |  CHARACTER
  |  STRING
  ;

lambdaExpression
  :  '(' LAMBDA formals body ')'
  ;

formals
  :  '(' (variable+ ('.' variable)?)? ')'
  |  variable
  ;

conditional
  :  '(' IF test consequent alternate? ')'
  ;

test
  :  expression
  ;

consequent
  :  expression
  ;

alternate
  :  expression
  ;

assignment
  :  '(' SET variable expression ')'
  ;

derivedExpression
  :  quasiquotation
  |  '(' ( COND ( '(' ELSE sequence ')'
                | condClause+ ('(' ELSE sequence ')')?
                )
         | CASE expression ( '(' ELSE sequence ')'
                           | caseClause+ ('(' ELSE sequence ')')?
                           )
         | AND test*
         | OR test*
         | LET variable? '(' bindingSpec* ')' body
         | LETSTAR '(' bindingSpec* ')' body
         | LETREC '(' bindingSpec* ')' body
         | BEGIN sequence
         | DO '(' iterationSpec* ')' '(' test doResult? ')' command*
         | DELAY expression
         )
     ')'
  ;

condClause
  :  '(' test (sequence | ARROW recipient)? ')'
  ;

recipient
  :  expression
  ;

caseClause
  :  '(' '(' datum* ')' sequence ')'
  ;

bindingSpec
  :  '(' variable expression ')'
  ;

iterationSpec
  :  '(' variable init step? ')'
  ;

init
  :  expression
  ;

step
  :  expression
  ;

doResult
  :  sequence
  ;

procedureCall
  :  '(' operator operand* ')'
  ;

operator
  :  expression
  ;

operand
  :  expression
  ;

macroUse
  :  '(' keyword datum* ')'
  ;

macroBlock
  :  '(' (LET_SYNTAX | LETREC_SYNTAX) '(' syntaxSpec* ')' body ')'
  ;

syntaxSpec
  :  '(' keyword transformerSpec ')'
  ;

body
  :  ((definition)=> definition)* sequence
  ;

//sequence
//  :  ((command)=> command)* expression
//  ;

sequence
  :  expression+
  ;

datum
  :  simpleDatum
  |  compoundDatum
  ;

simpleDatum
  :  bool
  |  number
  |  CHARACTER
  |  STRING
  |  identifier
  ;

compoundDatum
  :  list
  |  vector
  ;

list
  :  '(' (datum+ ('.' datum)?)? ')'
  |  abbreviation
  ;

abbreviation
  :  abbrevPrefix datum
  ;

abbrevPrefix
  :  '\'' | '`' | ',@' | ','
  ;

vector
  :  '#(' datum* ')'
  ;

number
  :  NUM_2
  |  NUM_8
  |  NUM_10
  |  NUM_16
  ;

bool
  :  TRUE
  |  FALSE
  ;

quasiquotation
  :  quasiquotationD[1]
  ;

quasiquotationD[int d]
  :  '`' qqTemplate[d]
  |  '(' QUASIQUOTE qqTemplate[d] ')'
  ;

qqTemplate[int d]
  :  (expression)=>  expression
  |  ('(' UNQUOTE)=> unquotation[d]
  |                  simpleDatum
  |                  vectorQQTemplate[d]
  |                  listQQTemplate[d]
  ;

vectorQQTemplate[int d]
  :  '#(' qqTemplateOrSplice[d]* ')'
  ;

listQQTemplate[int d]
  :                     '\'' qqTemplate[d]
  |  ('(' QUASIQUOTE)=> quasiquotationD[d+1]
  |                     '(' (qqTemplateOrSplice[d]+ ('.' qqTemplate[d])?)? ')'
  ;

unquotation[int d]
  :  ',' qqTemplate[d-1]
  |  '(' UNQUOTE qqTemplate[d-1] ')'
  ;

qqTemplateOrSplice[int d]
  :  ('(' UNQUOTE_SPLICING)=> splicingUnquotation[d]
  |                           qqTemplate[d]
  ;

splicingUnquotation[int d]
  :  ',@' qqTemplate[d-1]
  |  '(' UNQUOTE_SPLICING qqTemplate[d-1] ')'
  ;

// macro keywords
LET_SYNTAX       : 'let-syntax';
LETREC_SYNTAX    : 'letrec-syntax';
SYNTAX_RULES     : 'syntax-rules';
DEFINE_SYNTAX    : 'define-syntax';

// syntactic keywords
ELSE             : 'else';
ARROW            : '=>';
DEFINE           : 'define';
UNQUOTE_SPLICING : 'unquote-splicing';
UNQUOTE          : 'unquote';

// expression keywords
QUOTE            : 'quote';
LAMBDA           : 'lambda';
IF               : 'if';
SET              : 'set!';
BEGIN            : 'begin';
COND             : 'cond';
AND              : 'and';
OR               : 'or';
CASE             : 'case';
LET              : 'let';
LETSTAR          : 'let*';
LETREC           : 'letrec';
DO               : 'do';
DELAY            : 'delay';
QUASIQUOTE       : 'quasiquote';

NUM_2  : PREFIX_2 COMPLEX_2;
NUM_8  : PREFIX_8 COMPLEX_8;
NUM_10 : PREFIX_10? COMPLEX_10;
NUM_16 : PREFIX_16 COMPLEX_16;

ELLIPSIS : '...';

VARIABLE
  :  INITIAL SUBSEQUENT*
  |  PECULIAR_IDENTIFIER
  ;

STRING : '"' STRING_ELEMENT* '"';

CHARACTER : '#\\' (~(' ' | '\n') | CHARACTER_NAME);

TRUE  : '#' ('t' | 'T');
FALSE : '#' ('f' | 'F');

// to ignore
SPACE   : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
COMMENT : ';' ~('\r' | '\n')* {$channel=HIDDEN;};

// fragments
fragment INITIAL : LETTER | SPECIAL_INITIAL;
fragment LETTER : 'a'..'z' | 'A'..'Z';
fragment SPECIAL_INITIAL : '!' | '$' | '%' | '&' | '*' | '/' | ':' | '<' | '=' | '>' | '?' | '^' | '_' | '~';
fragment SUBSEQUENT : INITIAL | DIGIT | SPECIAL_SUBSEQUENT;
fragment DIGIT : '0'..'9';
fragment SPECIAL_SUBSEQUENT : '.' | '+' | '-' | '@';
fragment PECULIAR_IDENTIFIER : '+' | '-';
fragment STRING_ELEMENT : ~('"' | '\\') | '\\' ('"' | '\\');
fragment CHARACTER_NAME : 'space' | 'newline';

fragment COMPLEX_2
  :  REAL_2 ('@' REAL_2)?
  |  REAL_2? SIGN UREAL_2? ('i' | 'I')
  ;

fragment COMPLEX_8
  :  REAL_8 ('@' REAL_8)?
  |  REAL_8? SIGN UREAL_8? ('i' | 'I')
  ;

fragment COMPLEX_10
  :  REAL_10 ('@' REAL_10)?
  |  REAL_10? SIGN UREAL_10? ('i' | 'I')
  ;

fragment COMPLEX_16
  :  REAL_16 ('@' REAL_16)?
  |  REAL_16? SIGN UREAL_16? ('i' | 'I')
  ;

fragment REAL_2 : SIGN? UREAL_2;
fragment REAL_8 : SIGN? UREAL_8;
fragment REAL_10 : SIGN? UREAL_10;
fragment REAL_16 : SIGN? UREAL_16;
fragment UREAL_2 : UINTEGER_2 ('/' UINTEGER_2)?;
fragment UREAL_8 : UINTEGER_8 ('/' UINTEGER_8)?;
fragment UREAL_10 : UINTEGER_10 ('/' UINTEGER_10)? | DECIMAL_10;
fragment UREAL_16 : UINTEGER_16 ('/' UINTEGER_16)?;

fragment DECIMAL_10
  :  UINTEGER_10 SUFFIX
  |  '.' DIGIT+ '#'* SUFFIX?
  |  DIGIT+ '.' DIGIT* '#'* SUFFIX?
  |  DIGIT+ '#'+ '.' '#'* SUFFIX?
  ;

fragment UINTEGER_2 : DIGIT_2+ '#'*;
fragment UINTEGER_8 : DIGIT_8+ '#'*;
fragment UINTEGER_10 : DIGIT+ '#'*;
fragment UINTEGER_16 : DIGIT_16+ '#'*;
fragment PREFIX_2 : RADIX_2 EXACTNESS? | EXACTNESS RADIX_2;
fragment PREFIX_8 : RADIX_8 EXACTNESS? | EXACTNESS RADIX_8;
fragment PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS RADIX_10;
fragment PREFIX_16 : RADIX_16 EXACTNESS? | EXACTNESS RADIX_16;
fragment SUFFIX : EXPONENT_MARKER SIGN? DIGIT+;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l' | 'E' | 'S' | 'F' | 'D' | 'L';
fragment SIGN : '+' | '-';
fragment EXACTNESS : '#' ('i' | 'e' | 'I' | 'E');
fragment RADIX_2 : '#' ('b' | 'B');
fragment RADIX_8 : '#' ('o' | 'O');
fragment RADIX_10 : '#' ('d' | 'D');
fragment RADIX_16 : '#' ('x' | 'X');
fragment DIGIT_2 : '0' | '1';
fragment DIGIT_8 : '0'..'7';
fragment DIGIT_16 : DIGIT | 'a'..'f' | 'A'..'F';

可以通过以下类进行测试:

which can be tested with the following class:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source =
        "(define sum-iter                        \n" +
        "  (lambda(n acc i)                      \n" +
        "    (if (> i n)                         \n" +
        "      acc                               \n" +
        "      (sum-iter n (+ acc i) (+ i 1)))))   ";
    R5RSLexer lexer = new R5RSLexer(new ANTLRStringStream(source));
    R5RSParser parser = new R5RSParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

并生成词法分析器&解析器，编译所有Java源文件并运行主类，执行以下操作:

and to generate a lexer & parser, compile all Java source files and run the main class, do:

bart@hades:~/Programming/ANTLR/Demos/R5RS$ java -cp antlr-3.3.jar org.antlr.Tool R5RS.g
bart@hades:~/Programming/ANTLR/Demos/R5RS$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/R5RS$ java -cp .:antlr-3.3.jar Main
bart@hades:~/Programming/ANTLR/Demos/R5RS$

控制台上没有任何内容打印的事实意味着解析器(和词法分析器)在提供的源中未发现任何错误.

The fact that nothing is being printed on the console means the parser (and lexer) didn't find any errors with the provided source.

请注意，我没有单元测试，只测试了Main类中的单个Scheme源.如果您在ANTLR语法中发现错误，不胜感激，希望能解决该语法.在适当的时候，我可能会将语法提交给官方的 ANTLR Wiki .

Note that I have no Unit tests and have only tested the single Scheme source inside the Main class. If you find errors in the ANTLR grammar, I'd appreciate to hear about them so I can fix the grammar. In due time, I'll probably commit the grammar to the official ANTLR Wiki.

这篇关于ANTLR解决非LL(*)问题和语法谓词的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！