How to write a bison grammar for WDI?

Question

I need some help with Bison grammar construction.

From another question of mine: I'm trying to make a meta-language for writing markup code (such as XML and HTML) that can be embedded directly into C/C++ code. Here is a simple sample written in this language, which I call WDI (Web Development Interface):

 /*
  * Simple wdi/html sample source code
  */
 #include <mySite>

 string name = "myName";
 string toCapital(string str);

 html
 {
  head {
   title { mySiteTitle; }
   link(rel="stylesheet", href="style.css");
  }
  body(id="default") {
   // Page content wrapper
   div(id="wrapper", class="some_class") {
    h1 { "Hello, " + toCapital(name) + "!"; }

    // Lists post
    ul(id="post_list") {
     for(post in posts) {
      li { a(href=post.getID()) { post.title; } }
     }
    }
   }
  }
 }

Basically, it is a C source with a user-friendly interface for HTML. As you can see, the traditional tag-based style is replaced by a C-like one, with blocks delimited by curly braces. I need to build an interpreter that translates this code to HTML and then inserts it into C, so that it can be compiled. The C part stays intact. Inside the WDI source it is not necessary to use prints; every return statement will be used for output (in a printf call). The program's output will be clean HTML code.

So, for example, a heading 1 tag would be transformed like this:

h1 { "Hello, " + toCapital(name) + "!"; }
// would become:
printf("<h1>Hello, %s!</h1>", toCapital(name));

My main goal is to create an interpreter that translates WDI source to HTML like this:

tag(attributes) {content} => <tag attributes>content</tag>
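
A minimal C sketch of that mapping, to make the target concrete. The names `wdi_attr`, `wdi_append` and `wdi_render_tag` are hypothetical and not part of the files in the question. It assumes attribute values arrive without their surrounding quotes and need no HTML escaping, and that a tag with no body (such as `link`) is written without a closing tag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute pair, e.g. id="wrapper". */
struct wdi_attr {
    const char *key;
    const char *value;   /* assumed already unquoted */
};

/* Append s to out, keeping it NUL-terminated; -1 if it does not fit. */
static int wdi_append(char *out, size_t cap, size_t *used, const char *s)
{
    size_t n = strlen(s);
    if (*used + n + 1 > cap)
        return -1;
    memcpy(out + *used, s, n + 1);
    *used += n;
    return 0;
}

/*
 * Render tag(attributes) {content} as <tag attributes>content</tag>.
 * A NULL content yields only the opening tag, e.g. <link ...>.
 * Returns 0 on success, -1 if the buffer is too small (the buffer
 * contents are unspecified in that case).
 */
int wdi_render_tag(char *out, size_t cap, const char *tag,
                   const struct wdi_attr *attrs, size_t nattrs,
                   const char *content)
{
    size_t used = 0;
    int err = 0;

    assert(out != NULL && tag != NULL);
    err |= wdi_append(out, cap, &used, "<");
    err |= wdi_append(out, cap, &used, tag);
    for (size_t i = 0; i < nattrs; i++) {
        err |= wdi_append(out, cap, &used, " ");
        err |= wdi_append(out, cap, &used, attrs[i].key);
        err |= wdi_append(out, cap, &used, "=\"");
        err |= wdi_append(out, cap, &used, attrs[i].value);
        err |= wdi_append(out, cap, &used, "\"");
    }
    err |= wdi_append(out, cap, &used, ">");
    if (content != NULL) {
        err |= wdi_append(out, cap, &used, content);
        err |= wdi_append(out, cap, &used, "</");
        err |= wdi_append(out, cap, &used, tag);
        err |= wdi_append(out, cap, &used, ">");
    }
    return err ? -1 : 0;
}
```

In a Bison grammar, a function like this would typically be called from the action of the `block` rule once the head and body have been reduced.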

Secondly, the HTML code returned by the interpreter has to be inserted into the C code with printfs. Variables and functions that occur inside WDI should also be collected, in order, so that they can be passed as printf arguments (as with toCapital(name) in the sample source).
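
One way to picture this step is a function that takes the pieces of a concatenation such as `"Hello, " + toCapital(name) + "!"` and emits the printf call. The sketch below is only an illustration: `wdi_part`, `emit_str` and `wdi_emit_printf` are made-up names, every expression is assumed to evaluate to a `char *` (so it maps to `%s`), and literal text is assumed to already be valid inside a C string literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum wdi_kind { WDI_LITERAL, WDI_EXPR };

/* One operand of a "+" chain: literal text or a C expression. */
struct wdi_part {
    enum wdi_kind kind;
    const char *text;
};

/* Append s to out, keeping it NUL-terminated; -1 if it does not fit. */
static int emit_str(char *out, size_t cap, size_t *used, const char *s)
{
    size_t n = strlen(s);
    if (*used + n + 1 > cap)
        return -1;
    memcpy(out + *used, s, n + 1);
    *used += n;
    return 0;
}

/*
 * Emit: printf("<tag>FORMAT</tag>", ARG, ARG, ...);
 * Literals go into FORMAT (with '%' doubled); each expression becomes
 * "%s" in FORMAT and is appended to the argument list in order.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int wdi_emit_printf(char *out, size_t cap, const char *tag,
                    const struct wdi_part *parts, size_t nparts)
{
    size_t used = 0;
    int err = 0;

    assert(out != NULL && tag != NULL && (parts != NULL || nparts == 0));
    err |= emit_str(out, cap, &used, "printf(\"<");
    err |= emit_str(out, cap, &used, tag);
    err |= emit_str(out, cap, &used, ">");
    for (size_t i = 0; i < nparts; i++) {
        if (parts[i].kind == WDI_EXPR) {
            err |= emit_str(out, cap, &used, "%s");
            continue;
        }
        for (const char *p = parts[i].text; *p != '\0'; p++) {
            char two[3] = { *p, '\0', '\0' };
            if (*p == '%')
                two[1] = '%';   /* a literal '%' must be written as "%%" */
            err |= emit_str(out, cap, &used, two);
        }
    }
    err |= emit_str(out, cap, &used, "</");
    err |= emit_str(out, cap, &used, tag);
    err |= emit_str(out, cap, &used, ">\"");
    for (size_t i = 0; i < nparts; i++) {
        if (parts[i].kind == WDI_EXPR) {
            err |= emit_str(out, cap, &used, ", ");
            err |= emit_str(out, cap, &used, parts[i].text);
        }
    }
    err |= emit_str(out, cap, &used, ");");
    return err ? -1 : 0;
}
```

Fed the three parts of the h1 example, this produces the same printf line shown earlier in the question. A real translator would need type information to pick a conversion other than `%s`.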

Here are my flex/bison files:

id        [a-zA-Z_]([a-zA-Z0-9_])*
number    [0-9]+
string    \"[^"\n]*\"

%%

{id} {
        yylval.string = strdup(yytext);
        return(ID);
    }

{number} {
        yylval.number = atoi(yytext);
        return(NUMBER);
    }

{string} {
        yylval.string = strdup(yytext);
        return(STRING);
    }

"(" { return(LPAREN); }
")" { return(RPAREN); }
"{" { return(LBRACE); }
"}" { return(RBRACE); }
"=" { return(ASSIGN); }
"," { return(COMMA);  }
";" { return(SEMICOLON); }

\n|\r|\f { /* ignore EOL */ }
[ \t]+   { /* ignore whitespace */ }
.        { /* return(CCODE); Find C source */ }

%%


%start wdi
%token LPAREN RPAREN LBRACE RBRACE ASSIGN COMMA SEMICOLON CCODE QUOTE

%union
{
    int number;
    char *string;
}

%token <string> ID STRING
%token <number> NUMBER
%type <string> key value

%%
wdi
    : /* empty */
    | blocks
    ;

blocks
    : block
    | blocks block
    ;

block
    : head SEMICOLON
    | head body
    ;

head
    : ID
    | ID attributes
    ;

attributes
    : LPAREN RPAREN
    | LPAREN attribute_list RPAREN
    ;

attribute_list
    : attribute
    | attribute COMMA attribute_list
    ;

attribute
    : key ASSIGN value
    ;

key
    : ID { $$ = $1; }
    ;

value
    : STRING { $$ = $1; }
    /*| NUMBER*/
    /*| CCODE*/
    ;

body
    : LBRACE content RBRACE
    ;

content
    : /* */
    | blocks
    | STRING SEMICOLON
    | NUMBER SEMICOLON
    | CCODE
    ;

%%

I am having difficulty defining a proper grammar for the language, especially in splitting WDI and C code. I have just started learning language processing techniques, so I need some orientation. Could someone correct my code or give some examples of the right way to solve this problem?

Recommended answer

If your intention is to parse C code and embedded WDI code, you're in for a hard ride. LALR(1) parser generators (including Bison) are notoriously bad at parsing C, let alone things more complicated than C (meaning C + WDI).

You will have to do one of the following:

a) Learn how to make Bison parse C by tangling parsing and symbol table construction (meaning, go struggle with GNU GCC to see how they did it),

b) Switch to a stronger parser generator such as a GLR parser generator (which Bison has an option for) and learn how to deal with ambiguous grammars and how to resolve the ambiguities,

c) Design WDI as a kind of island grammar, in which the goal is to pick out the WDI code and leave everything that is not WDI as opaque strings (in your case, destined to be output as presumed C code). This approach is much easier, and is roughly what all the web page languages (ASP, PHP, JSP ...) do. The upside is that you only have to write the grammar for WDI itself and a lexer that picks up everything that is not WDI as an arbitrary string. The downside is that you won't be able to make WDI and C interact nicely, or check the validity of a WDI program with your parser. See this SO question for some more background:

Island grammar antlr3
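
To give a feel for option c), here is a rough C sketch of the "pick out the island" step. It is not taken from the linked question. It assumes, purely for illustration, that an island starts at the word `html` followed by `{` and ends at the matching `}`; the function name `wdi_find_island` is hypothetical. It does not skip braces inside string literals or comments, which a real lexer would have to handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident(int c)
{
    return isalnum(c) || c == '_';
}

/*
 * Find the next WDI island in src at or after offset `from`.
 * Assumption: an island is the word "html", optional whitespace,
 * then a brace-balanced { ... } block.
 * On success stores the island as [*start, *end) and returns 1.
 * Returns 0 if there is no (complete) island; everything outside
 * the island is to be passed through as opaque C text.
 */
int wdi_find_island(const char *src, size_t from,
                    size_t *start, size_t *end)
{
    static const char kw[] = "html";
    const size_t kwlen = sizeof kw - 1;
    size_t len;

    assert(src != NULL && start != NULL && end != NULL);
    len = strlen(src);
    for (size_t i = from; i + kwlen <= len; i++) {
        size_t j;
        int depth = 0;

        if (strncmp(src + i, kw, kwlen) != 0)
            continue;
        if (i > 0 && is_ident((unsigned char)src[i - 1]))
            continue;                      /* e.g. "xhtml" */
        j = i + kwlen;
        if (is_ident((unsigned char)src[j]))
            continue;                      /* e.g. "html5" */
        while (isspace((unsigned char)src[j]))
            j++;
        if (src[j] != '{')
            continue;                      /* not followed by a block */
        for (; src[j] != '\0'; j++) {
            if (src[j] == '{') {
                depth++;
            } else if (src[j] == '}' && --depth == 0) {
                *start = i;
                *end = j + 1;
                return 1;
            }
        }
        return 0;                          /* unbalanced braces */
    }
    return 0;
}
```

A driver would call this in a loop, copy the text before each island to the output unchanged, and hand only the island itself to the WDI parser. With flex, the same split is usually expressed with start conditions rather than a hand-written scan.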

This would be easier if you learned about compiler technology in more detail before starting this project.


10-12 03:29