In compiler construction by Aho, Ullman, and Sethi, it is given that the input string of characters of the source program is divided into sequences of characters that have a logical meaning, known as tokens, and that lexemes are the sequences that make up a token. So what is the basic difference?
Using "Compilers Principles, Techniques, & Tools, 2nd Ed." (WorldCat) by Aho, Lam, Sethi and Ullman, AKA the Purple Book,
Lexeme pg. 111
Token pg. 111
Pattern pg. 111
Figure 3.2: Examples of tokens pg. 112
| Token      | Informal Description                   | Sample Lexemes       |
|------------|----------------------------------------|----------------------|
| if         | characters i, f                        | if                   |
| else       | characters e, l, s, e                  | else                 |
| comparison | < or > or <= or >= or == or !=         | <=, !=               |
| id         | letter followed by letters and digits  | pi, score, D2        |
| number     | any numeric constant                   | 3.14159, 0, 6.02e23  |
| literal    | anything but ", surrounded by "'s      | "core dumped"        |
To better understand how this relates to a lexer and a parser, we will start with the parser and work backwards to the input.
To make it easier to design a parser, a parser does not work with the input directly but takes in a list of tokens generated by a lexer. Looking at the token column in Figure 3.2, we see tokens such as if, else, comparison, id, number, and literal; these are the names of tokens. Typically with a lexer/parser, a token is a structure that holds not only the name of the token but also the characters/symbols that make up the token, along with the start and end positions of that string of characters; the start and end positions are used for error reporting, highlighting, etc.
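As a minimal sketch of such a token structure (the field names here are illustrative, not taken from the book), it might look like:

```python
from dataclasses import dataclass

@dataclass
class Token:
    name: str    # the token's name, e.g. "id", "number", "comparison"
    lexeme: str  # the characters/symbols that make up the token
    start: int   # offset of the first character in the input
    end: int     # offset one past the last character (for error reporting, etc.)

# The identifier "pi" at the very start of some input:
tok = Token(name="id", lexeme="pi", start=0, end=2)
print(tok.name, tok.lexeme)  # id pi
```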
Now the lexer takes the input characters/symbols and, using its rules, converts them into tokens. People who work with lexers/parsers have their own words for the things they use often. What you think of as a sequence of characters/symbols that makes up a token is what people who use lexers/parsers call a lexeme. So when you see lexeme, just think of a sequence of characters/symbols representing a token. In the comparison example, the sequence of characters/symbols can match different patterns, such as < or > or "else" or "3.14", etc.
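A toy rule-driven lexer can make this concrete. This is a hedged sketch, not the book's algorithm: the rules below are hypothetical regexes for a few of the tokens in Figure 3.2, tried in order at each position.

```python
import re

# Hypothetical lexer rules: (token name, regex for its pattern).
# Order matters: "number" before "id" is harmless here, but
# longer comparison operators must come before shorter ones.
RULES = [
    ("number",     r"\d+(\.\d+)?([eE][+-]?\d+)?"),
    ("comparison", r"<=|>=|==|!=|<|>"),
    ("id",         r"[A-Za-z][A-Za-z0-9]*"),
    ("ws",         r"\s+"),  # whitespace, skipped below
]

def lex(text):
    """Convert input characters into (name, lexeme, start, end) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in RULES:
            m = re.match(pattern, text[pos:])
            if m:
                lexeme = m.group(0)
                if name != "ws":  # drop whitespace tokens
                    tokens.append((name, lexeme, pos, pos + len(lexeme)))
                pos += len(lexeme)
                break
        else:
            raise SyntaxError(f"unexpected character at position {pos}")
    return tokens

print(lex("score <= 3.14"))
# [('id', 'score', 0, 5), ('comparison', '<=', 6, 8), ('number', '3.14', 9, 13)]
```

Note how each token carries both its name and its lexeme, plus the start and end positions in the input.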
Another way to think of the relation between the two is that a token is a programming structure used by the parser that has a property called lexeme that holds the characters/symbols from the input. Now if you look at most definitions of token in code, you may not see lexeme as one of the properties of the token. This is because a token will more likely hold the start and end positions of the characters/symbols that represent the token, and the lexeme, the sequence of characters/symbols, can be derived from the start and end positions as needed, because the input is static.
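That position-only design can be sketched as follows (again an illustrative layout, not any particular compiler's): the token stores only offsets, and the lexeme is recovered by slicing the static input on demand.

```python
from dataclasses import dataclass

@dataclass
class Token:
    name: str
    start: int  # offset of the first character in the input
    end: int    # offset one past the last character

    def lexeme(self, source: str) -> str:
        # Derive the lexeme on demand by slicing the (static) input.
        return source[self.start:self.end]

source = "score <= 3.14"
tok = Token(name="number", start=9, end=13)
print(tok.lexeme(source))  # 3.14
```

Storing positions instead of a copied string keeps tokens small and still gives error reporting everything it needs.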