Problem description
AFAIK the technique for lexing Python source code is:
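- When the current line's indentation level is greater than the previous line's, produce INDENT.
- When the current line's indentation level is less than the previous line's, produce DEDENT(s).
- When input ends, produce DEDENT(s) if there are unclosed INDENT(s).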
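For reference, the rules above can be sketched on plain lines of text, independent of PLY. This is only an illustration under simplifying assumptions: spaces only (no tabs), blank lines ignored, and the helper name indent_events is made up for this example.

```python
def indent_events(lines):
    """Yield 'INDENT'/'DEDENT' markers interleaved with the stripped lines."""
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in lines:
        if not line.strip():
            continue  # blank lines do not take part in the algorithm
        width = len(line) - len(line.lstrip(' '))
        if width > stack[-1]:
            # Deeper than the previous line: open a new block.
            stack.append(width)
            yield 'INDENT'
        else:
            # Shallower: close blocks until the widths line up again.
            while width < stack[-1]:
                stack.pop()
                yield 'DEDENT'
            if width != stack[-1]:
                raise IndentationError('dedent does not match any outer level')
        yield line.strip()
    # End of input: close every block that is still open.
    for _ in stack[1:]:
        yield 'DEDENT'


source = ["if x:", "    y = 1", "    if z:", "        w = 2", "done"]
events = list(indent_events(source))
```

Note that a single line ("done" here) can trigger several DEDENTs at once, which is exactly why returning multiple tokens matters.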
Now, using PLY:
- How do I return multiple tokens from a t_definition?
- How do I make a t_definition that's called when EOF is reached? A simple \Z doesn't work: PLY complains that it matches the empty string.
Recommended answer
As far as I know, PLY does not implement a push parser interface, which is how you would most easily solve this problem with bison. However, it is very easy to inject your own lexer wrapper, which can handle the queue of dedent tokens.
A minimal lexer implementation needs to implement a token() method which returns an object with type and value attributes. (You also need lineno if your parser uses it, but I'm not going to worry about that here.)
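To make that interface concrete, here is a minimal sketch of a lexer-like object that is not generated by PLY at all. The names Token and ListLexer are hypothetical and exist only for this illustration; the point is that anything with a token() method returning objects with type and value (and None at end of input) has the required shape.

```python
from collections import namedtuple

# Hypothetical token type for illustration only.
Token = namedtuple('Token', 'type value')


class ListLexer:
    """Smallest lexer-like object: hands out pre-built tokens one at a time."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)

    def token(self):
        # Return the next token, or None once the input is exhausted.
        return next(self._tokens, None)


lexer = ListLexer([Token('NAME', 'x'), Token('NUMBER', 1)])
# iter(callable, sentinel) keeps calling lexer.token() until it returns None.
kinds = [tok.type for tok in iter(lexer.token, None)]
```

A stub like this is also handy for testing a wrapper in isolation, since it can replay any token sequence without running a real lexer.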
Now, let's suppose that the underlying (PLY-generated) lexer produces NEWLINE tokens whose value is the length of leading whitespace following the newline. If some lines don't participate in the INDENT/DEDENT algorithm, the NEWLINE should be suppressed for those lines; we don't consider that case here. A simplistic example lexer function (which only works with spaces, not tabs) might be:
# This function doesn't handle tabs. Beware!
def t_NEWLINE(self, t):
    r'\n(?:\s*(?:[#].*)?\n)*\s*'
    t.value = len(t.value) - 1 - t.value.rfind('\n')
    return t
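To see what that rule computes, the same pattern can be tried with the standard re module, without PLY. This is just a sketch: the helper name indent_after_newline is invented here, and it assumes the position passed in points at a newline character.

```python
import re

# Same pattern as the t_NEWLINE docstring, compiled directly for experimentation.
NEWLINE = re.compile(r'\n(?:\s*(?:[#].*)?\n)*\s*')


def indent_after_newline(text, pos=0):
    """Return the indentation width that t_NEWLINE would store in t.value."""
    matched = NEWLINE.match(text, pos).group()
    # Everything after the last '\n' in the match is the leading whitespace.
    return len(matched) - 1 - matched.rfind('\n')


text = "if x:\n\n    # comment\n    y = 1\n"
# The match swallows the blank line and the comment-only line, so the
# width reported is that of the next line that carries real code.
width = indent_after_newline(text, text.index('\n'))
```

Because blank and comment-only lines are absorbed into a single NEWLINE token, they never produce spurious INDENT or DEDENT tokens downstream.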
Now we wrap the PLY-generated lexer with a wrapper which deals with indents:
# WARNING:
# This code hasn't been tested much and it also may be inefficient
# and/or inexact. It doesn't do python-style tab handling. Etc. etc.

from collections import namedtuple, deque

# These are the tokens. We only generate one of each here. If
# we used lineno or didn't trust the parser to not mess with the
# token, we could generate a new one each time.
IndentToken = namedtuple('Token', 'type value')
dedent = IndentToken('DEDENT', None)
indent = IndentToken('INDENT', None)
newline = IndentToken('NEWLINE', None)

class IndentWrapper(object):

    def __init__(self, lexer):
        """Create a new wrapper given the lexer which is being wrapped"""
        self.lexer = lexer
        self.indent_stack = [0]
        # A queue is overkill for this case, but it's simple.
        self.token_queue = deque()
        # This is just in case the ply-generated lexer cannot be called again
        # after it returns None.
        self.eof_reached = False

    def token(self):
        """Return the next token, or None if end of input has been reached"""
        # Do we have any queued tokens?
        if self.token_queue:
            return self.token_queue.popleft()
        # Are we done?
        if self.eof_reached:
            return None
        # Get a token
        t = self.lexer.token()
        if t is None:
            # At end of input, we might need to send some dedents:
            # one for each indentation level that is still open.
            self.eof_reached = True
            if len(self.indent_stack) > 1:
                # Return the first dedent now and queue the remaining ones.
                t = dedent
                for i in range(len(self.indent_stack) - 2):
                    self.token_queue.append(dedent)
                self.indent_stack = [0]
        elif t.type == "NEWLINE":
            # The NEWLINE token includes the amount of leading whitespace.
            # Fabricate indent or dedents as/if necessary and queue them;
            # they are delivered after the NEWLINE token itself.
            if t.value > self.indent_stack[-1]:
                self.indent_stack.append(t.value)
                self.token_queue.append(indent)
            else:
                while t.value < self.indent_stack[-1]:
                    self.indent_stack.pop()
                    self.token_queue.append(dedent)
                if t.value != self.indent_stack[-1]:
                    # Or however you indicate errors
                    raise IndentationError("dedent does not match any outer level")
        return t