python - How exactly are DEDENT tokens generated in Python?

I am reading the documentation on the lexical analysis of Python, which describes how INDENT and DEDENT tokens are generated. I am posting the description here.

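  The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a stack, as follows.

  Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line's indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.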

I tried to understand the DEDENT part but could not. Can someone give a better explanation than the one quoted?

Best answer

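Since Python is sometimes easier than English, here is a rough translation of this description into Python. You can see a real-world parser (written by me) that works in a similar way here.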

import re
code = """
for i in range(10):
   if i % 2 == 0:
     print(i)
   print("Next number")
print("That's all")

for i in range(10):
   if i % 2 == 0:
       print(i)
print("That's all again)

for i in range(10):
   if i % 2 == 0:
      print(i)
  print("That's all")
"""
def get_indent(s) -> int:
    # Count leading spaces only; tabs are not handled in this rough sketch.
    m = re.match(r' *', s)
    return len(m.group(0))
def add_token(token):
    print(token)
INDENT = "indent"
DEDENT = "dedent"
# Before the first line of the file is read, a single zero is pushed on the stack
indent_stack = [0]
for line in code.splitlines():
    print("processing line:", line)
    indent = get_indent(line)
    # At the beginning of each logical line, the line’s
    # indentation level is compared to the top of the stack.
    if indent > indent_stack[-1]:
        # If it is larger, it is pushed on the stack,
        # and one INDENT token is generated.
        add_token(INDENT)
        indent_stack.append(indent)
    elif indent < indent_stack[-1]:
        while indent < indent_stack[-1]:
            #  If it is smaller, ...
            # all numbers on the stack that are larger are popped off,
            # and for each number popped off a DEDENT token is generated.
            add_token(DEDENT)
            indent_stack.pop()
        if indent != indent_stack[-1]:
            # it must be one of the numbers occurring on the stack;
            raise IndentationError
while indent_stack[-1] > 0:
    # At the end of the file, a DEDENT token is generated for each number
    # remaining on the stack that is larger than zero.
    add_token(DEDENT)
    indent_stack.pop()

Here is the output:

processing line:
processing line: for i in range(10):
processing line:    if i % 2 == 0:
indent
processing line:      print(i)
indent
processing line:    print("Next number")
dedent
processing line: print("That's all")
dedent
processing line:
processing line: for i in range(10):
processing line:    if i % 2 == 0:
indent
processing line:        print(i)
indent
processing line: print("That's all again)
dedent
dedent
processing line:
processing line: for i in range(10):
processing line:    if i % 2 == 0:
indent
processing line:       print(i)
indent
processing line:   print("That's all")
dedent
dedent
  File "<string>", line unknown
IndentationError
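
As a cross-check, the standard library's tokenize module can show the INDENT and DEDENT tokens that Python's own tokenizer produces. The following is a minimal sketch, not part of the original answer: the helper name indent_tokens and the sample string are made up for illustration. It shows that a line which drops back two indentation levels at once yields two DEDENT tokens, one per number popped off the stack.

```python
import io
import token
import tokenize


def indent_tokens(source):
    """Return the names of the INDENT/DEDENT tokens emitted for `source`."""
    names = []
    # generate_tokens expects a readline callable that returns str lines.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (token.INDENT, token.DEDENT):
            names.append(token.tok_name[tok.type])
    return names


# Hypothetical sample: the last line goes from 8 spaces straight back to 0,
# so two entries (8 and 4) are popped off the indentation stack.
sample = (
    "for i in range(10):\n"
    "    if i % 2 == 0:\n"
    "        print(i)\n"
    "print('done')\n"
)

print(indent_tokens(sample))  # ['INDENT', 'INDENT', 'DEDENT', 'DEDENT']
```

Note that the real tokenizer ignores blank lines when tracking indentation, whereas the rough translation above treats every physical line as a logical line. That makes no difference for the sample in the answer only because its blank lines occur where the stack already holds just the initial zero.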