python - Referencing a token's value at parse time

I am trying to parse the following:

<delimiter><text><delimiter><text><delimiter>

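where delimiter can be any single literal character, repeated three times, and text can be any printable characters other than the delimiter. The first and second occurrences of text do not have to match, and either may be empty.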
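To pin the format down, here is a minimal sketch that parses it with Python's standard re module instead of pyparsing (a swapped-in technique, not the asker's approach). It assumes "printable" means the same thing as pyparsing's printables, i.e. the non-whitespace ASCII characters ! through ~. The names PATTERN and split_delimited are hypothetical.

```python
import re

# Hypothetical pattern for <delimiter><text><delimiter><text><delimiter>.
# [!-~] is every non-whitespace ASCII character (same set as pyparsing's printables).
# (?P<d>...) captures the delimiter; (?P=d) requires the same character again.
# (?:(?!(?P=d))[!-~])* is a possibly empty run of printables that never
# contains the delimiter, so text cannot swallow the next delimiter.
PATTERN = re.compile(
    r"(?P<d>[!-~])"
    r"(?P<t1>(?:(?!(?P=d))[!-~])*)"
    r"(?P=d)"
    r"(?P<t2>(?:(?!(?P=d))[!-~])*)"
    r"(?P=d)"
)


def split_delimited(s):
    """Hypothetical helper: return [d, text1, d, text2, d] or raise ValueError."""
    m = PATTERN.fullmatch(s)
    if m is None:
        raise ValueError("not of the form <d><text><d><text><d>: %r" % s)
    d = m.group("d")
    return [d, m.group("t1"), d, m.group("t2"), d]


for s in ("_abc_123_", "-abc-123-", "___"):
    print(split_delimited(s))
```

Because the backreference forces every delimiter to be the character captured first, an input such as _abc-123- (opened with _, closed with -) raises ValueError.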

Here is what I came up with, but text consumes everything after the first delimiter, all the way to the end of the string.

from pyparsing import Word, printables

delimiter = Word(printables, exact=1)
text = (Word(printables) + ~delimiter)

parser = delimiter + text  # + delimiter + text + delimiter

tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]

for test, expected in tests:
    print(parser.parseString(test), '<=>', expected)

Script output:

['_', 'abc_123_'] <=> ['_', 'abc', '_', '123', '_']
['-', 'abc-123-'] <=> ['-', 'abc', '-', '123', '-']
['_', '__'] <=> ['_', '', '_', '', '_']

I think I need to use a Future, but I can't figure out how to exclude the delimiter's value from the text token at parse time.

Best answer

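Your instinct is correct: you need to use a Forward (not a Future) to capture the definition of text, because that definition isn't fully known until parse time. In addition, your Word must exclude the delimiter by using the excludeChars argument. Word(printables) + ~delimiter alone is not enough, because Word(printables) happily consumes delimiter characters, and ~delimiter only checks what follows the whole word after it has already matched.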
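A quick way to see the difference is to parse the tail of the first test string with and without excludeChars. This sketch hard-codes _ as the delimiter for illustration; in the real grammar the excluded character has to come from the matched delimiter, which is why a Forward is needed.

```python
from pyparsing import Word, printables

# Word(printables) accepts every non-whitespace ASCII character, including '_',
# so it runs straight past the delimiter to the end of the input.
greedy = Word(printables)
print(greedy.parseString("abc_123_")[0])  # abc_123_

# excludeChars removes '_' from the allowed characters, so the word stops there.
bounded = Word(printables, excludeChars="_")
print(bounded.parseString("abc_123_")[0])  # abc
```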

Here is your code with the necessary changes marked, plus some comments that will hopefully help:

from pyparsing import Forward, Optional, Word, printables

delimiter = Word(printables, exact=1)
text = Forward() #(Word(printables) + ~delimiter)
def setTextExcludingDelimiter(s,l,t):
    # define Word as all printable characters, excluding the delimiter character
    # the excludeChars argument for Word is how this is done
    text_word = Word(printables, excludeChars=t[0]).setName("text")
    # use '<<' operator to assign the text_word definition to the
    # previously defined text expression
    text << text_word
# attach parse action to delimiter, so that once it is matched,
# it will define the correct expression for text
delimiter.setParseAction(setTextExcludingDelimiter)

# make the text expressions Optional with default value of '' to satisfy 3rd test case
parser = delimiter + Optional(text,'') + delimiter + Optional(text,'') + delimiter
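For reference, here is a self-contained sketch that runs the question's test cases through the same Forward and parse-action technique. Two details go beyond the original answer and are assumptions on my part. First, the second and third delimiters use pyparsing's matchPreviousLiteral, so all three must be the same character; in the grammar above every delimiter re-matches any single printable, and since pyparsing skips whitespace between tokens, an input like _ab -c- would be accepted with mixed delimiters. Second, parseAll=True rejects leftover input. The helper name set_text_excluding_delimiter is hypothetical.

```python
from pyparsing import (Forward, Optional, ParseException, Word,
                       matchPreviousLiteral, printables)

# The opening delimiter: any single printable (non-whitespace) character.
delimiter = Word(printables, exact=1)

# Placeholder for text; its real definition depends on which delimiter
# character was matched, so it is assigned at parse time.
text = Forward()


def set_text_excluding_delimiter(tokens):
    # Hypothetical helper. Redefine text as a run of printables that cannot
    # contain the delimiter just matched. Returning None keeps the tokens as-is.
    text << Word(printables, excludeChars=tokens[0])


# Attach this action first: setParseAction replaces any existing actions,
# while matchPreviousLiteral (below) adds its own action to delimiter.
delimiter.setParseAction(set_text_excluding_delimiter)

# Matches only the exact string the opening delimiter matched, so the
# second and third delimiters must equal the first.
same_delimiter = matchPreviousLiteral(delimiter)

parser = (delimiter + Optional(text, '') + same_delimiter
          + Optional(text, '') + same_delimiter)

tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]

for test, expected in tests:
    # parseAll=True fails if anything is left after the last delimiter.
    result = parser.parseString(test, parseAll=True).asList()
    print(result, '<=>', expected)
    assert result == expected

# '_' opens, so the '-' after the space cannot act as the second delimiter.
try:
    parser.parseString('_ab -c-', parseAll=True)
except ParseException:
    print("rejected mismatched delimiters")
```

Each new parse re-runs the delimiter's actions, so both text and same_delimiter are redefined for every input string.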

A similar question about python - referencing a token's value at parse time can be found on Stack Overflow: https://stackoverflow.com/questions/32080193/