I am trying to parse the following:

<delimiter><text><delimiter><text><delimiter>


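Here delimiter can be any single literal character, repeated three times, and text can be any printable characters other than the delimiter (the first and second occurrences of text do not have to match, and either can be blank).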

This is what I have come up with, but text consumes everything from the first delimiter to the end of the string.

from pyparsing import Word, printables

delimiter = Word(printables, exact=1)
text = (Word(printables) + ~delimiter)

parser = delimiter + text  # + delimiter + text + delimiter

tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]

for test, expected in tests:
    print(parser.parseString(test), '<=>', expected)


Script output:

['_', 'abc_123_'] <=> ['_', 'abc', '_', '123', '_']
['-', 'abc-123-'] <=> ['-', 'abc', '-', '123', '-']
['_', '__'] <=> ['_', '', '_', '', '_']


I think I need to use a Future, but I can't work out how to exclude the delimiter's value from the text tokens at parse time.

Best answer

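Your intuition was right: you need to use a Forward (not a Future) to capture the definition of text, because that definition is not fully known until parse time. In addition, your Word has to exclude the delimiter character via the excludeChars argument. Word(printables) + ~delimiter alone is not enough, because Word(printables) greedily matches every printable character, delimiters included, before the negative lookahead is ever checked.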
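
As a small illustration of the excludeChars point, the sketch below compares a plain Word(printables) with one that excludes a fixed character. It assumes a pyparsing version that supports excludeChars; the names greedy and no_underscore and the sample string are made up for this example.

```python
from pyparsing import Word, printables

# A plain Word(printables) is greedy: the underscores are printable too,
# so the whole string comes back as a single token.
greedy = Word(printables)
print(greedy.parseString("abc_123_").asList())         # ['abc_123_']

# With excludeChars, matching stops at the first excluded character.
no_underscore = Word(printables, excludeChars="_")
print(no_underscore.parseString("abc_123_").asList())  # ['abc']
```

In the real problem the excluded character is not known in advance, which is why the definition of text has to be deferred until the delimiter has been matched.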

Here is your code, marked up with the necessary changes and, hopefully, some helpful comments:

from pyparsing import Word, Forward, Optional, printables

delimiter = Word(printables, exact=1)
text = Forward()  # was: (Word(printables) + ~delimiter)

def setTextExcludingDelimiter(s, l, t):
    # define Word as all printable characters, excluding the delimiter character
    # the excludeChars argument for Word is how this is done
    text_word = Word(printables, excludeChars=t[0]).setName("text")
    # use '<<' operator to assign the text_word definition to the
    # previously defined text expression
    text << text_word

# attach parse action to delimiter, so that once it is matched,
# it will define the correct expression for text
delimiter.setParseAction(setTextExcludingDelimiter)

# make the text expressions Optional with default value of '' to satisfy 3rd test case
parser = delimiter + Optional(text, '') + delimiter + Optional(text, '') + delimiter
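
To check the revised parser against the three test cases from the question, the pieces can be assembled into one script. This is only a sketch, assuming a pyparsing version with excludeChars support. The results list is an illustrative addition that is not in the original code.

```python
from pyparsing import Word, Forward, Optional, printables

delimiter = Word(printables, exact=1)
text = Forward()

def setTextExcludingDelimiter(s, l, t):
    # redefine text so it cannot contain the delimiter that was just matched
    text << Word(printables, excludeChars=t[0]).setName("text")

delimiter.setParseAction(setTextExcludingDelimiter)

parser = delimiter + Optional(text, '') + delimiter + Optional(text, '') + delimiter

tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]

results = []  # illustrative: collected so the token lists can be compared afterwards
for test, expected in tests:
    tokens = parser.parseString(test).asList()
    results.append(tokens)
    print(tokens, '<=>', expected)
```

Each printed token list should equal the expected list on its right. The parse action runs again on every delimiter match, so text is reset at the start of each test string.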

Regarding python - referencing token values at parse time, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32080193/
