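I am trying to parse the following:
<delimiter><text><delimiter><text><delimiter>
where delimiter can be any single literal character, repeated three times, and text can be any printable characters other than the delimiter (the first and second occurrences of text do not have to match, and either may be empty). This is what I came up with, but text consumes everything from the first delimiter to the end of the string.
from pyparsing import Word, printables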
delimiter = Word(printables, exact=1)
text = (Word(printables) + ~delimiter)
parser = delimiter + text # + delimiter + text + delimiter
tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]
for test, expected in tests:
    print(parser.parseString(test), '<=>', expected)
Script output:
['_', 'abc_123_'] <=> ['_', 'abc', '_', '123', '_']
['-', 'abc-123-'] <=> ['-', 'abc', '-', '123', '-']
['_', '__'] <=> ['_', '', '_', '', '_']
I think I need to use a Future, but I can't see how to exclude the delimiter's value from the text tokens at parse time.

Best answer
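Your instinct was right: you need to use a Forward (not a Future) to hold the definition of text, because that definition is not fully known until parse time. In addition, your Word has to exclude the delimiter character by means of the excludeChars argument. Word(printables) + ~delimiter alone is not enough, because Word matches greedily and has already consumed the delimiter characters by the time the negative lookahead is checked.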
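To illustrate the point about excludeChars, here is a minimal sketch that hard-codes '_' as the delimiter. That is an assumption made only for this illustration, since the real delimiter is not known until parse time. It compares a plain Word(printables) with one that excludes the delimiter character.

```python
from pyparsing import Word, printables

sample = 'abc_123_'

# Plain Word(printables) is greedy: it also swallows the '_' characters,
# so a lookahead placed after it never gets a chance to stop the match early.
greedy_tokens = Word(printables).parseString(sample).asList()

# excludeChars removes '_' from the allowed character set,
# so the match stops just before the first delimiter.
bounded_tokens = Word(printables, excludeChars='_').parseString(sample).asList()

print(greedy_tokens)   # ['abc_123_']
print(bounded_tokens)  # ['abc']
```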
Here is your code with the necessary changes marked, along with some hopefully helpful comments:
from pyparsing import Word, printables, Forward, Optional

delimiter = Word(printables, exact=1)
text = Forward() #(Word(printables) + ~delimiter)
def setTextExcludingDelimiter(s,l,t):
    # define Word as all printable characters, excluding the delimiter character
    # the excludeChars argument for Word is how this is done
    text_word = Word(printables, excludeChars=t[0]).setName("text")
    # use '<<' operator to assign the text_word definition to the
    # previously defined text expression
    text << text_word
# attach parse action to delimiter, so that once it is matched,
# it will define the correct expression for text
delimiter.setParseAction(setTextExcludingDelimiter)
# make the text expressions Optional with default value of '' to satisfy 3rd test case
parser = delimiter + Optional(text,'') + delimiter + Optional(text,'') + delimiter
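Putting the answer together with the test cases from the question, a self-contained version might look like the sketch below. The helper name parse_delimited, the snake_case function name, and the use of parseAll=True are additions made for this illustration and are not part of the original answer.

```python
from pyparsing import Word, printables, Forward, Optional

delimiter = Word(printables, exact=1)
text = Forward()  # placeholder; the real definition is filled in at parse time

def set_text_excluding_delimiter(s, loc, toks):
    # Use '<<' rather than '<<=' here: an augmented assignment would make
    # 'text' a local name inside this function and raise UnboundLocalError.
    text << Word(printables, excludeChars=toks[0])

# Every time a delimiter is matched, (re)define text to exclude that character.
delimiter.setParseAction(set_text_excluding_delimiter)

parser = delimiter + Optional(text, '') + delimiter + Optional(text, '') + delimiter

def parse_delimited(s):
    # Hypothetical helper: parse the whole string and return a plain list.
    return parser.parseString(s, parseAll=True).asList()

tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]
for test, expected in tests:
    print(parse_delimited(test), '<=>', expected)
```

With these changes each parsed result should match the expected list on the right-hand side of '<=>'.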
Regarding "python - referencing token values at parse time", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32080193/