Problem description
I'm trying to use pyparsing to build a parser that will match on all text within an arbitrarily nested set of brackets. If we consider a string like this:
"[A,[B,C],[D,E,F],G] Random Middle text [H,I,J]"
What I would like is for a parser to match in a way that it returns two matches:
[
"[A,[B,C],[D,E,F],G]",
"[H,I,J]"
]
I was able to accomplish a somewhat-working version of this using a barrage of originalTextFor mashed up with nestedExpr, but this breaks when your nesting is deeper than the number of originalTextFor expressions.
Is there a straightforward way to only match on the outermost expression grabbed by nestedExpr, or a way to modify its logic so that everything after the first paired match is treated as plaintext rather than being parsed?
Update: One thing that seems to come close to what I want to accomplish is this modified version of the logic from nestedExpr:
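from pyparsing import (CharsNotIn, Forward, ParserElement, Suppress,
                       ZeroOrMore, empty, originalTextFor)

def mynest(opener='{', closer='}'):
    # Content is any run of characters that is not a delimiter or whitespace.
    content = (empty.copy() + CharsNotIn(opener + closer + ParserElement.DEFAULT_WHITE_CHARS))
    ret = Forward()
    # Capture the raw text between the outer delimiters, recursing on nested pairs.
    ret <<= (Suppress(opener) + originalTextFor(ZeroOrMore(ret | content)) + Suppress(closer))
    return ret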
This gets me most of the way there, although there's an extra level of list wrapping in there that I really don't need, and what I'd really like is for those brackets to be included in the string (without getting into an infinite recursion situation by not suppressing them).
parser = mynest("[", "]")
result = parser.searchString("[A,[B,C],[D,E,F],G] Random Middle text [H,I,J]")
result.asList()
# [['A,[B,C],[D,E,F],G'], ['H,I,J']]
I know I could strip these out with a simple list comprehension, but it would be ideal if I could just eliminate that second, redundant level.
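For reference, the list-comprehension cleanup mentioned above might look like the following. This is only a sketch that starts from the result shape reported above; the variable names (`nested`, `flat`, `bracketed`) are illustrative and not part of pyparsing.

```python
# Result shape reported above for parser.searchString(...).asList()
nested = [['A,[B,C],[D,E,F],G'], ['H,I,J']]

# Pull the single string out of each one-element sublist.
flat = [group[0] for group in nested]

# Optionally re-attach the outer brackets that Suppress() removed.
bracketed = ["[" + text + "]" for text in flat]

print(flat)
print(bracketed)
```

This works around the extra list level after parsing rather than removing it from the grammar itself.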
Recommended answer
I'm not sure why this wouldn't work:
from pyparsing import nestedExpr, originalTextFor

sample = "[A,[B,C],[D,E,F],G] Random Middle text [H,I,J]"
scanner = originalTextFor(nestedExpr('[', ']'))
for match in scanner.searchString(sample):
    print(match[0])
prints:
[A,[B,C],[D,E,F],G]
[H,I,J]
What is the situation where "this breaks when your nesting is deeper than the number of originalTextFor expressions"?
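As a quick check on the nesting-depth concern, here is a small sketch that runs a single originalTextFor(nestedExpr(...)) against a more deeply nested input. The sample string is made up for illustration. The camelCase names are the ones used in this thread; pyparsing 3 also offers them as original_text_for, nested_expr and search_string.

```python
from pyparsing import nestedExpr, originalTextFor

# One originalTextFor around one nestedExpr; nestedExpr recurses on its own,
# so no additional wrappers are added per nesting level.
scanner = originalTextFor(nestedExpr("[", "]"))

# Hypothetical input, nested five levels deep in the first group.
sample = "[A,[B,[C,[D,[E]]]],F] text [G,[H]]"

# Each match is a ParseResults holding the original matched text at index 0.
matches = [tokens[0] for tokens in scanner.searchString(sample)]
print(matches)
```

If the nesting concern held, the first group would come back truncated or split; here each outermost group should be returned as one string, brackets included.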