python - pyparsing, same opening and closing strings

Related to: Python parsing bracketed blocks

I have a file with the following format:

#
here
are
some
strings
#
and
some
others
 #
 with
 different
 levels
 #
 of
  #
  indentation
  #
 #
#

So a block is delimited by an opening # and a closing #. However, the closing # of block n-1 is also the opening # of block n.
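In the flat (non-nested) case, the shared delimiter just means that every inner '#' line both closes one block and opens the next. A minimal stdlib-only sketch of that idea follows; the helper name split_flat_blocks is hypothetical, and it deliberately ignores indentation.

```python
def split_flat_blocks(text):
    """Split '#'-delimited text into blocks, sharing each inner '#' between two blocks.

    Assumes a flat (non-nested) layout: every '#' line closes the previous
    block and opens the next one. Content before the first '#' or after the
    last '#' belongs to no block and is dropped.
    """
    blocks = []
    current = None  # None until the first '#' opens a block
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line == '#':
            if current is not None:
                blocks.append(current)  # this '#' closes the previous block...
            current = []                # ...and opens the next one
        elif current is not None:
            current.append(line)
    return blocks


print(split_flat_blocks("#\nhere\nare\n#\nand\nsome\n#"))
```

Because it ignores indentation, applying it to the indented sample above collapses all nesting levels into one flat list (and produces empty blocks where closing '#' lines follow each other), which is why the indentation has to be taken into account.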

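I am trying to write a function that, given input in this format, retrieves the contents of each block; the structure can also be recursive (blocks nested inside blocks).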

I started with regular expressions but gave up quickly (as you have probably guessed), so I tried pyparsing, but I cannot simply write

print(nestedExpr('#','#').parseString(my_string).asList())

because it raises a ValueError (ValueError: opening and closing strings cannot be the same).

Given that I cannot change the input format, is there a better option than pyparsing?

I also tried the approach from this answer: https://stackoverflow.com/a/1652856/740316, replacing { and } with # and #, but it fails to parse the string.

Best answer

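Unfortunately (for you), your grouping depends not only on the delimiting '#' characters but also on the indentation level (otherwise ['with','different','levels'] would be at the same level as the previous group ['and','some','others']). Parsing indentation-sensitive grammars is not pyparsing's strong suit: it can be done, but it is not pretty. For this we will use the pyparsing helper macro indentedBlock, which also requires us to define a list variable that indentedBlock uses as its indentation stack.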
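To make the idea of an indentation stack concrete, here is a small stdlib-only sketch. The function name indent_events is hypothetical and this is not how indentedBlock is implemented internally; it only shows, in the same spirit as Python's own tokenizer, how a stack of open columns turns indentation changes into INDENT/DEDENT events. Columns here are 0-based, whereas pyparsing's are 1-based.

```python
def indent_events(text):
    """Turn indentation changes into INDENT/DEDENT events using a stack of columns.

    Returns a list of (kind, value) tuples: ('LINE', content),
    ('INDENT', new_column) or ('DEDENT', column_returned_to).
    """
    stack = [0]  # currently open indentation columns; top level is column 0
    events = []
    for raw in text.splitlines():
        content = raw.strip()
        if not content:
            continue  # blank lines do not affect indentation
        col = len(raw) - len(raw.lstrip())
        if col > stack[-1]:
            stack.append(col)  # deeper: open a new level
            events.append(('INDENT', col))
        while col < stack[-1]:
            stack.pop()  # shallower: close levels until one matches
            events.append(('DEDENT', stack[-1]))
        if col != stack[-1]:
            raise ValueError(f"dedent to column {col} matches no open level")
        events.append(('LINE', content))
    while len(stack) > 1:  # close any levels still open at end of input
        stack.pop()
        events.append(('DEDENT', stack[-1]))
    return events


for event in indent_events("a\n b\n  c\n d\ne"):
    print(event)
```

Each INDENT marks where a nested block would begin and each DEDENT where it ends; that extra information is what the '#' delimiters alone cannot provide.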

See the embedded comments in the code below for one way to use pyparsing with indentedBlock:

from pyparsing import *

test = """\
#
here
are
some
strings
#
and
some
others
 #
 with
 different
 levels
 #
 of
  #
  indentation
  #
 #
#"""

# newlines are significant for line separators, so redefine
# the default whitespace characters for whitespace skipping
ParserElement.setDefaultWhitespaceChars(' ')

NL = LineEnd().suppress()
HASH = '#'
HASH_SEP = Suppress(HASH + Optional(NL))

# a normal line contains a single word
word_line = Word(alphas) + NL

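# indentation stack used by indentedBlock; pyparsing column numbers
# start at 1, so the top level begins at column 1
indent_stack = [1]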

# word_block is recursive, since word_blocks can contain word_blocks
word_block = Forward()
# a word_group is one or more word lines and/or more deeply indented word_blocks
word_group = Group(OneOrMore(word_line | ungroup(indentedBlock(word_block, indent_stack))) )

# now define a word_block, as a '#'-delimited list of word_groups, with
# leading and trailing '#' characters
word_block <<= (HASH_SEP +
                 delimitedList(word_group, delim=HASH_SEP) +
                 HASH_SEP)

# the overall expression is one large word_block
parser = word_block

# parse the test string
parser.parseString(test).pprint()

Prints:

[['here', 'are', 'some', 'strings'],
 ['and',
  'some',
  'others',
  [['with', 'different', 'levels'], ['of', [['indentation']]]]]]
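Since the question also asks whether there is an option besides pyparsing: for a format this regular, a short hand-written recursive parser is one possibility. The sketch below (parse_blocks is a hypothetical name, stdlib only) assumes one item per line, nesting given purely by indentation, and that a '#' followed by more content at the same or deeper indentation is a shared separator, while a '#' followed by a shallower line or end of input closes the block.

```python
import pprint


def parse_blocks(text):
    """Parse '#'-delimited blocks whose nesting is given by indentation.

    Returns the outermost block as a list of groups; each group is a list of
    items and nested blocks (which are themselves lists of groups).
    """
    # (column, content) for every non-blank line
    lines = [(len(raw) - len(raw.lstrip()), raw.strip())
             for raw in text.splitlines() if raw.strip()]
    pos = 0

    def parse_block(indent):
        nonlocal pos
        if pos >= len(lines) or lines[pos] != (indent, '#'):
            raise ValueError(f"expected '#' at column {indent} (line index {pos})")
        pos += 1  # consume the opening '#'
        groups = [[]]
        while pos < len(lines):
            col, content = lines[pos]
            if col > indent:
                # deeper indentation: a nested block inside the current group
                groups[-1].append(parse_block(col))
            elif col < indent:
                raise ValueError(f"block at column {indent} is not closed")
            elif content == '#':
                pos += 1
                # assumption: more content at this level (or deeper) means the
                # '#' is a shared separator; otherwise it closes the block
                if pos < len(lines) and lines[pos][0] >= indent:
                    groups.append([])
                else:
                    return groups
            else:
                groups[-1].append(content)
                pos += 1
        raise ValueError("unterminated block")

    result = parse_block(0)
    if pos != len(lines):
        raise ValueError("unexpected content after the outermost block")
    return result


# the sample from the question, with its leading spaces written out explicitly
SAMPLE = "\n".join([
    "#", "here", "are", "some", "strings",
    "#", "and", "some", "others",
    " #", " with", " different", " levels",
    " #", " of",
    "  #", "  indentation", "  #",
    " #",
    "#",
])

if __name__ == "__main__":
    pprint.pprint(parse_blocks(SAMPLE))
```

On the sample it returns the same nested list as the pyparsing output above. The trade-off is that the grammar lives in control flow rather than in a declarative pyparsing expression, so changing the format (for example, several words per line) means editing the loop.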

A similar question, "python - pyparsing, same opening and closing strings", can be found on Stack Overflow: https://stackoverflow.com/questions/29522805/