I have an accounting tree that is stored with indentation (spaces) in the source:
Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses
There is a fixed number of levels, so I'd like to flatten the hierarchy into 3 fields (the actual data has 6 levels; this example is simplified):
L1         L2             L3
Income
Income     Revenue
Income     Revenue        IAP
Income     Revenue        Ads
Income     Other-Income
Expenses   Developers     In-house
... etc
I can do this by checking the number of spaces before the account name:
for rownum in range(6,ws.max_row+1):
    accountName = str(ws.cell(row=rownum,column=1).value)
    indent = len(accountName) - len(accountName.lstrip(' '))
    if indent == 0:
        l1 = accountName
        l2 = ''
        l3 = ''
    elif indent == 3:
        l2 = accountName
        l3 = ''
    else:
        l3 = accountName

    w.writerow([l1,l2,l3])
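Is there a more flexible way to do this, based on how far the current row is indented relative to the previous one, rather than assuming every level is always indented by 3 spaces?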
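L1 will always have no indent, and we can trust that lower levels are indented further than their parent, but perhaps not always by 3 spaces per level.

Update: I ended up with the following as the core of the logic. Since what I ultimately wanted was the list of accounts leading to each row, it seemed simplest to use the indent only to decide whether to reset, append to, or pop from the list: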
if indent == 0:
    accountList = []
    accountList.append((indent,accountName))
elif indent > prev_indent:
    accountList.append((indent,accountName))
elif indent <= prev_indent:
    max_indent = int(max(accountList,key=itemgetter(0))[0])
    while max_indent >= indent:
        accountList.pop()
        max_indent = int(max(accountList,key=itemgetter(0))[0])
    accountList.append((indent,accountName))
So at each row of output, accountList is complete.
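Best answer
You can mimic the way Python itself parses indentation.
First, create a stack that holds the indentation levels.
At each line:
- If the indentation is larger than the top of the stack, push it and increase the depth.
- If it is the same, stay at the current depth.
- If it is smaller, pop the stack while its top is larger than the new indentation. If you reach a smaller indentation level before finding one that matches exactly, there is an indentation error.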
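indentation = [0]  # stack of indentation widths seen so far
depth = 0

with open("test.txt", "r") as f:
    for line in f:
        # Drop only the newline; line[:-1] would eat the last character
        # of a final line that has no trailing newline.
        line = line.rstrip("\n")
        content = line.strip()
        if not content:
            continue  # ignore blank lines
        # Count leading whitespace only, not trailing whitespace.
        indent = len(line) - len(line.lstrip())

        if indent > indentation[-1]:
            depth += 1
            indentation.append(indent)
        elif indent < indentation[-1]:
            while indent < indentation[-1]:
                depth -= 1
                indentation.pop()
            if indent != indentation[-1]:
                raise RuntimeError("Bad formatting")

        print(f"{content} (depth: {depth})")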
With a "test.txt" file whose content is the same as what you provided:
Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses
Here is the output:
Income (depth: 0)
Revenue (depth: 1)
IAP (depth: 2)
Ads (depth: 2)
Other-Income (depth: 1)
Expenses (depth: 0)
Developers (depth: 1)
In-house (depth: 2)
Contractors (depth: 2)
Advertising (depth: 1)
Other Expenses (depth: 1)
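For the flattening asked about in the question, the depth can be used to decide how many ancestor names to keep for each row. The following is a minimal sketch under some assumptions, not a drop-in replacement: `flatten_rows`, `levels`, and `sample` are hypothetical names introduced here, the input is any iterable of text lines, and the first non-blank line is assumed to have no indent. The sample deliberately mixes indentation widths to show that no fixed step is assumed.

```python
def flatten_rows(lines, levels=3):
    """Yield one row of `levels` columns per non-blank line (hypothetical helper)."""
    indentation = [0]  # stack of indentation widths, as in the loop above
    path = []          # account names from the top level down to the current row
    for line in lines:
        line = line.rstrip("\n")
        content = line.strip()
        if not content:
            continue
        indent = len(line) - len(line.lstrip())
        if indent > indentation[-1]:
            indentation.append(indent)
        else:
            while indent < indentation[-1]:
                indentation.pop()
            if indent != indentation[-1]:
                raise RuntimeError("Bad formatting")
        depth = len(indentation) - 1
        if depth >= levels or depth > len(path):
            raise RuntimeError("Unexpected nesting depth")
        # Keep the ancestors above this depth, then add the current name.
        path = path[:depth] + [content]
        # Pad with empty strings so every row has the same number of columns.
        yield path + [""] * (levels - len(path))


sample = """\
Income
  Revenue
      IAP
      Ads
  Other-Income
Expenses
    Developers
"""

rows = list(flatten_rows(sample.splitlines()))
for row in rows:
    print(row)
```

Each yielded row is a plain list, so it could be passed to something like the `w.writerow(...)` call from the question.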
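So, what can you do with this depth information?
Suppose you want to build nested lists.
First, create a data stack.
- When you find an indent, push a new list onto the data stack.
- When you find a dedent, pop the top list and append it to the new top.
And in every case, for each line, append the content to the list at the top of the data stack.
Here is the corresponding implementation: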
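indentation = [0]  # stack of indentation widths seen so far
depth = 0
data = [[]]        # stack of lists; data[0] collects the top-level items

with open("test.txt", "r") as f:
    for line in f:
        line = line.rstrip("\n")  # drop only the newline
        content = line.strip()
        if not content:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip())

        if indent > indentation[-1]:
            # Deeper level: start a new list on top of the data stack.
            depth += 1
            indentation.append(indent)
            data.append([])
        elif indent < indentation[-1]:
            # Shallower level: fold finished lists into their parents.
            while indent < indentation[-1]:
                depth -= 1
                indentation.pop()
                top = data.pop()
                data[-1].append(top)
            if indent != indentation[-1]:
                raise RuntimeError("Bad formatting")

        data[-1].append(content)

# Fold any lists that are still open at the end of the file.
while len(data) > 1:
    top = data.pop()
    data[-1].append(top)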
The nested list ends up at the top of the data stack, as data[0]. The output for the same file is:
['Income',
    ['Revenue',
        ['IAP',
         'Ads'
        ],
     'Other-Income'
    ],
 'Expenses',
    ['Developers',
        ['In-house',
         'Contractors'
        ],
     'Advertising',
     'Other Expenses'
    ]
]
Although it is deeply nested, it is easy to work with.
You can reach any item by chaining index accesses:
>>> l = data[0]
>>> l
['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income'], 'Expenses', ['Developers', ['In-house', 'Contractors'], 'Advertising', 'Other Expenses']]
>>> l[1]
['Revenue', ['IAP', 'Ads'], 'Other-Income']
>>> l[1][1]
['IAP', 'Ads']
>>> l[1][1][0]
'IAP'