I have some files with a tree-like structure. For example:
A
    Result
      a11
      a12
    Lolim
      a21
      a22
    Uplim
      a31
      a32
B
    Result
      b11
      b12
    Lolim
      b21
      b22
I would like to parse these files into a data frame like this:
Name  Result  Lolim  Uplim
A     a12     a22    a32
B     b12     b22    NA
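My idea is to split the file into two sections, A and B, and then split each section into sub-categories: Result, Lolim and Uplim for A, and Result and Lolim for B. Finally, each sub-category is split into its 2 items. That would give me a nested list, from which I could build the data frame. But I don't know how to get this nested list.
Or is there another way? Can you recommend any useful modules or functions?

Best answer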
import collections
import pandas as pd

with open("data_tree.dat", "r") as data:
    dct = collections.OrderedDict()
    key = ""
    sub_key = ""
    for line in data:
        if not line.strip():  # skip blank lines
            continue
        if " " not in line:  # no space at all: top-level name (A, B)
            key = line.strip()
            dct[key] = collections.OrderedDict()
        elif " " * 4 in line and " " * 6 not in line:  # 4 spaces: sub-category
            sub_key = line.strip()
            dct[key][sub_key] = ""
        elif " " * 6 in line:  # 6 spaces: item
            item = line.strip()
            dct[key][sub_key] = item  # overwrite, last element only

df = pd.DataFrame.from_dict(dct).transpose()
df.columns.names = ["Name"]
df = df[["Result", "Lolim", "Uplim"]]  # if column order matters
df = df.fillna("NA")  # in case you want NA and not NaN
print(df)
Output:
Name Result Lolim Uplim
A       a12   a22   a32
B       b12   b22    NA
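This assumes that data_tree.dat is laid out like the example in the question (sub-category lines indented by 4 spaces, item lines by 6 spaces) and that it sits in the same folder as the .py file containing the code above. Or, as a function: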
import collections
import pandas as pd


def dat_to_df(path_to_file):
    """Parse an indented tree file into a DataFrame with one row per name."""
    with open(path_to_file, "r") as data:
        dct = collections.OrderedDict()
        key = ""
        sub_key = ""
        for line in data:
            if not line.strip():  # skip blank lines
                continue
            if " " not in line:  # top-level name
                key = line.strip()
                dct[key] = collections.OrderedDict()
            elif " " * 4 in line and " " * 6 not in line:  # sub-category
                sub_key = line.strip()
                dct[key][sub_key] = ""
            elif " " * 6 in line:  # item; the last one wins
                item = line.strip()
                dct[key][sub_key] = item
    df = pd.DataFrame.from_dict(dct).transpose()
    df.columns.names = ["Name"]
    df = df[["Result", "Lolim", "Uplim"]]
    return df.fillna("NA")


dataframe = dat_to_df("data_tree.dat")
print(dataframe)
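
If you do want the nested structure described in the question, with every item kept rather than only the last one, something along these lines might work. This is only a sketch: `parse_tree`, `sample` and `last` are names made up for illustration, and it assumes the same layout as above (no indentation for names, 4 spaces for sub-categories, anything deeper for items).

```python
def parse_tree(lines):
    """Return {name: {sub_category: [item, ...]}} built from indented lines."""
    tree = {}
    name = sub = None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        # Indentation depth = number of leading spaces (assumed layout: 0 / 4 / deeper).
        indent = len(raw) - len(raw.lstrip(" "))
        if indent == 0:      # top-level name such as A or B
            name = text
            tree[name] = {}
        elif indent == 4:    # sub-category such as Result or Lolim
            sub = text
            tree[name][sub] = []
        else:                # deeper lines are items of the current sub-category
            tree[name][sub].append(text)
    return tree


# Small in-memory sample in the assumed layout (hypothetical data).
sample = """\
A
    Result
      a11
      a12
    Lolim
      a21
      a22
B
    Result
      b11
      b12
"""

tree = parse_tree(sample.splitlines())

# Keep only the last item of each sub-category, as in the desired table.
last = {
    name: {sub: items[-1] for sub, items in subs.items() if items}
    for name, subs in tree.items()
}
```

Passing `last` to `pd.DataFrame.from_dict(last, orient="index")` should then give one row per name, with NaN wherever a sub-category is missing.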
Source: "python - How to parse tree-like data to nested list in Python?" on Stack Overflow: https://stackoverflow.com/questions/42609755/