I am new to Python and am trying to figure out how to load a data file that contains a block of data for each time step, for example:
TIME:,0
Q01 : A:,-10.7436,0.000536907,-0.00963283,0.00102934
Q02 : B:,0,0.0168694,-0.000413983,0.00345921
Q03 : C:,0.0566665
Q04 : D:,0.074456
Q05 : E:,0.077456
Q06 : F:,0.0744835
Q07 : G:,0.140448
Q08 : H:,-0.123968
Q09 : I:,0
Q10 : J:,0.00204377,0.0109621,-0.0539183,0.000708574
Q11 : K:,-2.86115e-17,0.00947104,0.0145645,1.05458e-16,-1.90972e-17,-0.00947859
Q12 : L:,-0.0036781,0.00161254
Q13 : M:,-0.00941257,0.000249692,-0.0046302,-0.00162387,0.000981709,-0.0135982,-0.0223496,-0.00872062,0.00548815,0.0114075,.........,-0.00196206
Q14 : N:,3797, 66558
Q15 : O:,0.0579981
Q16 : P:,0
Q17 : Q:,625
TIME:,0.1
Q01 : A:,-10.563,0.000636907,-0.00963283,0.00102934
Q02 : B:,0,0.01665694
Q03 : C:,0.786,-0.000666,0.6555
Q04 : D:,0.87,0.96
Q05 : E:,0.077456
Q06 : F:,0.07447835
Q07 : G:,0.140448
Q08 : H:,-0.123968
Q09 : I:,0
Q10 : J:,0.00204377,0.0109621,-0.0539183,0.000708574
Q11 : K:,-2.86115e-17,0.00947104,0.0145645,1.05458e-16,-1.90972e-17,-0.00947859
Q12 : L:,-0.0036781,0.00161254
Q13 : M:,-0.00941257,0.000249692,-0.0046302,-0.00162387,0.000981709,-0.0135982,-0.0223496,-0.00872062,0.00548815,0.0114075,.........,-0.00196206
Q14 : N:,3797, 66558
Q15 : O:,0.0579981
Q16 : P:,0,2,4
Q17 : Q:,786
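Each block contains a number of variables, and the number of data columns can differ considerably from one variable to the next. The number of columns for a given variable may change between time steps, but the number of variables per block is the same at every time step, and it is always known how many variables were exported. The data file contains no information about the number of blocks (time steps).
Once read, the data should be loaded in a variable-per-time-step layout: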
Time: | A: | B:
0 | -10.7436,0.000536907,-0.00963283,0.00102934 | ........
0.1 | -10.563,0.000636907,-0.00963283,0.00102934 | ........
0.2 | ...... | ........
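If the number of columns were the same for every variable at every time step, this would be a very simple problem.
I think I need to read the file line by line with two nested loops, one over the blocks and one over the lines inside each block, and then store the input in an array (by appending?). Since I am not yet very familiar with Python and NumPy, the varying number of columns per line has me somewhat confused for the moment.
It would be great if someone could point me in the right direction, for example which functions I should use to do this reasonably efficiently.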
Best answer
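import pandas as pd

res = {}
TIME = None
# iterating over the file object reads it lazily, one line at a time
with open('file.txt') as f:
    for line in f:
        # 'Q01 : A:,1,2' -> ['Q01', 'A', ',1,2'];  'TIME:,0' -> ['TIME', ',0']
        parts = [p.strip() for p in line.strip().split(':')]
        if parts[0] == 'TIME':
            TIME = parts[1].strip(',')
            res[TIME] = {}
            print('New time section start {}'.format(TIME))
            # here you can stop and work with the data from the previous time step
            continue
        if len(parts) < 3 or TIME is None:
            continue  # skip blank or malformed lines
        # values are kept as strings; convert with float() if numbers are needed
        res[TIME][parts[1]] = parts[2].strip(',').split(',')

# columns are time steps, rows are variables
df = pd.DataFrame.from_dict(res, 'columns')
# for example, for TIME 0
dfZero = df['0']
print(dfZero)

# rows are time steps, columns are variables
df = pd.DataFrame.from_dict(res, 'index')
dfA = df['A']
print(dfA)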
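If numeric arrays are needed instead of lists of strings, the same line-by-line idea can be written with NumPy only. The following is a minimal sketch, not part of the original answer: the helper names `parse_blocks` and `stack_if_regular` are made up for illustration, the sample text is a shortened copy of the data in the question, and it assumes that every value after the last colon parses as a float (so the `.........` placeholder from the question must not appear in the real file).

```python
import io
import numpy as np

def parse_blocks(lines):
    """Return {time: {variable: 1-D float array}} from an iterable of text lines."""
    blocks = {}
    current = None
    for line in lines:
        # split at the LAST colon: label on the left, comma-separated values on the right
        label, sep, values = line.strip().rpartition(':')
        if not sep:
            continue  # blank line or line without a label
        numbers = np.array([float(v) for v in values.split(',') if v.strip()])
        name = label.split(':')[-1].strip()  # 'Q01 : A' -> 'A', 'TIME' -> 'TIME'
        if name == 'TIME':
            current = blocks.setdefault(float(numbers[0]), {})
        elif current is not None:
            current[name] = numbers
    return blocks

def stack_if_regular(blocks, name):
    """Stack one variable over time into a 2-D array if its length never changes."""
    rows = [block[name] for block in blocks.values()]
    if len({row.size for row in rows}) == 1:
        return np.vstack(rows)
    return rows  # ragged: keep a list of 1-D arrays, one per time step

# shortened sample taken from the question; a real file would be passed as open('file.txt')
sample = """TIME:,0
Q01 : A:,-10.7436,0.000536907,-0.00963283,0.00102934
Q02 : B:,0,0.0168694,-0.000413983,0.00345921
Q03 : C:,0.0566665
TIME:,0.1
Q01 : A:,-10.563,0.000636907,-0.00963283,0.00102934
Q02 : B:,0,0.01665694
Q03 : C:,0.786,-0.000666,0.6555
"""

blocks = parse_blocks(io.StringIO(sample))
a = stack_if_regular(blocks, 'A')   # same length at both time steps
b = stack_if_regular(blocks, 'B')   # length changes, so this stays a list
print(a.shape)                      # (2, 4)
print([row.size for row in b])      # [4, 2]
```

Keeping ragged variables as a list of 1-D arrays avoids padding; a variable whose column count never changes can be stacked into an ordinary 2-D array and indexed by time step.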
This question about importing a text file in which each line has a different number of columns in Python comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/37354745/