Problem description
I have a dataframe (with a DateTime index) in which some of the columns contain lists, each with 6 elements.
    In: dframe.head()
    Out:
                          A                                        B  \
    timestamp
    2017-05-01 00:32:25  30  [-3512, 375, -1025, -358, -1296, -4019]
    2017-05-01 00:32:55  30  [-3519, 372, -1026, -361, -1302, -4020]
    2017-05-01 00:33:25  30  [-3514, 371, -1026, -360, -1297, -4018]
    2017-05-01 00:33:55  30  [-3517, 377, -1030, -363, -1293, -4027]
    2017-05-01 00:34:25  30  [-3515, 372, -1033, -361, -1299, -4025]

                                                           C      D
    timestamp
    2017-05-01 00:32:25  [1104, 1643, 625, 1374, 5414, 2066]  49.93
    2017-05-01 00:32:55  [1106, 1643, 622, 1385, 5441, 2074]  49.94
    2017-05-01 00:33:25  [1105, 1643, 623, 1373, 5445, 2074]  49.91
    2017-05-01 00:33:55  [1105, 1646, 620, 1384, 5438, 2076]  49.91
    2017-05-01 00:34:25  [1104, 1645, 613, 1374, 5431, 2082]  49.94
I have a dictionary, dict_of_dfs, which I want to populate with 6 dataframes:

    dict_of_dfs = {1: df1, 2: df2, 3: df3, 4: df4, 5: df5, 6: df6}

where the ith dataframe contains the ith item from each list, so the first dataframe in the dict will be:
    In: df1
    Out:
                          A      B     C      D
    timestamp
    2017-05-01 00:32:25  30  -3512  1104  49.93
    2017-05-01 00:32:55  30  -3519  1106  49.94
    2017-05-01 00:33:25  30  -3514  1105  49.91
    2017-05-01 00:33:55  30  -3517  1105  49.91
    2017-05-01 00:34:25  30  -3515  1104  49.94
and so on. The actual dataframe has more columns than this and thousands of rows. What's the simplest, most Pythonic way to make the conversion?
Solution
You can use a dict comprehension with assign, and select values from the lists with str[0], str[1], and so on:

    N = 6
    dfs = {i: df.assign(B=df['B'].str[i-1], C=df['C'].str[i-1]) for i in range(1, N + 1)}
    print(dfs[1])

                 timestamp   A      B     C      D
    0  2017-05-01 00:32:25  30  -3512  1104  49.93
    1  2017-05-01 00:32:55  30  -3519  1106  49.94
    2  2017-05-01 00:33:25  30  -3514  1105  49.91
    3  2017-05-01 00:33:55  30  -3517  1105  49.91
    4  2017-05-01 00:34:25  30  -3515  1104  49.94
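For anyone who wants to try the assign approach without the original data, here is a minimal self-contained sketch. The two-row frame is a stand-in built from the first rows shown in the question, and it assumes every list has exactly N elements.

```python
import pandas as pd

# Stand-in for the frame in the question (values copied from its first two rows).
index = pd.DatetimeIndex(
    ["2017-05-01 00:32:25", "2017-05-01 00:32:55"], name="timestamp"
)
df = pd.DataFrame(
    {
        "A": [30, 30],
        "B": [[-3512, 375, -1025, -358, -1296, -4019],
              [-3519, 372, -1026, -361, -1302, -4020]],
        "C": [[1104, 1643, 625, 1374, 5414, 2066],
              [1106, 1643, 622, 1385, 5441, 2074]],
        "D": [49.93, 49.94],
    },
    index=index,
)

N = 6  # assumption: every list holds exactly six elements

# .str[k] picks element k out of each list in the column.
# assign() returns a modified copy, so df keeps its list columns.
dfs = {
    i: df.assign(B=df["B"].str[i - 1], C=df["C"].str[i - 1])
    for i in range(1, N + 1)
}

print(dfs[1])
```

The dict keys run from 1 to N to match the question, which is why the list position is i - 1.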
Another solution:
    dfs = {i: df.apply(lambda x: x.str[i-1] if type(x.iat[0]) == list else x) for i in range(1, 7)}
    print(dfs[1])

                 timestamp   A      B     C      D
    0  2017-05-01 00:32:25  30  -3512  1104  49.93
    1  2017-05-01 00:32:55  30  -3519  1106  49.94
    2  2017-05-01 00:33:25  30  -3514  1105  49.91
    3  2017-05-01 00:33:55  30  -3517  1105  49.91
    4  2017-05-01 00:34:25  30  -3515  1104  49.94
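Since the real frame has more columns than the example, one possible variation is to detect the list columns once and reuse the assign approach for all of them. The helper name split_list_columns below is hypothetical, not part of pandas, and the sketch assumes string column names (assign takes them as keyword arguments) and lists of equal length.

```python
import pandas as pd


def split_list_columns(df, n):
    """Hypothetical helper: return {1: frame, ..., n: frame}, where frame i
    holds item i of every list-valued column and all other columns unchanged."""
    # Treat a column as a list column only if every cell in it is a list.
    list_cols = [c for c in df.columns if all(isinstance(v, list) for v in df[c])]
    return {
        i: df.assign(**{c: df[c].str[i - 1] for c in list_cols})
        for i in range(1, n + 1)
    }


# Two-row stand-in for the frame in the question.
df = pd.DataFrame(
    {
        "A": [30, 30],
        "B": [[-3512, 375, -1025, -358, -1296, -4019],
              [-3519, 372, -1026, -361, -1302, -4020]],
        "C": [[1104, 1643, 625, 1374, 5414, 2066],
              [1106, 1643, 622, 1385, 5441, 2074]],
        "D": [49.93, 49.94],
    }
)

dfs = split_list_columns(df, 6)
print(dfs[2])
```

Checking every cell is slower than inspecting only the first one, as the apply version does, but it avoids misclassifying a column whose first value happens to be a list.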
Timings:
    df = pd.concat([df]*10000).reset_index(drop=True)

    In [185]: %timeit {i: df.assign(B=df['B'].str[i-1], C=df['C'].str[i-1]) for i in range(1, N+1)}
    1 loop, best of 3: 420 ms per loop

    In [186]: %timeit {i: df.apply(lambda x: x.str[i-1] if type(x.iat[0]) == list else x) for i in range(1, 7)}
    1 loop, best of 3: 447 ms per loop

    In [187]: %timeit {(i+1): df.applymap(lambda x: x[i] if type(x) == list else x) for i in range(6)}
    1 loop, best of 3: 881 ms per loop
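The %timeit lines only run inside IPython. A rough equivalent for a plain script, using the standard timeit module, might look like the sketch below. The two-row base frame and the replication factor of 1000 are assumptions chosen to keep the run short, so the absolute numbers will not match those above. The applymap variant is left out because recent pandas releases deprecate DataFrame.applymap.

```python
import timeit

import pandas as pd

# Two-row stand-in for the question's frame, repeated to get a larger one.
base = pd.DataFrame(
    {
        "A": [30, 30],
        "B": [[-3512, 375, -1025, -358, -1296, -4019],
              [-3519, 372, -1026, -361, -1302, -4020]],
        "C": [[1104, 1643, 625, 1374, 5414, 2066],
              [1106, 1643, 622, 1385, 5441, 2074]],
        "D": [49.93, 49.94],
    }
)
big = pd.concat([base] * 1000).reset_index(drop=True)
N = 6


def with_assign():
    # Name the list columns explicitly.
    return {
        i: big.assign(B=big["B"].str[i - 1], C=big["C"].str[i - 1])
        for i in range(1, N + 1)
    }


def with_apply():
    # Detect list columns from the first cell of each column.
    return {
        i: big.apply(lambda x: x.str[i - 1] if type(x.iat[0]) == list else x)
        for i in range(1, N + 1)
    }


# Best of three single runs per approach, similar in spirit to %timeit.
timings = {
    f.__name__: min(timeit.repeat(f, number=1, repeat=3))
    for f in (with_assign, with_apply)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```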