Building a cumulative list of a column's values with groupby
Problem description
I have the following DataFrame:
Fruit metric
0 Apple NaN
1 Apple 100.0
2 Apple NaN
3 Peach 70.0
4 Pear 120.0
5 Pear 100.0
6 Pear NaN
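For anyone who wants to reproduce the problem, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the column names and values are taken from the printed table, and using `np.nan` for the missing entries is an assumption about how the original data was created.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; NaN marks a missing metric.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Apple", "Peach", "Pear", "Pear", "Pear"],
    "metric": [np.nan, 100.0, np.nan, 70.0, 120.0, 100.0, np.nan],
})
print(df)
```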
My objective is to group by Fruit and, in row order, add each non-null value of metric to a cumulative list stored in its own separate column, like so:
Fruit metric metric_cum
0 Apple NaN []
1 Apple 100.0 [100]
2 Apple NaN [100]
3 Peach 70.0 [70]
4 Pear 120.0 [120]
5 Pear 100.0 [120, 100]
6 Pear NaN [120, 100]
I tried this:
df['metric1'] = df['metric'].astype(str)
df.groupby('Fruit')['metric1'].cumsum()
But this raises DataError: No numeric types to aggregate.
I also tried this:
df.groupby('Fruit')['metric'].apply(list)
Result:
Fruit
Apple [nan, 100.0, nan]
Peach [70.0]
Pear [120.0, 100.0, nan]
Name: metric, dtype: object
But this is not cumulative and can't be made into a column. Thanks for your help.
Recommended answer
Use:
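import pandas as pd
# Wrap each non-null metric in a one-element list; nulls become empty lists.
df['metric'] = df['metric'].apply(lambda x: [] if pd.isnull(x) else [int(x)])
# cumsum concatenates the lists; group_keys=False keeps the original row index (newer pandas otherwise prepends the group key).
df['metric_cum'] = df.groupby('Fruit', group_keys=False)['metric'].apply(lambda x: x.cumsum())
print(df)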
Fruit metric metric_cum
0 Apple [] []
1 Apple [100] [100]
2 Apple [] [100]
3 Peach [70] [70]
4 Pear [120] [120]
5 Pear [100] [120, 100]
6 Pear [] [120, 100]
Or, to leave the original metric column untouched:
# Build the per-row lists in a separate Series instead of overwriting metric.
a = df['metric'].apply(lambda x: [] if pd.isnull(x) else [int(x)])
df['metric_cum'] = a.groupby(df['Fruit'], group_keys=False).apply(lambda x: x.cumsum())
print(df)
Fruit metric metric_cum
0 Apple NaN []
1 Apple 100.0 [100]
2 Apple NaN [100]
3 Peach 70.0 [70]
4 Pear 120.0 [120]
5 Pear 100.0 [120, 100]
6 Pear NaN [120, 100]
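If relying on `cumsum` to concatenate Python lists feels fragile across pandas versions, the same column can be built with a plain loop. This is an alternative sketch, not the method from the answer: it keeps one running list per fruit in a dictionary (the names `running` and `cum` are made up for illustration) and rebuilds the example frame so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Example frame, reconstructed from the question.
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Apple", "Peach", "Pear", "Pear", "Pear"],
    "metric": [np.nan, 100.0, np.nan, 70.0, 120.0, 100.0, np.nan],
})

running = {}  # fruit -> list of non-null metrics seen so far
cum = []      # one cumulative list per row, in row order

for fruit, value in zip(df["Fruit"], df["metric"]):
    seen = running.setdefault(fruit, [])
    if pd.notna(value):
        seen.append(int(value))
    # Store a copy so later appends do not change earlier rows.
    cum.append(list(seen))

df["metric_cum"] = pd.Series(cum, index=df.index)
print(df)
```

The loop walks the rows once in their existing order, so it does not require the frame to be sorted by Fruit, and it leaves the original metric column unchanged.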