我正在使用下面的代码来尝试按列值分组,然后仅对该组的有序数据运行累积和和移动平均值:
#this works OK
df['csum'] = df.sort_values(['name','day_time','delta_minutes'],ascending=True).groupby(['name']) ['value'].cumsum()
#throws error
df['rolling'] = df.sort_values(['name','day_time','delta_minutes'],ascending=True).groupby(['name'])['value'].rolling(window=2).mean()
原始数据框:
name value delta_minutes day_time
0 MAC000039 0.069 0 2012-10-13
1 MAC000039 0.054 30 2012-10-13
2 MAC000039 0.085 60 2012-10-13
3 MAC000040 0.082 0 2012-10-12
4 MAC000040 0.053 30 2012-10-12
5 MAC000040 0.075 60 2012-10-12
6 MAC000040 0.195 90 2012-10-12
7 MAC000039 0.098 0 2012-10-12
8 MAC000039 0.055 30 2012-10-12
9 MAC000039 0.054 60 2012-10-12
10 MAC000039 0.099 90 2012-10-12
预期产量
name value delta_minutes day_time csum rolling
0 MAC000039 0.069 0 2012-10-13 0.375 ..
1 MAC000039 0.054 30 2012-10-13 0.429 ..
2 MAC000039 0.085 60 2012-10-13 0.514 ..
尝试滚动平均值时出现以下错误
TypeError: incompatible index of inserted column with frame index
任何想法如何做到这一点?
最佳答案
发生错误的原因是df.sort_values(['name','day_time','delta_minutes'],ascending=True).groupby(['name'])['value'].rolling(window=2).mean()
具有MultiIndex
。要修复此问题,请重置'name'
索引并将其删除。
import pandas as pd
"""
name value delta_minutes day_time
0 MAC000039 0.069 0 2012-10-13
1 MAC000039 0.054 30 2012-10-13
2 MAC000039 0.085 60 2012-10-13
3 MAC000040 0.082 0 2012-10-12
4 MAC000040 0.053 30 2012-10-12
5 MAC000040 0.075 60 2012-10-12
6 MAC000040 0.195 90 2012-10-12
7 MAC000039 0.098 0 2012-10-12
8 MAC000039 0.055 30 2012-10-12
9 MAC000039 0.054 60 2012-10-12
10 MAC000039 0.099 90 2012-10-12
"""
df = pd.read_clipboard()
# sorting before adding columns
df.sort_values([
'name',
'day_time',
'delta_minutes'
], inplace = True)
# cumulative sum grouped on name
df['csum'] = df.groupby('name').value.cumsum()
# reset index `name` and drop it
df['rolling'] = df.groupby('name').value.rolling(2).mean().reset_index(level = 'name', drop = True)
print(df)
name value delta_minutes day_time csum rolling
7 MAC000039 0.098 0 2012-10-12 0.098 NaN
8 MAC000039 0.055 30 2012-10-12 0.153 0.0765
9 MAC000039 0.054 60 2012-10-12 0.207 0.0545
10 MAC000039 0.099 90 2012-10-12 0.306 0.0765
0 MAC000039 0.069 0 2012-10-13 0.375 0.0840
1 MAC000039 0.054 30 2012-10-13 0.429 0.0615
2 MAC000039 0.085 60 2012-10-13 0.514 0.0695
3 MAC000040 0.082 0 2012-10-12 0.082 NaN
4 MAC000040 0.053 30 2012-10-12 0.135 0.0675
5 MAC000040 0.075 60 2012-10-12 0.210 0.0640
6 MAC000040 0.195 90 2012-10-12 0.405 0.1350
关于python - Pandas groupby然后滚动平均值,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/52801540/