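In pandas 1.5.3, when groupby is called with group_keys=True, the grouping column becomes the first level of the result's index. In the code below, time_id is the first index level, and the index of the original DataFrame (df) is kept as the second level.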
>>> df.groupby(['time_id'], group_keys=True)['wap'].apply(log_return)
time_id
0        0               NaN
         1          0.000000
         2          0.000000
         3          0.000000
         4          0.000000
                      ...
26454    5237975   -0.001228
         5237976    0.000491
         5237977   -0.005031
         5237978    0.003219
         5237979    0.003264
Name: wap, Length: 5237980, dtype: float64
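The post does not include the full dataset or the definition of log_return. The minimal sketch below reproduces the same two-level index on a tiny made-up frame. It assumes log_return is the per-group log difference np.log(series).diff(). That is a common definition, but it is an assumption here:

```python
import numpy as np
import pandas as pd


def log_return(series):
    # Assumed (hypothetical) definition: log return between consecutive rows
    return np.log(series).diff()


# Made-up data standing in for the real df
df = pd.DataFrame({
    "time_id": [0, 0, 0, 1, 1],
    "wap": [1.0, 1.0, 2.0, 4.0, 2.0],
})

# group_keys=True: index becomes (time_id, original row index)
with_keys = df.groupby(["time_id"], group_keys=True)["wap"].apply(log_return)
print(with_keys)
```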
group_keys controls whether the groupby keys (here time_id) are added to the result's index: True adds them, False does not. This matches the explanation in the help text:
group_keys : bool, optional
    When calling apply and the ``by`` argument produces a like-indexed
    (i.e. :ref:`a transform <groupby.transform>`) result, add group keys to
    index to identify pieces. By default group keys are not included
    when the result's index (and column) labels match the inputs, and
    are included otherwise. This argument has no effect if the result produced
    is not like-indexed with respect to the input.
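The last sentence of the docstring can be checked with a small sketch on made-up data. Here apply returns one scalar per group, so the result is not like-indexed. It comes back indexed by time_id whether group_keys is True or False:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"time_id": [0, 0, 1], "wap": [1.0, 2.0, 5.0]})

# Aggregating apply: one scalar per group, so the result is NOT like-indexed
agg_true = df.groupby("time_id", group_keys=True)["wap"].apply(lambda s: s.sum())
agg_false = df.groupby("time_id", group_keys=False)["wap"].apply(lambda s: s.sum())

# Both results are indexed by time_id; group_keys has no effect here
print(agg_true)
print(agg_false)
```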
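So with group_keys=False, the group keys (time_id) no longer appear in the result, as shown below. The result keeps the original index of df, so it can be assigned directly as a new column of the original DataFrame (df), which is very convenient.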
>>> df.groupby(['time_id'], group_keys=False)['wap'].apply(log_return)
0 NaN
1 0.000000
2 0.000000
3 0.000000
4 0.000000
...
5237975 -0.001228
5237976 0.000491
5237977 -0.005031
5237978 0.003219
5237979 0.003264
Name: wap, Length: 5237980, dtype: float64
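The following sketch shows the column assignment on made-up data, again with the assumed log_return definition (hypothetical, not from the original post). It also shows transform, which always returns a result aligned with the original rows and is another way to get the same column:

```python
import numpy as np
import pandas as pd


def log_return(series):
    # Assumed (hypothetical) definition: log return between consecutive rows
    return np.log(series).diff()


df = pd.DataFrame({
    "time_id": [0, 0, 0, 1, 1],
    "wap": [1.0, 1.0, 2.0, 4.0, 2.0],
})

# group_keys=False keeps df's original index, so the result aligns row by row
df["log_return"] = df.groupby(["time_id"], group_keys=False)["wap"].apply(log_return)

# transform returns a like-indexed result directly
df["log_return_t"] = df.groupby("time_id")["wap"].transform(log_return)
print(df)
```

With group_keys=True the result carries a (time_id, original index) MultiIndex. That index does not match the flat index of df, which is why False is the convenient setting for this kind of assignment.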
PS: Really understanding the English documentation takes hands-on experimentation.