Problem description
I want to lag every column in a dataframe, by group. I have a frame like this:
import numpy as np
import pandas as pd
index = pd.date_range('2015-11-20', periods=6, freq='D')
df = pd.DataFrame(dict(time=index, grp=['A']*3 + ['B']*3, col1=[1,2,3]*2,
                       col2=['a','b','c']*2)).set_index(['time','grp'])
which looks like
               col1 col2
time       grp
2015-11-20 A      1    a
2015-11-21 A      2    b
2015-11-22 A      3    c
2015-11-23 B      1    a
2015-11-24 B      2    b
2015-11-25 B      3    c
and I want it to look like this:
               col1 col2 col1_lag col2_lag
time       grp
2015-11-20 A      1    a        2        b
2015-11-21 A      2    b        3        c
2015-11-22 A      3    c       NA       NA
2015-11-23 B      1    a        2        b
2015-11-24 B      2    b        3        c
2015-11-25 B      3    c       NA       NA
This question manages the result for a single column, but I have an arbitrary number of columns, and I want to lag all of them. I can use groupby and apply, but apply runs the shift function over each column independently, and it doesn't seem to like receiving an [nrow, 2] shaped dataframe in return. Is there perhaps a function like apply that acts on the whole group sub-frame? Or is there a better way to do this?
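On the apply part of the question, a minimal sketch, assuming pandas 1.x or 2.x: DataFrameGroupBy.apply hands the function each group as a whole sub-DataFrame, so a single DataFrame.shift(-1) call inside it lags all columns of that group together. group_keys=False is passed so pandas does not prepend the group label as an extra index level; via_apply and via_shift are illustrative names only. The sketch compares the result with the built-in groupby shift.

```python
import pandas as pd

# Rebuild the example frame from the question.
index = pd.date_range("2015-11-20", periods=6, freq="D")
df = pd.DataFrame(dict(time=index, grp=["A"] * 3 + ["B"] * 3,
                       col1=[1, 2, 3] * 2,
                       col2=["a", "b", "c"] * 2)).set_index(["time", "grp"])

# Each call of the lambda receives one group as a 3-row, 2-column DataFrame,
# so DataFrame.shift(-1) moves every column of that group up by one row.
# group_keys=False keeps the original (time, grp) index on the result.
via_apply = df.groupby(level="grp", group_keys=False).apply(lambda g: g.shift(-1))

# The built-in grouped shift does the same thing without a Python callback.
via_shift = df.groupby(level="grp").shift(-1)

print(via_apply.equals(via_shift))
```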
IIUC, you can simply group by level="grp" and then shift by -1:
>>> shifted = df.groupby(level="grp").shift(-1)
>>> df.join(shifted.rename(columns=lambda x: x+"_lag"))
               col1 col2 col1_lag col2_lag
time       grp
2015-11-20 A      1    a        2        b
2015-11-21 A      2    b        3        c
2015-11-22 A      3    c      NaN      NaN
2015-11-23 B      1    a        2        b
2015-11-24 B      2    b        3        c
2015-11-25 B      3    c      NaN      NaN
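To reuse the answer on frames with any number of columns, the two lines can be wrapped in a small helper. This is a sketch: add_group_lags is a hypothetical name, and it swaps the rename/join pair for DataFrame.add_suffix plus a column-wise pd.concat, which line rows up the same way here because the grouped shift keeps the original index. Since shift works by row position, it also assumes the rows are already in time order within each group.

```python
import pandas as pd


def add_group_lags(frame, level, periods=-1, suffix="_lag"):
    """Hypothetical helper: append a shifted copy of every column of `frame`,
    shifted within each group of the index level `level`.

    periods=-1 pulls in the next row of the same group (what the question
    calls the lag); periods=1 would pull in the previous row instead.
    Assumes rows are already sorted in time order inside each group.
    """
    shifted = frame.groupby(level=level).shift(periods)
    # add_suffix renames col1 -> col1_lag, col2 -> col2_lag, and so on.
    # shifted has the same index as frame, so concat aligns row by row.
    return pd.concat([frame, shifted.add_suffix(suffix)], axis=1)


# Example frame from the question.
index = pd.date_range("2015-11-20", periods=6, freq="D")
df = pd.DataFrame(dict(time=index, grp=["A"] * 3 + ["B"] * 3,
                       col1=[1, 2, 3] * 2,
                       col2=["a", "b", "c"] * 2)).set_index(["time", "grp"])

out = add_group_lags(df, level="grp")
print(out)
```

The NaN that shift inserts at the end of each group means an int64 column such as col1 comes back as float64 in its lagged copy, while object columns such as col2 stay object.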