Here is the sample dataset I am working with:
            maint id
datetime
2015-01-01    1.0  a
2015-01-02    NaN  a
2015-01-03    NaN  a
2015-01-04    1.0  a
2015-01-05    NaN  a
2015-01-06    NaN  a
2015-01-07    NaN  a
2015-01-01    NaN  b
2015-01-02    NaN  b
2015-01-03    1.0  b
2015-01-04    1.0  b
2015-01-05    NaN  b
2015-01-06    NaN  b
2015-01-07    NaN  b
What I want is the number of days elapsed since df['maint'] was last 1, per id (0 for rows before the first maintenance of an id):
            maint id  days
datetime
2015-01-01    1.0  a     0
2015-01-02    NaN  a     1
2015-01-03    NaN  a     2
2015-01-04    1.0  a     0
2015-01-05    NaN  a     1
2015-01-06    NaN  a     2
2015-01-07    NaN  a     3
2015-01-01    NaN  b     0
2015-01-02    NaN  b     0
2015-01-03    1.0  b     0
2015-01-04    1.0  b     0
2015-01-05    NaN  b     1
2015-01-06    NaN  b     2
2015-01-07    NaN  b     3
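I have thousands of different IDs, and each ID has several years of maintenance records, so I am looking for an efficient way to compute this day difference.
Best answer
Use: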
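import pandas as pd
# Keep the row's own date where maint == 1, NaT everywhere else
df['days'] = df.index.where(df['maint'].eq(1))
# Forward-fill the last maintenance date per id, subtract it from the index, convert to whole days
df['days'] = (df.index - df.groupby('id')['days'].ffill()).fillna(pd.Timedelta(0)).dt.days
print(df)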
            maint id  days
datetime
2015-01-01    1.0  a     0
2015-01-02    NaN  a     1
2015-01-03    NaN  a     2
2015-01-04    1.0  a     0
2015-01-05    NaN  a     1
2015-01-06    NaN  a     2
2015-01-07    NaN  a     3
2015-01-01    NaN  b     0
2015-01-02    NaN  b     0
2015-01-03    1.0  b     0
2015-01-04    1.0  b     0
2015-01-05    NaN  b     1
2015-01-06    NaN  b     2
2015-01-07    NaN  b     3
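The answer's snippet assumes that df already exists with a DatetimeIndex. For anyone who wants to reproduce the result, here is a minimal self-contained sketch that rebuilds the sample data and applies the same two steps. The way the frame is constructed and the helper column name last_maint are assumptions made for illustration; they do not come from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data: two ids, seven daily rows each (assumed construction).
dates = pd.date_range("2015-01-01", periods=7, freq="D")
df = pd.DataFrame(
    {
        "maint": [1, np.nan, np.nan, 1, np.nan, np.nan, np.nan,
                  np.nan, np.nan, 1, 1, np.nan, np.nan, np.nan],
        "id": ["a"] * 7 + ["b"] * 7,
    },
    index=pd.DatetimeIndex(dates.append(dates), name="datetime"),
)

# Step 1: keep the row's own date where maint == 1, NaT elsewhere.
# `last_maint` is a hypothetical helper column, used here to expose the intermediate values.
df["last_maint"] = df.index.where(df["maint"].eq(1))

# Step 2: carry the most recent maintenance date forward within each id.
filled = df.groupby("id")["last_maint"].ffill()

# Step 3: subtract from the index; rows before the first maintenance are NaT,
# so they are replaced with a zero timedelta before extracting whole days.
df["days"] = (df.index - filled).fillna(pd.Timedelta(0)).dt.days

# Drop the helper column so the frame matches the layout in the answer.
df = df.drop(columns="last_maint")
print(df)
```

The days column printed by this sketch should match the output shown above. Keeping the forward-filled dates in a separate variable makes it easy to inspect which maintenance date each row is measured against.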
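Explanation:
First create a new column days from the values of df.index where maint is 1; all other values are NaT.
Then subtract the series produced by GroupBy.ffill from the index, replace NaN with a 0 timedelta, and finally convert the result to days with Series.dt.days.
This entry, python - What is an efficient way to calculate the date difference since the last maintenance?, is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/55085235/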