For the DataFrame below, I want to get the time difference for each phase of a given work_item:
print(df)
         timestamp  work_item  from_phase  to_phase
0   1/2/2015 14:39  WI_000001       Start   Analyze
1   1/5/2015 11:48  WI_000001     Analyze    Design
2   1/5/2015 12:35  WI_000001      Design   Analyze
3   1/7/2015 11:04  WI_000001     Analyze    Deploy
4  1/27/2015 11:36  WI_000001      Deploy       End
5   1/2/2015 15:04  WI_000002       Start   Analyze
6   1/14/2015 9:46  WI_000002     Analyze    Design
7   1/14/2015 9:46  WI_000002      Design     Build
8   1/14/2015 9:46  WI_000002       Build       End
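To make the example reproducible, here is a minimal sketch that rebuilds the frame shown above from literal rows. The variable name rows is only illustrative, and the timestamps are kept as strings here, as they appear in the question.

```python
import pandas as pd

# Rows copied from the printed DataFrame in the question
rows = [
    ('1/2/2015 14:39', 'WI_000001', 'Start', 'Analyze'),
    ('1/5/2015 11:48', 'WI_000001', 'Analyze', 'Design'),
    ('1/5/2015 12:35', 'WI_000001', 'Design', 'Analyze'),
    ('1/7/2015 11:04', 'WI_000001', 'Analyze', 'Deploy'),
    ('1/27/2015 11:36', 'WI_000001', 'Deploy', 'End'),
    ('1/2/2015 15:04', 'WI_000002', 'Start', 'Analyze'),
    ('1/14/2015 9:46', 'WI_000002', 'Analyze', 'Design'),
    ('1/14/2015 9:46', 'WI_000002', 'Design', 'Build'),
    ('1/14/2015 9:46', 'WI_000002', 'Build', 'End'),
]
df = pd.DataFrame(rows, columns=['timestamp', 'work_item', 'from_phase', 'to_phase'])
print(df)
```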
Best answer
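This solution works as long as, within each work_item group, every row's from_phase is the same as the previous row's to_phase, so that each row marks the moment the item leaves its from_phase.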
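If you are not sure your data satisfies that assumption, it can be checked before computing anything. The sketch below is one possible check, not part of the original answer; prev_to, consistent and chain_ok are illustrative names, and only the three columns needed for the check are rebuilt.

```python
import pandas as pd

df = pd.DataFrame({
    'work_item': ['WI_000001'] * 5 + ['WI_000002'] * 4,
    'from_phase': ['Start', 'Analyze', 'Design', 'Analyze', 'Deploy',
                   'Start', 'Analyze', 'Design', 'Build'],
    'to_phase': ['Analyze', 'Design', 'Analyze', 'Deploy', 'End',
                 'Analyze', 'Design', 'Build', 'End'],
})

# to_phase of the previous row within the same work_item
# (missing on the first row of each group)
prev_to = df.groupby('work_item')['to_phase'].shift()

# A row is consistent if it opens its group or continues the previous phase
consistent = prev_to.isna() | (prev_to == df['from_phase'])
chain_ok = bool(consistent.all())
print(chain_ok)
```

For the sample data every row continues the previous one, so the check passes. Rows where consistent is False would be the ones to inspect.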
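First convert the timestamp column with to_datetime, then create a new column holding the difference to the previous row within each group, using DataFrameGroupBy.diff.
The first row of each group has no previous row, so its difference is missing (NaT). Remove those rows with dropna, aggregate with sum per work_item and from_phase, convert the timedeltas to seconds with total_seconds, and finally call reset_index to turn the result back into a DataFrame: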
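import pandas as pd

# Parse the timestamp strings into datetimes
df['timestamp'] = pd.to_datetime(df['timestamp'])
# Time elapsed since the previous row of the same work_item
df['diff'] = df.groupby('work_item')['timestamp'].diff()
# Drop each group's first row (NaT), then total the seconds spent per phase
df = (df.dropna(subset=['diff'])
        .groupby(['work_item', 'from_phase'])['diff']
        .sum()
        .dt.total_seconds()
        .astype(int)
        .reset_index(name='sum of differencies'))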
print(df)
   work_item  from_phase  sum of differencies
0  WI_000001     Analyze               416280
1  WI_000001      Deploy              1729920
2  WI_000001      Design                 2820
3  WI_000002     Analyze              1017720
4  WI_000002       Build                    0
5  WI_000002      Design                    0
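As a sanity check on the first row of the result, the Analyze total for WI_000001 can be recomputed by hand, since that item enters Analyze twice. This is only an illustrative sketch; first_stay, second_stay and analyze_seconds are names made up for the example.

```python
import pandas as pd

# First stay in Analyze: entered 1/2/2015 14:39, left 1/5/2015 11:48
first_stay = pd.Timestamp('2015-01-05 11:48') - pd.Timestamp('2015-01-02 14:39')
# Second stay in Analyze: entered 1/5/2015 12:35, left 1/7/2015 11:04
second_stay = pd.Timestamp('2015-01-07 11:04') - pd.Timestamp('2015-01-05 12:35')

# 248940 s + 167340 s
analyze_seconds = int((first_stay + second_stay).total_seconds())
print(analyze_seconds)  # 416280
```

The zeros for WI_000002 come from the same logic: its last three rows share the timestamp 1/14/2015 9:46, so no time elapses in Design or Build.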
Regarding python - how to get the time difference for the different phases of a specific work_item, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54861102/