In the data below, I want to get the time difference for each phase of a given work_item.

print (df)
         timestamp  work_item from_phase to_phase
0   1/2/2015 14:39  WI_000001      Start  Analyze
1   1/5/2015 11:48  WI_000001    Analyze   Design
2   1/5/2015 12:35  WI_000001     Design  Analyze
3   1/7/2015 11:04  WI_000001    Analyze   Deploy
4  1/27/2015 11:36  WI_000001     Deploy      End
5   1/2/2015 15:04  WI_000002      Start  Analyze
6   1/14/2015 9:46  WI_000002    Analyze   Design
7   1/14/2015 9:46  WI_000002     Design    Build
8   1/14/2015 9:46  WI_000002      Build      End

Best answer

This solution works as long as, within each work_item group, every row's from_phase is the same as the previous row's to_phase.
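That precondition can be checked before computing anything. The following is a minimal sketch, not part of the original answer: it rebuilds the phase columns from the question's sample data, and the names `prev_to` and `consistent` are made up for illustration. It also assumes the rows are already in chronological order within each work_item.

```python
import pandas as pd

# Phase columns copied from the question's sample data.
df = pd.DataFrame({
    'work_item': ['WI_000001'] * 5 + ['WI_000002'] * 4,
    'from_phase': ['Start', 'Analyze', 'Design', 'Analyze', 'Deploy',
                   'Start', 'Analyze', 'Design', 'Build'],
    'to_phase': ['Analyze', 'Design', 'Analyze', 'Deploy', 'End',
                 'Analyze', 'Design', 'Build', 'End'],
})

# to_phase of the previous row within the same work_item
# (missing for the first row of each group).
prev_to = df.groupby('work_item')['to_phase'].shift()

# A row is consistent if it opens its group or continues the previous phase.
consistent = prev_to.isna() | (df['from_phase'] == prev_to)

print(consistent.all())
```

If this prints False, `df[~consistent]` lists the rows that break the chain, and the per-phase sums would be attributed to the wrong phase for those rows.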

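First convert the timestamp column with to_datetime, then create a new column holding the difference between consecutive rows within each group, using DataFrameGroupBy.diff. Under the assumption above, each difference is the time spent in that row's from_phase.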

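Then use dropna to remove the first row of each group, whose difference is missing (NaT), aggregate with sum, convert the timedeltas to seconds with total_seconds, and finally call reset_index: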

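import pandas as pd

# Parse the timestamp strings, then diff consecutive rows within each work_item
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['diff'] = df.groupby('work_item')['timestamp'].diff()

# Drop each group's first row (NaT diff), sum per phase, convert to whole seconds
df = (df.dropna(subset=['diff'])
        .groupby(['work_item','from_phase'])['diff']
        .sum()
        .dt.total_seconds()
        .astype(int)
        .reset_index(name='sum of differencies'))
print(df)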

   work_item from_phase  sum of differencies
0  WI_000001    Analyze               416280
1  WI_000001     Deploy              1729920
2  WI_000001     Design                 2820
3  WI_000002    Analyze              1017720
4  WI_000002      Build                    0
5  WI_000002     Design                    0
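
The totals above are in seconds. If a more readable unit is wanted, they can be converted afterwards. This is only a sketch: the frame `out` is retyped from the printed result, and the extra columns `as_timedelta` and `hours` are hypothetical names that do not appear in the original answer.

```python
import pandas as pd

# Result retyped from the printed output above.
out = pd.DataFrame({
    'work_item': ['WI_000001'] * 3 + ['WI_000002'] * 3,
    'from_phase': ['Analyze', 'Deploy', 'Design', 'Analyze', 'Build', 'Design'],
    'sum of differencies': [416280, 1729920, 2820, 1017720, 0, 0],
})

# Seconds back to Timedelta values (illustrative column name).
out['as_timedelta'] = pd.to_timedelta(out['sum of differencies'], unit='s')

# Seconds expressed as fractional hours (illustrative column name).
out['hours'] = out['sum of differencies'] / 3600

print(out)
```

Alternatively, leaving out the `.dt.total_seconds()` and `.astype(int)` steps in the chain above keeps the sums as timedeltas in the first place.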

For python - how to set the time difference for different phases of a particular work_item, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54861102/