I have the following dictionary:

dic = {'T1':["2013-11-12 17:35:00", "2013-11-12 17:36:00", "2013-11-12 17:37:00",
             "2013-11-12 17:38:00", "2013-11-12 17:40:00", "2013-11-12 17:41:00",
             "2013-11-12 17:42:00"],
       'T2':["2013-11-12 12:15:00", "2013-11-12 12:16:00", "2013-11-13 16:32:00",
             "2013-11-13 16:33:00", "2013-11-13 16:34:00"]}


I want to build the following MultiIndexed DataFrame from it:

                    T1                                        T2
                 Start                 Stop                Start                 Stop
   2013-11-12 17:35:00  2013-11-12 17:38:00  2013-11-12 12:15:00  2013-11-12 12:16:00
   2013-11-12 17:40:00  2013-11-12 17:42:00  2013-11-13 16:32:00  2013-11-13 16:34:00


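The DataFrame describes when certain events start and stop for sensor T1 or T2. If the time difference between two consecutive timestamps is 1 minute or less, I consider it the same event continuing; if the difference is greater than 1 minute, a new event has started.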

Thanks for your help :)

Best answer

You can compute the difference between consecutive timestamps and build a mask that is True wherever that difference is not 1 minute:

df['mask'] = (df[key].diff() / np.timedelta64(1, 'm')) != 1
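As a quick check of what this mask contains, here is a minimal sketch on the T2 timestamps from the question. The names `s` and `gap_minutes` are introduced only for this illustration. Note that the first row has no predecessor, so its difference is missing and the `!= 1` comparison evaluates to True, which is what makes the first row open a group.

```python
import numpy as np
import pandas as pd

# T2 timestamps copied from the question
s = pd.Series(pd.to_datetime(["2013-11-12 12:15:00", "2013-11-12 12:16:00",
                              "2013-11-13 16:32:00", "2013-11-13 16:33:00",
                              "2013-11-13 16:34:00"]))

# Gap to the previous timestamp, expressed in minutes (NaN for the first row)
gap_minutes = s.diff() / np.timedelta64(1, 'm')

# NaN != 1 is True, so the first row is flagged as the start of an event
mask = gap_minutes != 1
print(mask.tolist())  # [True, False, True, False, False]
```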


Then take the cumulative sum of the mask to identify which rows belong to which group:

df['group'] = df['mask'].cumsum()
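The reason this works is that True counts as 1 and False as 0, so the running total increases by one exactly at each row that starts a new event. A small sketch with the T2 mask written out by hand:

```python
import pandas as pd

# The T2 mask: True marks the first row of each event
mask = pd.Series([True, False, True, False, False])

# The running count of True values serves as the group label
group = mask.cumsum()
print(group.tolist())  # [1, 1, 2, 2, 2]
```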


This produces something like:

                   T2   mask  group
0 2013-11-12 12:15:00   True      1
1 2013-11-12 12:16:00  False      1
2 2013-11-13 16:32:00   True      2
3 2013-11-13 16:33:00  False      2
4 2013-11-13 16:34:00  False      2

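                   T1   mask  group
0 2013-11-12 17:35:00   True      1
1 2013-11-12 17:36:00  False      1
2 2013-11-12 17:37:00  False      1
3 2013-11-12 17:38:00  False      1
4 2013-11-12 17:40:00   True      2
5 2013-11-12 17:41:00  False      2
6 2013-11-12 17:42:00  False      2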


Now group by the group column and find the first and last timestamp of each group:

result[key] = df.groupby(['group'])[key].agg(['first', 'last'])
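As an aside, on pandas 0.25 or later the aggregation and the column naming can probably be done in one step with named aggregation. This is an alternative sketch, not part of the original answer; the variable `spans` is a name chosen here, and the group labels are typed in by hand.

```python
import pandas as pd

df = pd.DataFrame({
    'T2': pd.to_datetime(["2013-11-12 12:15:00", "2013-11-12 12:16:00",
                          "2013-11-13 16:32:00", "2013-11-13 16:33:00",
                          "2013-11-13 16:34:00"]),
    'group': [1, 1, 2, 2, 2],  # group labels as produced by the cumsum step
})

# Named aggregation: each keyword becomes an output column name
spans = df.groupby('group')['T2'].agg(Start='first', Stop='last')
print(spans)
```

This yields one row per group with the columns already named Start and Stop, so no separate rename is needed.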




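Putting it all together:

import numpy as np
import pandas as pd
pd.options.display.width = 1000  # keep the wide result on one line when printed
dic = {'T1':["2013-11-12 17:35:00", "2013-11-12 17:36:00", "2013-11-12 17:37:00",
             "2013-11-12 17:38:00", "2013-11-12 17:40:00", "2013-11-12 17:41:00",
             "2013-11-12 17:42:00"],
       'T2':["2013-11-12 12:15:00", "2013-11-12 12:16:00", "2013-11-13 16:32:00",
             "2013-11-13 16:33:00", "2013-11-13 16:34:00"]}

result = dict()
for key, val in dic.items():
    df = pd.DataFrame({key: pd.to_datetime(val)})
    # True where the gap to the previous timestamp is not exactly 1 minute
    # (the first row has no predecessor, so it is True as well)
    df['mask'] = (df[key].diff() / np.timedelta64(1, 'm')) != 1
    # Running count of True values gives each event a group number
    df['group'] = df['mask'].cumsum()
    # First and last timestamp of each group are the event's start and stop
    result[key] = df.groupby(['group'])[key].agg(['first', 'last'])
    result[key] = result[key].rename(columns={'first':'Start', 'last':'Stop'})
# Concatenate side by side; the dict keys become the top column level
result = pd.concat(result, axis=1)
print(result)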


which yields

                       T1                                      T2
                    Start                Stop               Start                Stop
group
1     2013-11-12 17:35:00 2013-11-12 17:38:00 2013-11-12 12:15:00 2013-11-12 12:16:00
2     2013-11-12 17:40:00 2013-11-12 17:42:00 2013-11-13 16:32:00 2013-11-13 16:34:00
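One caveat: the mask tests for a gap of exactly one minute, which fits the sample data but would start a new event for a gap of, say, 30 seconds. If the rule is meant as "a gap greater than 1 minute starts a new event", a threshold comparison may be closer. The following is a sketch under that assumption; the helper `event_spans`, its `max_gap` parameter, and the sample timestamps (including the made-up 17:35:30 entry) are hypothetical and not part of the original answer.

```python
import pandas as pd

def event_spans(timestamps, max_gap=pd.Timedelta(minutes=1)):
    """Return one Start/Stop row per run of timestamps spaced at most max_gap apart."""
    s = pd.Series(pd.to_datetime(timestamps)).sort_values(ignore_index=True)
    # True where the gap to the previous timestamp exceeds the threshold;
    # the first row compares NaT > max_gap, which is False
    new_event = s.diff() > max_gap
    # Running count of new events, shifted so labels start at 1
    group = new_event.cumsum() + 1
    spans = s.groupby(group).agg(Start='first', Stop='last')
    spans.index.name = 'group'
    return spans

# Illustrative input: T1 contains a 30-second gap and ends up with two events,
# T2 has only one event
dic = {'T1': ["2013-11-12 17:35:00", "2013-11-12 17:35:30", "2013-11-12 17:38:00"],
       'T2': ["2013-11-12 12:15:00", "2013-11-12 12:16:00"]}

result = pd.concat({key: event_spans(val) for key, val in dic.items()}, axis=1)
print(result)
```

When the sensors have different numbers of events, as here, `pd.concat` aligns the frames on the group index and fills the missing Start/Stop cells with NaT.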

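Regarding "python - Generating a MultiIndexed DataFrame from a dictionary with values of different lengths", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32610129/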
