How to slice rows between time intervals on a case-by-case basis in Python


Problem description

I want to slice the rows of df1 that fall between StartTime and EndTime in df2, on a case-by-case basis (matching on the values in column Group_Id of df2). I then want to concatenate the resulting slices, since they all share the same format.

Here is df1:

      Timestamp           Group_Id      Data
2013-10-20 00:00:05.143    11           14
2013-10-21 00:05:10.377    11           15
2013-10-22 14:22:15.501    11           19
                   ...
2019-02-05 00:00:05.743    101          21
2019-02-10 00:00:10.407    101          33

And df2:

EndTime          StartTime             Group_Id
27/10/13 16:08   20/10/13 16:08          11
03/12/16 16:11   26/11/16 16:11          2
24/10/14 12:08   17/10/14 12:08          11
04/07/17 08:00   27/06/17 08:00          100
03/04/13 14:10   27/03/13 14:10          26
15/11/18 17:00   08/11/18 17:00          46
11/02/19 00:20   04/02/19 00:20          101
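
To make the example reproducible, the sample rows above could be built roughly as follows. This is only a sketch: the elided "..." rows are left out, the values are copied as shown in the tables, and the names df1_parsed and df2_parsed are hypothetical helpers that hold parsed copies so that the original string columns stay untouched. The explicit format string assumes that df2 uses day/month/two-digit-year throughout, which is what the sample suggests.

```python
import pandas as pd

# Sample rows copied from the df1 table (elided rows omitted)
df1 = pd.DataFrame({
    "Timestamp": [
        "2013-10-20 00:00:05.143",
        "2013-10-21 00:05:10.377",
        "2013-10-22 14:22:15.501",
        "2019-02-05 00:00:05.743",
        "2019-02-10 00:00:10.407",
    ],
    "Group_Id": [11, 11, 11, 101, 101],
    "Data": [14, 15, 19, 21, 33],
})

# Sample rows copied from the df2 table
df2 = pd.DataFrame({
    "EndTime": [
        "27/10/13 16:08", "03/12/16 16:11", "24/10/14 12:08", "04/07/17 08:00",
        "03/04/13 14:10", "15/11/18 17:00", "11/02/19 00:20",
    ],
    "StartTime": [
        "20/10/13 16:08", "26/11/16 16:11", "17/10/14 12:08", "27/06/17 08:00",
        "27/03/13 14:10", "08/11/18 17:00", "04/02/19 00:20",
    ],
    "Group_Id": [11, 2, 11, 100, 26, 46, 101],
})

# Hypothetical parsed copies: df1 timestamps are ISO-like, while df2 is
# day-first with a two-digit year, so an explicit format avoids ambiguity
df1_parsed = df1.assign(Timestamp=pd.to_datetime(df1["Timestamp"]))
fmt = "%d/%m/%y %H:%M"
df2_parsed = df2.assign(
    StartTime=pd.to_datetime(df2["StartTime"], format=fmt),
    EndTime=pd.to_datetime(df2["EndTime"], format=fmt),
)
```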

Step 1: Start from the first row of column Group_Id in df2: 11

Step 2: Copy the rows of df1 that lie between StartTime and EndTime for Group_Id == 11

Step 3: Concatenate all the sliced subsets, one for each row of df2

Hopefully the final dataset df3 looks like this:

Group_Id EndTime         StartTime      Timestamp                 Data
11       27/10/13 16:08  20/10/13 16:08 2013-10-20 20:00:05.143   14
11       27/10/13 16:08  20/10/13 16:08 2013-10-21 00:05:10.377   15
11       27/10/13 16:08  20/10/13 16:08 2013-10-22 14:22:15.501   19
                             ...
101      11/02/19 00:20  04/02/19 00:20 2019-02-05 00:00:05.743   21
101      11/02/19 00:20  04/02/19 00:20 2019-02-10 00:00:10.407   33
                             ...

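A rough version of what I am attempting, assuming pandas is imported as pd and that Timestamp and EndTime are already datetimes:

slices = []
for _, row in df2.iterrows():
    # df1 rows of the same group within the 7 days up to this row's EndTime
    same_group = df1['Group_Id'] == row['Group_Id']
    after_start = df1['Timestamp'] > row['EndTime'] - pd.Timedelta(days=7)
    before_end = df1['Timestamp'] <= row['EndTime']
    slices.append(df1[same_group & after_start & before_end])
df3 = pd.concat(slices)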

I hope this helps to illustrate the problem.

Recommended answer

Convert df1.Timestamp to datetime. Merge df2 with df1 on Group_Id to get df3. Build an IntervalIndex from the StartTime and EndTime columns of df3. Use a list comprehension to create a True/False mask m, then slice df3 with it.

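import pandas as pd

# Parse df1 timestamps, then pair every df1 row with each df2 interval of the same group
df1.Timestamp = pd.to_datetime(df1.Timestamp)
df3 = df2.merge(df1, on='Group_Id')
# One closed interval [StartTime, EndTime] per merged row (df2 dates are day-first)
iix = pd.IntervalIndex.from_tuples([*df3[['StartTime','EndTime']].apply(pd.to_datetime, dayfirst=True).to_records(index=False)],
                                   closed='both')
# True where a row's Timestamp falls inside that row's own interval
m = [x in iix[i] for i, x in enumerate(df3.Timestamp)]

df3.loc[m]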

Out[494]:
          EndTime       StartTime  Group_Id               Timestamp  Data
0  27/10/13 16:08  20/10/13 16:08        11 2013-10-20 20:00:05.143    14
1  27/10/13 16:08  20/10/13 16:08        11 2013-10-21 00:05:10.377    15
2  27/10/13 16:08  20/10/13 16:08        11 2013-10-22 14:22:15.501    19
6  11/02/19 00:20  04/02/19 00:20       101 2019-02-05 00:00:05.743    21
7  11/02/19 00:20  04/02/19 00:20       101 2019-02-10 00:00:10.407    33
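
As a possible alternative to the IntervalIndex and list comprehension, the same mask can be sketched with plain vectorized comparisons after the merge. This is a variation under stated assumptions, not the answer's method: the names merged, start, end and mask are hypothetical, df2 is reduced to the three rows whose Group_Id appears in df1, and the first timestamp follows the output above (20:00:05.143) rather than the df1 table in the question (00:00:05.143), which would fall before StartTime.

```python
import pandas as pd

# Sample data; the first timestamp is taken from the answer's output (assumption)
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2013-10-20 20:00:05.143",
        "2013-10-21 00:05:10.377",
        "2013-10-22 14:22:15.501",
        "2019-02-05 00:00:05.743",
        "2019-02-10 00:00:10.407",
    ]),
    "Group_Id": [11, 11, 11, 101, 101],
    "Data": [14, 15, 19, 21, 33],
})
df2 = pd.DataFrame({
    "EndTime": ["27/10/13 16:08", "24/10/14 12:08", "11/02/19 00:20"],
    "StartTime": ["20/10/13 16:08", "17/10/14 12:08", "04/02/19 00:20"],
    "Group_Id": [11, 11, 101],
})

# Pair every df1 row with every df2 interval that shares its Group_Id
merged = df2.merge(df1, on="Group_Id")

# Parse the interval bounds without overwriting the displayed string columns
fmt = "%d/%m/%y %H:%M"
start = pd.to_datetime(merged["StartTime"], format=fmt)
end = pd.to_datetime(merged["EndTime"], format=fmt)

# Keep rows whose Timestamp lies in the closed interval [StartTime, EndTime]
mask = (merged["Timestamp"] >= start) & (merged["Timestamp"] <= end)
df3 = merged[mask]
print(df3)
```

Because the comparisons run element-wise on aligned columns, this avoids a Python-level loop over the rows; the merge still creates one row per (interval, timestamp) pair within each group, so memory use grows with the size of the groups.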
