Problem description
I want to slice the rows of df1 that fall between the time intervals StartTime and EndTime in df2, case by case (matched on the values in column Group_Id of df2), and then concatenate the slices together, given that they all share the same format.
Here is df1:
Timestamp Group_Id Data
2013-10-20 20:00:05.143 11 14
2013-10-21 00:05:10.377 11 15
2013-10-22 14:22:15.501 11 19
...
2019-02-05 00:00:05.743 101 21
2019-02-10 00:00:10.407 101 33
and df2:
EndTime StartTime Group_Id
27/10/13 16:08 20/10/13 16:08 11
03/12/16 16:11 26/11/16 16:11 2
24/10/14 12:08 17/10/14 12:08 11
04/07/17 08:00 27/06/17 08:00 100
03/04/13 14:10 27/03/13 14:10 26
15/11/18 17:00 08/11/18 17:00 46
11/02/19 00:20 04/02/19 00:20 101
Step 1: Start from the first row of column Group_Id in df2: 11
Step 2: Copy the rows of df1 whose Timestamp lies between StartTime and EndTime for Group_Id == 11
Step 3: Concatenate the sliced subsets obtained for every row of df2
Hopefully the final dataset df3 looks like this:
Group_Id EndTime StartTime Timestamp Data
11 27/10/13 16:08 20/10/13 16:08 2013-10-20 20:00:05.143 14
11 27/10/13 16:08 20/10/13 16:08 2013-10-21 00:05:10.377 15
11 27/10/13 16:08 20/10/13 16:08 2013-10-22 14:22:15.501 19
...
101 11/02/19 00:20 04/02/19 00:20 2019-02-05 00:00:05.743 21
101 11/02/19 00:20 04/02/19 00:20 2019-02-10 00:00:10.407 33
...
Faulty pseudocode:
for i in df2['Group_Id']:
if i = df1['Group_Id'],
dfxx = df1[(df1['Timestamp'] <= df2.loc[i, 'EndTime']) & df1['Timestamp'] > (df2.loc['EndTime'] - dt.timedelta(days=7)])
pd.concat(dfxx for all i)
i = i+1
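The three steps and the pseudocode above could be written as an explicit loop over the rows of df2. This is only a sketch under assumptions: the sample frames are a hand-built subset of the data shown in the question (using the 20:00:05.143 timestamp that appears in the expected output), the names start, end, slices and piece are introduced here for illustration, and the date strings are assumed to follow the day-first pattern %d/%m/%y %H:%M.

```python
import pandas as pd

# Hand-built subset of the question's data (assumption: illustrative only).
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2013-10-20 20:00:05.143",
        "2013-10-21 00:05:10.377",
        "2013-10-22 14:22:15.501",
        "2019-02-05 00:00:05.743",
        "2019-02-10 00:00:10.407",
    ]),
    "Group_Id": [11, 11, 11, 101, 101],
    "Data": [14, 15, 19, 21, 33],
})
df2 = pd.DataFrame({
    "EndTime": ["27/10/13 16:08", "24/10/14 12:08", "11/02/19 00:20"],
    "StartTime": ["20/10/13 16:08", "17/10/14 12:08", "04/02/19 00:20"],
    "Group_Id": [11, 11, 101],
})

# Parse the day-first strings once, outside the loop.
start = pd.to_datetime(df2["StartTime"], format="%d/%m/%y %H:%M")
end = pd.to_datetime(df2["EndTime"], format="%d/%m/%y %H:%M")

slices = []
for idx, row in df2.iterrows():
    # Rows of df1 in the same group and inside [StartTime, EndTime] (inclusive).
    in_group = df1["Group_Id"] == row["Group_Id"]
    in_window = df1["Timestamp"].between(start[idx], end[idx])
    piece = df1.loc[in_group & in_window].copy()
    if piece.empty:
        continue  # this interval matched nothing in df1
    # Carry the interval bounds along, as in the desired df3.
    piece["StartTime"] = row["StartTime"]
    piece["EndTime"] = row["EndTime"]
    slices.append(piece)

# Assumes at least one interval matched; pd.concat raises on an empty list.
df3 = pd.concat(slices, ignore_index=True)
print(df3)
```

A row-by-row loop like this mirrors the steps literally but may be slow on large frames; a merge-based approach avoids the Python-level loop.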
Hope this helps to better illustrate the problem.
Recommended answer
Convert df1.Timestamp to datetime. Merge df2 and df1 on Group_Id to get df3. Create an IntervalIndex from the StartTime and EndTime columns of df3. Use a list comprehension to build a True/False mask m, then slice df3 with it.
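# Parse the timestamps so they can be compared with the interval bounds.
df1.Timestamp = pd.to_datetime(df1.Timestamp)
# Pair every df1 row with each df2 interval that shares its Group_Id.
df3 = df2.merge(df1, on='Group_Id')
# One closed interval [StartTime, EndTime] per merged row; the dates are day-first.
iix = pd.IntervalIndex.from_tuples(
    [*df3[['StartTime','EndTime']].apply(pd.to_datetime, dayfirst=True).to_records(index=False)],
    closed='both')
# Keep a row only if its Timestamp falls inside that row's own interval.
m = [x in iix[i] for i, x in enumerate(df3.Timestamp)]
df3.loc[m]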
Out[494]:
EndTime StartTime Group_Id Timestamp Data
0 27/10/13 16:08 20/10/13 16:08 11 2013-10-20 20:00:05.143 14
1 27/10/13 16:08 20/10/13 16:08 11 2013-10-21 00:05:10.377 15
2 27/10/13 16:08 20/10/13 16:08 11 2013-10-22 14:22:15.501 19
6 11/02/19 00:20 04/02/19 00:20 101 2019-02-05 00:00:05.743 21
7 11/02/19 00:20 04/02/19 00:20 101 2019-02-10 00:00:10.407 33
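The same mask can also be built without a Python-level list comprehension, by comparing the merged Timestamp column against the parsed bounds directly. This is an alternative sketch, not the answer's own code: the sample frames are rebuilt by hand from the data shown in the question, the names start, end and result are introduced here, and the date strings are assumed to follow the day-first pattern %d/%m/%y %H:%M.

```python
import pandas as pd

# Hand-built sample data (assumption: copied from the tables in the question).
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2013-10-20 20:00:05.143",
        "2013-10-21 00:05:10.377",
        "2013-10-22 14:22:15.501",
        "2019-02-05 00:00:05.743",
        "2019-02-10 00:00:10.407",
    ]),
    "Group_Id": [11, 11, 11, 101, 101],
    "Data": [14, 15, 19, 21, 33],
})
df2 = pd.DataFrame({
    "EndTime": ["27/10/13 16:08", "03/12/16 16:11", "24/10/14 12:08",
                "04/07/17 08:00", "03/04/13 14:10", "15/11/18 17:00",
                "11/02/19 00:20"],
    "StartTime": ["20/10/13 16:08", "26/11/16 16:11", "17/10/14 12:08",
                  "27/06/17 08:00", "27/03/13 14:10", "08/11/18 17:00",
                  "04/02/19 00:20"],
    "Group_Id": [11, 2, 11, 100, 26, 46, 101],
})

# Pair every df1 row with each df2 interval of the same group (inner join).
df3 = df2.merge(df1, on="Group_Id")

# Parse the bounds; both Series share df3's index, so comparisons align row by row.
start = pd.to_datetime(df3["StartTime"], format="%d/%m/%y %H:%M")
end = pd.to_datetime(df3["EndTime"], format="%d/%m/%y %H:%M")

# Closed interval on both sides, matching closed='both' in the answer.
m = (df3["Timestamp"] >= start) & (df3["Timestamp"] <= end)
result = df3.loc[m]
print(result)
```

Because the comparison is column-wise, it should scale better than testing each row against its interval one at a time, while still keeping the original StartTime and EndTime strings in the output.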