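I want to keep only the last rows of a DataFrame: as soon as there is a time gap of more than 100 ms between consecutive rows, everything before that gap should be cut off. For example:
Input: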
Time X
0 12:30:00.00 A
1 12:30:00.100 B
2 12:30:00.202 C
3 12:30:00.300 D
Output:
Time X
2 12:30:00.202 C
3 12:30:00.300 D
Explanation: the gap between rows B and C is more than 100 ms, so everything above row C is dropped.
Best answer
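You can use diff to get the difference between consecutive timestamps and compare it with a Timedelta created by to_timedelta. Then take the cumsum of that comparison and check whether it is at least 1, which marks the first gap row and every row after it. Finally, filter with boolean indexing: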
import pandas as pd

df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
print (df)
Time X
0 1900-01-01 12:30:00.000 A
1 1900-01-01 12:30:00.100 B
2 1900-01-01 12:30:00.202 C
3 1900-01-01 12:30:00.300 D
print (df.Time.diff())
0 NaT
1 00:00:00.100000
2 00:00:00.102000
3 00:00:00.098000
Name: Time, dtype: timedelta64[ns]
mask = (((df.Time.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()) >= 1)
print (mask)
0 False
1 False
2 True
3 True
Name: Time, dtype: bool
print (df[mask])
Time X
2 1900-01-01 12:30:00.202 C
3 1900-01-01 12:30:00.300 D
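The steps above can be put together as a self-contained sketch. It assumes the sample data from the question (with row 3 read as 12:30:00.300) and uses pd.Timedelta(milliseconds=100) in place of the string form, which should be equivalent. The names times, gap and result are only illustrative.

```python
import pandas as pd

# Sample data from the question (row 3 written as 12:30:00.300).
df = pd.DataFrame({
    "Time": ["12:30:00.00", "12:30:00.100", "12:30:00.202", "12:30:00.300"],
    "X": ["A", "B", "C", "D"],
})

# Parse the strings; the format has no date part, so pandas fills in 1900-01-01.
times = pd.to_datetime(df["Time"], format="%H:%M:%S.%f")

# True on every row that is more than 100 ms after the previous row.
# The first difference is NaT, and comparing NaT gives False.
gap = times.diff() > pd.Timedelta(milliseconds=100)

# cumsum() counts the gaps seen so far; >= 1 marks the first gap row
# and every row after it.
mask = gap.cumsum() >= 1

result = df[mask]
print(result)
```

Note that if no gap exceeds the threshold, mask is False everywhere and result is empty, so a caller who wants the whole frame in that case would need to check gap.any() first.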
If the Time column must stay unchanged, parse it into a helper column and split at the first gap greater than 100 ms:
df['Time1'] = pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
print (df)
Time X Time1
0 12:30:00.00 A 1900-01-01 12:30:00.000
1 12:30:00.100 B 1900-01-01 12:30:00.100
2 12:30:00.202 C 1900-01-01 12:30:00.202
3 12:30:00.300 D 1900-01-01 12:30:00.300
1 12:30:00.100 E 1900-01-01 12:30:00.100
2 12:30:00.202 F 1900-01-01 12:30:00.202
print (df.Time1.diff())
0 NaT
1 00:00:00.100000
2 00:00:00.102000
3 00:00:00.098000
1 -1 days +23:59:59.800000
2 00:00:00.102000
Name: Time1, dtype: timedelta64[ns]
mask = (((df.Time1.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()) >= 1)
print (mask)
0 False
1 False
2 True
3 True
1 True
2 True
Name: Time1, dtype: bool
print (df[mask].drop('Time1',axis=1))
Time X
2 12:30:00.202 C
3 12:30:00.300 D
1 12:30:00.100 E
2 12:30:00.202 F
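For reference, here is a runnable reconstruction of this six-row case. The DataFrame construction (including the repeated index labels 1 and 2) is inferred from the printed output, so treat it as an assumption. The sketch keeps the parsed times in a separate Series instead of a Time1 column, and converts the mask to a NumPy array so that filtering is purely positional and does not depend on the duplicated index labels.

```python
import pandas as pd

# Reconstructed from the printed frame; the repeated index labels are assumed.
df = pd.DataFrame(
    {
        "Time": ["12:30:00.00", "12:30:00.100", "12:30:00.202",
                 "12:30:00.300", "12:30:00.100", "12:30:00.202"],
        "X": list("ABCDEF"),
    },
    index=[0, 1, 2, 3, 1, 2],
)

# Helper Series only; df itself is left untouched.
helper = pd.to_datetime(df["Time"], format="%H:%M:%S.%f")

# Row E jumps back in time, so its difference is negative (-200 ms)
# and does not count as a gap.
diffs = helper.diff()
mask = (diffs > pd.Timedelta(milliseconds=100)).cumsum() >= 1

# Positional boolean filter, independent of the duplicated labels.
result = df[mask.to_numpy()]
print(result)
```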
If you need to split at the last gap instead:
print (df)
Time X
0 12:30:00.00 A
1 12:30:00.100 B
2 12:30:00.202 C
3 12:30:00.300 D
1 12:30:00.100 E
2 12:30:00.202 F
#create helper series
time_ser= pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
#get differences
print (time_ser.diff())
0 NaT
1 00:00:00.100000
2 00:00:00.102000
3 00:00:00.098000
1 -1 days +23:59:59.800000
2 00:00:00.102000
Name: Time, dtype: timedelta64[ns]
#compare with 100ms timedelta and count the gaps with cumsum
mask = (((time_ser.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()))
print (mask)
0 0
1 0
2 1
3 1
1 1
2 2
Name: Time, dtype: int32
#get last value of mask
last_val = mask.iat[-1]
print(last_val)
2
#compare mask with last value and use boolean indexing
print (df[mask == last_val])
Time X
2 12:30:00.202 F
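The last-gap variant can be wrapped in a small function. This is a sketch rather than part of the original answer: the function name keep_after_last_gap and its parameters are hypothetical, and it assumes the time column holds strings in the %H:%M:%S.%f format used above.

```python
import pandas as pd

def keep_after_last_gap(df, time_col="Time", threshold_ms=100):
    """Return the rows from the last gap larger than threshold_ms onward."""
    if df.empty:
        return df
    times = pd.to_datetime(df[time_col], format="%H:%M:%S.%f")
    # Each gap starts a new group; the cumulative sum is the group number.
    group = (times.diff() > pd.Timedelta(milliseconds=threshold_ms)).cumsum()
    # Keep only the rows that belong to the final group.
    return df[(group == group.iloc[-1]).to_numpy()]

# Six-row sample reconstructed from the output above (index labels assumed).
df = pd.DataFrame(
    {
        "Time": ["12:30:00.00", "12:30:00.100", "12:30:00.202",
                 "12:30:00.300", "12:30:00.100", "12:30:00.202"],
        "X": list("ABCDEF"),
    },
    index=[0, 1, 2, 3, 1, 2],
)

result = keep_after_last_gap(df)
print(result)
```

Unlike the cumsum() >= 1 mask, this version returns the whole frame when no gap exceeds the threshold, because every row then belongs to group 0.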