I have collections of DatetimeIndex objects, for example:
DatetimeIndex(['2007-11-01 00:00:00', '2008-01-01 00:00:00',
'2008-02-01 00:00:00', '2008-03-01 00:00:00',
'2008-04-01 00:00:00', '2012-09-01 00:10:00',
'2012-09-01 00:20:00', '2012-09-01 00:30:00',
'2012-09-01 00:40:00', '2012-09-01 00:50:00',
...
'2012-09-30 22:40:00', '2012-09-30 22:50:00',
'2012-09-30 23:00:00', '2012-09-30 23:10:00',
'2012-09-30 23:20:00', '2012-09-30 23:30:00',
'2012-09-30 23:40:00', '2012-09-30 23:50:00',
'2012-10-01 00:00:00', '2015-07-01 00:00:00'],
dtype='datetime64[ns]', length=4326, freq=None, tz=None)
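I believe its freq and inferred_freq are both None because, although the data actually has a 10-minute period, the missing chunks prevent the frequency from being detected. I would like to extract these missing parts or, equivalently, the available parts, as efficiently as possible. That is, I would like to get a list of ranges such as:
[('2007-11-01 00:00:00', '2007-11-01 00:00:00'),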
('2008-01-01 00:00:00', '2008-01-01 00:00:00'),
('2008-02-01 00:00:00', '2008-02-01 00:00:00'),
('2008-03-01 00:00:00', '2008-03-01 00:00:00'),
('2008-04-01 00:00:00', '2008-04-01 00:00:00'),
('2012-09-01 00:10:00', '2012-10-01 00:00:00'),
('2015-07-01 00:00:00', '2015-07-01 00:00:00')]
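To illustrate the point about inferred_freq, here is a minimal sketch on a small made-up index (the names full and gappy are hypothetical and not part of the data above). It suggests that removing a single interior timestamp is enough for pandas to stop reporting a frequency:

```python
import pandas as pd

# A complete 10-minute grid; rebuilding it from raw values discards the
# stored freq, so pandas has to infer it from the spacing.
full = pd.date_range("2012-09-01 00:10", periods=6, freq="10min")
rebuilt = pd.DatetimeIndex(full.values)
print(rebuilt.freq, rebuilt.inferred_freq)

# Drop one interior timestamp: the spacing is no longer uniform,
# so neither freq nor inferred_freq is available.
gappy = full.delete(2)
print(gappy.freq, gappy.inferred_freq)
```

The exact string printed for the inferred frequency of the complete grid depends on the pandas version, so it is not shown here.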
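How should I do this? I looked at PeriodIndex, but it seems to be aimed at a different kind of application, or perhaps it does not handle arbitrary time intervals yet.
Best answer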
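I think you can use groupby on the Series with a grouper and aggregate with min and max.
The grouper is created by comparing the difference between consecutive timestamps with 10 minutes and taking the cumsum of the result, so every step that is not exactly 10 minutes starts a new group: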
import pandas as pd

rng = pd.DatetimeIndex(['2007-11-01 00:00:00', '2008-01-01 00:00:00',
'2008-02-01 00:00:00', '2008-03-01 00:00:00',
'2008-04-01 00:00:00', '2012-09-01 00:10:00',
'2012-09-01 00:20:00', '2012-09-01 00:30:00',
'2012-09-01 00:40:00', '2012-09-01 00:50:00',
'2012-09-30 22:40:00', '2012-09-30 22:50:00',
'2012-09-30 23:00:00', '2012-09-30 23:10:00',
'2012-09-30 23:20:00', '2012-09-30 23:30:00',
'2012-09-30 23:40:00', '2012-09-30 23:50:00',
'2012-10-01 00:00:00', '2015-07-01 00:00:00'])
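s = pd.Series(rng)
# True wherever the step from the previous timestamp is not exactly 10 minutes
# (including the first row, whose diff is NaT); cumsum turns the flags into group ids
grouper = s.diff().ne(pd.to_timedelta('10min')).cumsum()
print(grouper)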
0 1
1 2
2 3
3 4
4 5
5 6
6 6
7 6
8 6
9 6
10 7
11 7
12 7
13 7
14 7
15 7
16 7
17 7
18 7
19 8
dtype: int32
print(s.groupby(grouper).agg(['min', 'max']).astype(str).apply(tuple, axis=1).tolist())
[('2007-11-01 00:00:00', '2007-11-01 00:00:00'),
('2008-01-01 00:00:00', '2008-01-01 00:00:00'),
('2008-02-01 00:00:00', '2008-02-01 00:00:00'),
('2008-03-01 00:00:00', '2008-03-01 00:00:00'),
('2008-04-01 00:00:00', '2008-04-01 00:00:00'),
('2012-09-01 00:10:00', '2012-09-01 00:50:00'),
('2012-09-30 22:40:00', '2012-10-01 00:00:00'),
('2015-07-01 00:00:00', '2015-07-01 00:00:00')]