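I have a DataFrame (szen_df) and select part of it, which I assign to another DataFrame (orb_df). When I try to get the index of the sub-selected DataFrame, it still has the entire index of the original DataFrame. I want to get the level-0 index of the new DataFrame. For example: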
start = datetime(2007, 1, 25, 12, 49, 0)
end = datetime(2007, 1, 25, 14, 30, 0)
orb_df = szen_df.loc[start:end]
orb_df
displays the sub-selection. However, if I query the index of the new DataFrame, it still has all the dates of the old DataFrame.
orb_df.index.levels[0]
shows:
DatetimeIndex(['2007-01-25 00:00:00', '2007-01-25 00:10:00',
'2007-01-25 00:20:00', '2007-01-25 00:30:00',
'2007-01-25 00:40:00', '2007-01-25 00:50:00',
'2007-01-25 01:00:00', '2007-01-25 01:10:00',
'2007-01-25 01:20:00', '2007-01-25 01:30:00',
...
'2007-01-25 22:20:00', '2007-01-25 22:30:00',
'2007-01-25 22:40:00', '2007-01-25 22:50:00',
'2007-01-25 23:00:00', '2007-01-25 23:10:00',
'2007-01-25 23:20:00', '2007-01-25 23:30:00',
'2007-01-25 23:40:00', '2007-01-25 23:50:00'],
dtype='datetime64[ns]', name=u'time', length=144, freq=None, tz=None)
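It has 144 elements, but based on the sub-selection it should contain only 11. I need the index that starts at 2007-01-25 12:50:00 and ends at 2007-01-25 14:30:00. In other words, I only want the level-0 index of the new sub-selection.

Best answer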
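Here is one way. First call reset_index(level='pos') to move the 'pos' level out of the MultiIndex, then slice by time, and finally call set_index('pos', append=True) to rebuild the MultiIndex. The rebuilt index only contains the time values that are actually present in the selection.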
import pandas as pd
import numpy as np
# simulate your data
np.random.seed(0)
multi_index = pd.MultiIndex.from_product([pd.date_range('2007-02-01 00:00:00', periods=100, freq='10min'), ['left', 'center', 'right']], names=['time', 'pos'])
szen_df = pd.DataFrame(np.random.randn(300, 3), index=multi_index, columns=['lat', 'lon', 'szen'])
szen_df
Out[48]:
                                lat     lon    szen
time                pos
2007-02-01 00:00:00 left     1.7641  0.4002  0.9787
                    center   2.2409  1.8676 -0.9773
                    right    0.9501 -0.1514 -0.1032
2007-02-01 00:10:00 left     0.4106  0.1440  1.4543
                    center   0.7610  0.1217  0.4439
                    right    0.3337  1.4941 -0.2052
2007-02-01 00:20:00 left     0.3131 -0.8541 -2.5530
                    center   0.6536  0.8644 -0.7422
                    right    2.2698 -1.4544  0.0458
2007-02-01 00:30:00 left    -0.1872  1.5328  1.4694
                    center   0.1549  0.3782 -0.8878
                    right   -1.9808 -0.3479  0.1563
2007-02-01 00:40:00 left     1.2303  1.2024 -0.3873
                    center  -0.3023 -1.0486 -1.4200
                    right   -1.7063  1.9508 -0.5097
...                             ...     ...     ...
2007-02-01 15:50:00 left    -0.4367 -1.6430 -0.4061
                    center  -0.5353  0.0254  1.1542
                    right    0.1725  0.0211  0.0995
2007-02-01 16:00:00 left     0.2274 -1.0167 -0.1148
                    center   0.3088 -1.3708  0.8657
                    right    1.0814 -0.6314 -0.2413
2007-02-01 16:10:00 left    -0.8782  0.6994 -1.0612
                    center  -0.2225 -0.8589  0.0510
                    right   -1.7942  1.3265 -0.9646
2007-02-01 16:20:00 left     0.0599 -0.2125 -0.7621
                    center  -0.8878  0.9364 -0.5256
                    right    0.2712 -0.8015 -0.6472
2007-02-01 16:30:00 left     0.4722  0.9304 -0.1753
                    center  -1.4219  1.9980 -0.8565
                    right   -1.5416  2.5944 -0.4040
[300 rows x 3 columns]
start_time = '2007-02-01 12:50:00'
end_time = '2007-02-01 14:30:00'
orb_df = szen_df.reset_index(level='pos').loc[start_time:end_time].set_index('pos', append=True)
orb_df.index
Out[50]:
MultiIndex(levels=[[2007-02-01 12:50:00, 2007-02-01 13:00:00, 2007-02-01 13:10:00, 2007-02-01 13:20:00, 2007-02-01 13:30:00, 2007-02-01 13:40:00, 2007-02-01 13:50:00, 2007-02-01 14:00:00, 2007-02-01 14:10:00, 2007-02-01 14:20:00, 2007-02-01 14:30:00], ['center', 'left', 'right']],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10], [1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2]],
names=['time', 'pos'])
orb_df.index.levels[0]
Out[59]:
DatetimeIndex(['2007-02-01 12:50:00', '2007-02-01 13:00:00',
'2007-02-01 13:10:00', '2007-02-01 13:20:00',
'2007-02-01 13:30:00', '2007-02-01 13:40:00',
'2007-02-01 13:50:00', '2007-02-01 14:00:00',
'2007-02-01 14:10:00', '2007-02-01 14:20:00',
'2007-02-01 14:30:00'],
dtype='datetime64[ns]', name='time', freq=None, tz=None)
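
As a possible alternative to the reset_index / set_index round trip, the sketch below reads the level values that are actually present with get_level_values(...).unique(), and also trims the stored levels with MultiIndex.remove_unused_levels(). It is not part of the original answer: it assumes a pandas version that provides remove_unused_levels (0.20 or later), it reuses the simulated data from above, and the names n_levels_before and times are made up for illustration. The sort_index() call is an extra precaution so that label slicing on the MultiIndex is well defined.

```python
import numpy as np
import pandas as pd

# Same simulated data as in the answer above.
np.random.seed(0)
multi_index = pd.MultiIndex.from_product(
    [pd.date_range('2007-02-01 00:00:00', periods=100, freq='10min'),
     ['left', 'center', 'right']],
    names=['time', 'pos'],
)
szen_df = pd.DataFrame(np.random.randn(300, 3), index=multi_index,
                       columns=['lat', 'lon', 'szen'])

# Assumption: sort first so the MultiIndex is fully lexsorted before slicing.
szen_df = szen_df.sort_index()

start_time = pd.Timestamp('2007-02-01 12:50:00')
end_time = pd.Timestamp('2007-02-01 14:30:00')

# Plain label slice on the MultiIndex; .copy() so we can safely modify the result.
orb_df = szen_df.loc[start_time:end_time].copy()

# The stored levels still describe the full original index.
n_levels_before = len(orb_df.index.levels[0])

# Option 1: ask for the values actually used in level 'time'.
times = orb_df.index.get_level_values('time').unique()

# Option 2: drop the unused entries from the stored levels.
orb_df.index = orb_df.index.remove_unused_levels()

print(n_levels_before)                # size of levels[0] before trimming
print(len(times))                     # number of timestamps in the selection
print(len(orb_df.index.levels[0]))    # size of levels[0] after trimming
```

Option 1 leaves orb_df untouched and is enough when only the list of timestamps is needed; option 2 is closer to the result of the round trip above, since orb_df.index.levels[0] itself then reflects the selection.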