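I have a DataFrame (szen_df) and assign a slice of it to another DataFrame (orb_df). When I ask for the index of the sliced DataFrame, it still carries the entire index of the original DataFrame. I want only the level-0 index of the new DataFrame. For example: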

from datetime import datetime
start = datetime(2007, 1, 25, 12, 49, 0)
end = datetime(2007, 1, 25, 14, 30, 0)
orb_df = szen_df.loc[start:end]


orb_df shows:



If I query the index of the new DataFrame, it still contains all the dates of the old DataFrame.

orb_df.index.levels[0]


This shows:

DatetimeIndex(['2007-01-25 00:00:00', '2007-01-25 00:10:00',
           '2007-01-25 00:20:00', '2007-01-25 00:30:00',
           '2007-01-25 00:40:00', '2007-01-25 00:50:00',
           '2007-01-25 01:00:00', '2007-01-25 01:10:00',
           '2007-01-25 01:20:00', '2007-01-25 01:30:00',
           ...
           '2007-01-25 22:20:00', '2007-01-25 22:30:00',
           '2007-01-25 22:40:00', '2007-01-25 22:50:00',
           '2007-01-25 23:00:00', '2007-01-25 23:10:00',
           '2007-01-25 23:20:00', '2007-01-25 23:30:00',
           '2007-01-25 23:40:00', '2007-01-25 23:50:00'],
          dtype='datetime64[ns]', name=u'time', length=144, freq=None, tz=None)


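It has 144 elements. Based on the slice, it should contain only 11. I need an index that starts at 2007-01-25 12:50:00 and ends at 2007-01-25 14:30:00. In other words, I only want the level-0 index of the new slice.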

Best answer

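Here is one way. First flatten the multi-level index with reset_index(level='pos'), slice on the remaining time index, and then rebuild the multi-level index with set_index('pos', append=True). Because the MultiIndex is rebuilt from the sliced rows, its levels contain only the values that are actually present.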

import pandas as pd
import numpy as np

# simulate your data
np.random.seed(0)
multi_index = pd.MultiIndex.from_product([pd.date_range('2007-02-01 00:00:00', periods=100, freq='10min'), ['left', 'center', 'right']], names=['time', 'pos'])

szen_df = pd.DataFrame(np.random.randn(300, 3), index=multi_index, columns=['lat', 'lon', 'szen'])


Out[48]:
                               lat     lon    szen
time                pos
2007-02-01 00:00:00 left    1.7641  0.4002  0.9787
                    center  2.2409  1.8676 -0.9773
                    right   0.9501 -0.1514 -0.1032
2007-02-01 00:10:00 left    0.4106  0.1440  1.4543
                    center  0.7610  0.1217  0.4439
                    right   0.3337  1.4941 -0.2052
2007-02-01 00:20:00 left    0.3131 -0.8541 -2.5530
                    center  0.6536  0.8644 -0.7422
                    right   2.2698 -1.4544  0.0458
2007-02-01 00:30:00 left   -0.1872  1.5328  1.4694
                    center  0.1549  0.3782 -0.8878
                    right  -1.9808 -0.3479  0.1563
2007-02-01 00:40:00 left    1.2303  1.2024 -0.3873
                    center -0.3023 -1.0486 -1.4200
                    right  -1.7063  1.9508 -0.5097
...                            ...     ...     ...
2007-02-01 15:50:00 left   -0.4367 -1.6430 -0.4061
                    center -0.5353  0.0254  1.1542
                    right   0.1725  0.0211  0.0995
2007-02-01 16:00:00 left    0.2274 -1.0167 -0.1148
                    center  0.3088 -1.3708  0.8657
                    right   1.0814 -0.6314 -0.2413
2007-02-01 16:10:00 left   -0.8782  0.6994 -1.0612
                    center -0.2225 -0.8589  0.0510
                    right  -1.7942  1.3265 -0.9646
2007-02-01 16:20:00 left    0.0599 -0.2125 -0.7621
                    center -0.8878  0.9364 -0.5256
                    right   0.2712 -0.8015 -0.6472
2007-02-01 16:30:00 left    0.4722  0.9304 -0.1753
                    center -1.4219  1.9980 -0.8565
                    right  -1.5416  2.5944 -0.4040

[300 rows x 3 columns]

start_time = '2007-02-01 12:50:00'
end_time = '2007-02-01 14:30:00'
orb_df = szen_df.reset_index(level='pos').loc[start_time:end_time].set_index('pos', append=True)

orb_df.index

Out[50]:
MultiIndex(levels=[[2007-02-01 12:50:00, 2007-02-01 13:00:00, 2007-02-01 13:10:00, 2007-02-01 13:20:00, 2007-02-01 13:30:00, 2007-02-01 13:40:00, 2007-02-01 13:50:00, 2007-02-01 14:00:00, 2007-02-01 14:10:00, 2007-02-01 14:20:00, 2007-02-01 14:30:00], ['center', 'left', 'right']],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10], [1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2]],
           names=['time', 'pos'])


orb_df.index.levels[0]

Out[59]:
DatetimeIndex(['2007-02-01 12:50:00', '2007-02-01 13:00:00',
               '2007-02-01 13:10:00', '2007-02-01 13:20:00',
               '2007-02-01 13:30:00', '2007-02-01 13:40:00',
               '2007-02-01 13:50:00', '2007-02-01 14:00:00',
               '2007-02-01 14:10:00', '2007-02-01 14:20:00',
               '2007-02-01 14:30:00'],
              dtype='datetime64[ns]', name='time', freq=None, tz=None)
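
As an alternative that is not part of the answer above, newer pandas versions (0.20 or later, as far as I know) provide MultiIndex.remove_unused_levels(), and get_level_values() always returns only the values present in the slice. The sketch below reuses the simulated szen_df from the answer; the names stale_len, times_in_slice and trimmed_levels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Simulated data with the same layout as in the answer above.
np.random.seed(0)
times = pd.date_range('2007-02-01 00:00:00', periods=100, freq='10min')
multi_index = pd.MultiIndex.from_product(
    [times, ['left', 'center', 'right']], names=['time', 'pos'])
szen_df = pd.DataFrame(np.random.randn(300, 3), index=multi_index,
                       columns=['lat', 'lon', 'szen'])

start = pd.Timestamp('2007-02-01 12:50:00')
end = pd.Timestamp('2007-02-01 14:30:00')

# Plain slicing keeps the original levels, so levels[0] is still "stale".
orb_df = szen_df.loc[start:end]
stale_len = len(orb_df.index.levels[0])

# Option 1: read the level-0 values that actually occur in the slice.
times_in_slice = orb_df.index.get_level_values('time').unique()

# Option 2: build a new MultiIndex whose levels hold only the used values.
trimmed_levels = orb_df.index.remove_unused_levels().levels[0]
```

To make the change stick on the DataFrame itself, the trimmed index can be assigned back with orb_df.index = orb_df.index.remove_unused_levels(), applied to a copy of the slice if pandas warns about modifying a view.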
