
Problem description

I have a pandas MultiIndex dataframe similar to the following:

import pandas as pd

rows = [('One', 'One', 'One', '20120105', 1, 'Text1'),
        ('One', 'One', 'One', '20120107', 2, 'Text2'),
        ('One', 'One', 'One', '20120110', 3, 'Text3'),
        ('One', 'One', 'Two', '20120104', 4, 'Text4'),
        ('One', 'Two', 'One', '20120109', 5, 'Text5'),
        ('Two', 'Three', 'Four', '20120111', 6, 'Text6')]
cols = ['Type', 'Subtype', 'Subsubtype', 'Date', 'Number', 'Text']
df = pd.DataFrame.from_records(rows, columns=cols)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Type', 'Subtype', 'Subsubtype'])
end_date = max(df['Date'])
print(df)

                              Date  Number   Text
Type Subtype Subsubtype
One  One     One        2012-01-05       1  Text1
             One        2012-01-07       2  Text2
             One        2012-01-10       3  Text3
             Two        2012-01-04       4  Text4
     Two     One        2012-01-09       5  Text5
Two  Three   Four       2012-01-11       6  Text6

I would like to upsample the data so that each combination of the Type-Subtype-Subsubtype index levels gets daily data, from the earliest date available for that combination up to end_date = max(df['Date']).
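A minimal sketch of what that target means in numbers, rebuilding the same frame so it runs on its own: each combination needs one row per day from its own earliest date through end_date, counting both endpoints. The `spans` frame and its `start`/`days` columns are illustrative names, not part of the original question.

```python
import pandas as pd

rows = [('One', 'One', 'One', '20120105', 1, 'Text1'),
        ('One', 'One', 'One', '20120107', 2, 'Text2'),
        ('One', 'One', 'One', '20120110', 3, 'Text3'),
        ('One', 'One', 'Two', '20120104', 4, 'Text4'),
        ('One', 'Two', 'One', '20120109', 5, 'Text5'),
        ('Two', 'Three', 'Four', '20120111', 6, 'Text6')]
cols = ['Type', 'Subtype', 'Subsubtype', 'Date', 'Number', 'Text']
df = pd.DataFrame.from_records(rows, columns=cols)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Type', 'Subtype', 'Subsubtype'])
end_date = df['Date'].max()

# Earliest available date for every Type-Subtype-Subsubtype combination
spans = df.groupby(level=[0, 1, 2])['Date'].min().to_frame('start')
# Daily rows each combination should have after upsampling (both endpoints included)
spans['days'] = (end_date - spans['start']).dt.days + 1
print(spans)
```

Summing `spans['days']` gives the total row count the upsampled frame should have, which is a quick sanity check for any solution.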

Example of what I want:

                              Date  Number   Text
Type Subtype Subsubtype
One  One     One        2012-01-05       1  Text1
             One        2012-01-06       1  Text1
             One        2012-01-07       2  Text2
             One        2012-01-08       2  Text2
             One        2012-01-09       2  Text2
             One        2012-01-10       3  Text3
             One        2012-01-11       3  Text3
             Two        2012-01-04       4  Text4
             Two        2012-01-05       4  Text4
             Two        2012-01-06       4  Text4
             Two        2012-01-07       4  Text4
             Two        2012-01-08       4  Text4
             Two        2012-01-09       4  Text4
             Two        2012-01-10       4  Text4
             Two        2012-01-11       4  Text4
     Two     One        2012-01-09       5  Text5
             One        2012-01-10       5  Text5
             One        2012-01-11       5  Text5
Two  Three   Four       2012-01-11       6  Text6

Looking through similar questions, I haven't been able to find anything that I could make work. Any help is greatly appreciated.

Recommended answer

You can use:

  • groupby by all levels of the MultiIndex
  • apply with reindex by date_range, using iat to select the first date of each group as the start (this relies on rows being sorted by Date within each group, as they are here)
  • replace the resulting NaN values with ffill

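# Group by all three index levels, move Date into each group's index,
# reindex to a daily range from the group's first date to end_date,
# then forward-fill the rows added by the reindex
df = df.groupby(level=[0,1,2]) \
       .apply(lambda x: x.set_index('Date').reindex(pd.date_range(x['Date'].iat[0],
                                                                  end_date))).ffill()
print(df)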
                                    Number   Text
Type Subtype Subsubtype
One  One     One        2012-01-05     1.0  Text1
                        2012-01-06     1.0  Text1
                        2012-01-07     2.0  Text2
                        2012-01-08     2.0  Text2
                        2012-01-09     2.0  Text2
                        2012-01-10     3.0  Text3
                        2012-01-11     3.0  Text3
             Two        2012-01-04     4.0  Text4
                        2012-01-05     4.0  Text4
                        2012-01-06     4.0  Text4
                        2012-01-07     4.0  Text4
                        2012-01-08     4.0  Text4
                        2012-01-09     4.0  Text4
                        2012-01-10     4.0  Text4
                        2012-01-11     4.0  Text4
     Two     One        2012-01-09     5.0  Text5
                        2012-01-10     5.0  Text5
                        2012-01-11     5.0  Text5
Two  Three   Four       2012-01-11     6.0  Text6
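The chained `.ffill()` above runs over the whole concatenated result rather than per group, so if a group's first row had a missing value in the source data it would be filled from the preceding group. Also, `iat[0]` is the earliest date only when rows are date-sorted within each group; otherwise the reindex would silently drop earlier rows. The following is a more defensive adaptation, not the original answer: it uses `min()` for the start date, forward-fills inside each group, and casts `Number` back to integers once the gaps are filled (reindexing introduces NaN, which is why the output above shows floats). The helper `fill_group` is a hypothetical name.

```python
import pandas as pd

rows = [('One', 'One', 'One', '20120105', 1, 'Text1'),
        ('One', 'One', 'One', '20120107', 2, 'Text2'),
        ('One', 'One', 'One', '20120110', 3, 'Text3'),
        ('One', 'One', 'Two', '20120104', 4, 'Text4'),
        ('One', 'Two', 'One', '20120109', 5, 'Text5'),
        ('Two', 'Three', 'Four', '20120111', 6, 'Text6')]
cols = ['Type', 'Subtype', 'Subsubtype', 'Date', 'Number', 'Text']
df = pd.DataFrame.from_records(rows, columns=cols)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Type', 'Subtype', 'Subsubtype'])
end_date = df['Date'].max()

def fill_group(g):
    # Daily range from this group's earliest date (order-independent) to end_date
    days = pd.date_range(g['Date'].min(), end_date, freq='D')
    # Put Date in the index, add the missing days, and fill within this group only
    return g.set_index('Date').reindex(days).ffill()

# group_keys=True prepends Type/Subtype/Subsubtype to each group's date index
out = df.groupby(level=[0, 1, 2], group_keys=True).apply(fill_group)
# All gaps are filled now, so the float column produced by reindex can go back to int
out['Number'] = out['Number'].astype(int)
# The date level comes back unnamed from date_range; give it a name
out.index = out.index.set_names('Date', level=3)
print(out)
```

With this data `out` holds the same 19 rows as the answer's output, with `Number` as integers and the fourth index level named `Date`.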


08-14 01:22