Problem description
I have a pandas MultiIndex dataframe similar to the following:
import pandas as pd
rows = [('One', 'One', 'One', '20120105', 1, 'Text1'),
('One', 'One', 'One', '20120107', 2, 'Text2'),
('One', 'One', 'One', '20120110', 3, 'Text3'),
('One', 'One', 'Two', '20120104', 4, 'Text4'),
('One', 'Two', 'One', '20120109', 5, 'Text5'),
('Two', 'Three', 'Four', '20120111', 6, 'Text6')]
cols = ['Type', 'Subtype', 'Subsubtype', 'Date', 'Number', 'Text']
df = pd.DataFrame.from_records(rows, columns=cols)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Type', 'Subtype', 'Subsubtype'])
end_date = max(df['Date'])
print(df)
                              Date  Number   Text
Type Subtype Subsubtype
One  One     One        2012-01-05       1  Text1
             One        2012-01-07       2  Text2
             One        2012-01-10       3  Text3
             Two        2012-01-04       4  Text4
     Two     One        2012-01-09       5  Text5
Two  Three   Four       2012-01-11       6  Text6
I would like to upsample the data so that each combination of the Type-Subtype-Subsubtype index levels gets one row per day: from the earliest date for which that combination has data up to end_date = max(df['Date']), with the missing days filled from the previous available row.
Example of what I want:
                              Date  Number   Text
Type Subtype Subsubtype
One  One     One        2012-01-05       1  Text1
             One        2012-01-06       1  Text1
             One        2012-01-07       2  Text2
             One        2012-01-08       2  Text2
             One        2012-01-09       2  Text2
             One        2012-01-10       3  Text3
             One        2012-01-11       3  Text3
             Two        2012-01-04       4  Text4
             Two        2012-01-05       4  Text4
             Two        2012-01-06       4  Text4
             Two        2012-01-07       4  Text4
             Two        2012-01-08       4  Text4
             Two        2012-01-09       4  Text4
             Two        2012-01-10       4  Text4
             Two        2012-01-11       4  Text4
     Two     One        2012-01-09       5  Text5
             One        2012-01-10       5  Text5
             One        2012-01-11       5  Text5
Two  Three   Four       2012-01-11       6  Text6
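As a quick sanity check on the size of that target, here is a small sketch (not part of the original question) that rebuilds the sample data and counts how many daily rows each Type-Subtype-Subsubtype combination should end up with. It assumes, as the question states, that each group runs from its own earliest Date to the global end_date; pd.date_range with its default daily frequency includes both endpoints.

```python
import pandas as pd

# Same sample data as in the question
rows = [('One', 'One', 'One', '20120105', 1, 'Text1'),
        ('One', 'One', 'One', '20120107', 2, 'Text2'),
        ('One', 'One', 'One', '20120110', 3, 'Text3'),
        ('One', 'One', 'Two', '20120104', 4, 'Text4'),
        ('One', 'Two', 'One', '20120109', 5, 'Text5'),
        ('Two', 'Three', 'Four', '20120111', 6, 'Text6')]
cols = ['Type', 'Subtype', 'Subsubtype', 'Date', 'Number', 'Text']
df = pd.DataFrame.from_records(rows, columns=cols)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Type', 'Subtype', 'Subsubtype'])
end_date = df['Date'].max()

# Earliest available date for each (Type, Subtype, Subsubtype) combination
starts = df.groupby(level=[0, 1, 2])['Date'].min()

# Daily rows each combination should have after upsampling
# (pd.date_range includes both the start and the end date)
counts = {key: len(pd.date_range(start, end_date)) for key, start in starts.items()}
print(counts)
```

The counts (7, 8, 3 and 1 rows) add up to the 19 rows shown in the desired output.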
Looking through similar questions I haven't been able to find anything that I could make work. Any help is greatly appreciated.
Recommended answer
You can use:
- groupby by all levels of the MultiIndex, then apply reindex by date_range (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.date_range.html), using iat to select the first value (each group's first Date) as the start of the range
- replace the resulting NaN values with ffill
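# Group by all three index levels; in each group, index by Date, reindex onto a
# daily range from the group's first Date to end_date, then forward-fill the NaNs
df = df.groupby(level=[0, 1, 2]) \
       .apply(lambda x: x.set_index('Date')
                         .reindex(pd.date_range(x['Date'].iat[0], end_date))) \
       .ffill()
print(df)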
                                    Number   Text
Type Subtype Subsubtype
One  One     One        2012-01-05     1.0  Text1
                        2012-01-06     1.0  Text1
                        2012-01-07     2.0  Text2
                        2012-01-08     2.0  Text2
                        2012-01-09     2.0  Text2
                        2012-01-10     3.0  Text3
                        2012-01-11     3.0  Text3
             Two        2012-01-04     4.0  Text4
                        2012-01-05     4.0  Text4
                        2012-01-06     4.0  Text4
                        2012-01-07     4.0  Text4
                        2012-01-08     4.0  Text4
                        2012-01-09     4.0  Text4
                        2012-01-10     4.0  Text4
                        2012-01-11     4.0  Text4
     Two     One        2012-01-09     5.0  Text5
                        2012-01-10     5.0  Text5
                        2012-01-11     5.0  Text5
Two  Three   Four       2012-01-11     6.0  Text6
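A variant of the same reindex approach, offered as a sketch rather than as the answerer's code: it takes each group's start from min() instead of iat[0] (so rows need not already be sorted by Date), forward-fills inside each group instead of across the whole frame (so a value can never leak from one group into the next), and casts Number back to int, since reindex introduces NaN and turns it into float. The helper name fill_daily is hypothetical, and the sketch assumes no group contains duplicate Dates, because reindex raises on duplicate labels.

```python
import pandas as pd

# Same sample data as in the question
rows = [('One', 'One', 'One', '20120105', 1, 'Text1'),
        ('One', 'One', 'One', '20120107', 2, 'Text2'),
        ('One', 'One', 'One', '20120110', 3, 'Text3'),
        ('One', 'One', 'Two', '20120104', 4, 'Text4'),
        ('One', 'Two', 'One', '20120109', 5, 'Text5'),
        ('Two', 'Three', 'Four', '20120111', 6, 'Text6')]
cols = ['Type', 'Subtype', 'Subsubtype', 'Date', 'Number', 'Text']
df = pd.DataFrame.from_records(rows, columns=cols)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index(['Type', 'Subtype', 'Subsubtype'])
end_date = df['Date'].max()


def fill_daily(group):
    """Hypothetical helper: reindex one group onto a daily range and forward-fill it.

    Assumes the group has no duplicate Dates (reindex would raise otherwise).
    """
    # From this group's earliest Date to the global end_date, both inclusive
    daily = pd.date_range(group['Date'].min(), end_date)
    return group.set_index('Date').sort_index().reindex(daily).ffill()


out = df.groupby(level=[0, 1, 2]).apply(fill_daily)

# Every gap has been filled, so Number holds no NaN and can go back to int
out['Number'] = out['Number'].astype(int)
print(out)
```

Because each group starts at a date that actually exists in its data, the first row of every group is never NaN, which is why the per-group ffill leaves nothing unfilled.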