Applying the method from here to a MultiIndex DataFrame doesn't seem to work.
Take the following DataFrame:
import pandas as pd
import numpy as np
dates = pd.date_range('20070101',periods=3200)
df = pd.DataFrame(data=np.random.randint(0, 100, (3200, 1)).astype(float), columns=list('A'))
df.loc[5:13, 'A'] = np.nan  # add missing data points (rows 5 to 13)
df['date'] = dates
df = df[['date','A']]
Define a function that assigns a season to each date in the 'date' column:
def get_season(row):
    if row['date'].month >= 3 and row['date'].month <= 5:
        return '2'
    elif row['date'].month >= 6 and row['date'].month <= 8:
        return '3'
    elif row['date'].month >= 9 and row['date'].month <= 11:
        return '4'
    else:
        return '1'
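As an aside, the same mapping can be computed for a whole column at once instead of row by row with apply. A minimal sketch: the helper name season_from_month is hypothetical, and it assumes the same season codes as get_season (Dec-Feb '1', Mar-May '2', Jun-Aug '3', Sep-Nov '4'):

```python
import pandas as pd

def season_from_month(dates: pd.Series) -> pd.Series:
    """Hypothetical vectorized equivalent of get_season.

    month % 12 maps December to 0, so integer division by 3 gives
    0 for Dec-Feb, 1 for Mar-May, 2 for Jun-Aug and 3 for Sep-Nov.
    """
    return ((dates.dt.month % 12) // 3 + 1).astype(str)

# One date per month of 2007, to compare against the row-wise rules
dates = pd.Series(pd.date_range('2007-01-01', periods=12, freq='MS'))
print(season_from_month(dates).tolist())
# ['1', '1', '2', '2', '2', '3', '3', '3', '4', '4', '4', '1']
```

It could then stand in for the apply call, e.g. df['Season'] = season_from_month(df['date']).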
Apply the function:
df['Season'] = df.apply(get_season, axis=1)
Create a 'Year' column to build the index:
df['Year'] = df['date'].dt.year
Build a MultiIndex from year and season:
df = df.set_index(['Year', 'Season'], inplace=False)
Count the data points in each season:
count = df.groupby(level=[0, 1]).count()
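GroupBy.count counts only non-missing values per column, which is what makes the 75-day threshold a test of valid data rather than of calendar days; GroupBy.size counts every row. A small illustration on made-up values (the names toy, valid and rows are illustrative only):

```python
import numpy as np
import pandas as pd

# Two (Year, Season) groups; the first has one missing value in 'A'
idx = pd.MultiIndex.from_tuples(
    [(2007, '1'), (2007, '1'), (2007, '1'), (2007, '2'), (2007, '2')],
    names=['Year', 'Season'],
)
toy = pd.DataFrame({'A': [1.0, np.nan, 3.0, 4.0, 5.0]}, index=idx)

grouped = toy.groupby(level=[0, 1])
valid = grouped['A'].count()  # non-NaN values per season
rows = grouped.size()         # all rows per season, NaN included

print(valid.tolist())  # [2, 2]
print(rows.tolist())   # [3, 2]
```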
Drop seasons with fewer than 75 days of data:
count = count.drop(count[count.A < 75].index)
Create a variable holding the seasons with 75 or more days:
complete = count[count['A'] >= 75].index
Using the isin function sets everything to False, whereas I want it to select all seasons that have at least 75 days of valid data in 'A':
df = df.isin(complete)
df
Every value is False, and I don't know why.
I hope this is concise enough. I need the season to build the MultiIndex, so I included that step!
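The all-False result follows from what DataFrame.isin checks: each cell value is compared against the given values, and the index is never consulted. Here complete holds (Year, Season) tuples while the cells hold dates and numbers, so nothing can match; Series.isin in Edit 2 below fails for the same reason. Index.isin (here MultiIndex.isin) tests the index labels instead. A toy illustration with made-up data:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(2007, '1'), (2007, '2'), (2008, '1')], names=['Year', 'Season']
)
toy = pd.DataFrame({'A': [10.0, 20.0, 30.0]}, index=idx)
complete = pd.MultiIndex.from_tuples([(2007, '1'), (2008, '1')])

# DataFrame.isin compares cell values, so the tuples never match the floats in 'A'
print(toy.isin(complete)['A'].tolist())  # [False, False, False]

# MultiIndex.isin compares the index labels themselves
mask = toy.index.isin(complete)
print(mask.tolist())                     # [True, False, True]
print(toy[mask]['A'].tolist())           # [10.0, 30.0]
```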
Edit
Another approach from here, based on reindexing the MultiIndex, doesn't work either (it also produces an empty DataFrame):
df3 = df.reset_index().groupby('Year').apply(lambda x: x.set_index('Season').reindex(count,method='pad'))
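A different route, not the reindexing approach linked above, is to let groupby decide which seasons to keep. A minimal sketch using GroupBy.filter (and an equivalent transform mask) on made-up data, with the threshold lowered to 2 so the example stays small; MIN_VALID is a hypothetical stand-in for the 75-day cut-off:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(2007, '1'), (2007, '1'), (2007, '1'), (2007, '2'), (2007, '2')],
    names=['Year', 'Season'],
)
toy = pd.DataFrame({'A': [1.0, 2.0, 3.0, np.nan, 5.0]}, index=idx)

MIN_VALID = 2  # hypothetical stand-in for the 75-day threshold

# Keep every row of a (Year, Season) group whose 'A' has enough non-NaN values
kept = toy.groupby(level=[0, 1]).filter(lambda g: g['A'].count() >= MIN_VALID)
print(kept.index.unique().tolist())  # [(2007, '1')]

# Equivalent boolean-mask form: broadcast each group's count back to its rows
mask = toy.groupby(level=[0, 1])['A'].transform('count') >= MIN_VALID
print(toy[mask].equals(kept))        # True
```

Both forms keep the original row order and avoid building a separate index of complete seasons.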
Edit 2
I also tried this:
seasons = count[count['A'] >= 75].index
df = df[df['A'].isin(seasons)]
Again, the output is empty.
Best answer
I think you can use Index.isin:
complete = count[count['A'] >= 75].index
idx = df.index.isin(complete)
print(idx)
[ True  True  True ..., False False False]
print(df[idx])
date A
Year Season
2007 1 2007-01-01 24.0
1 2007-01-02 92.0
1 2007-01-03 54.0
1 2007-01-04 91.0
1 2007-01-05 91.0
1 2007-01-06 NaN
1 2007-01-07 NaN
1 2007-01-08 NaN
1 2007-01-09 NaN
1 2007-01-10 NaN
1 2007-01-11 NaN
1 2007-01-12 NaN
1 2007-01-13 NaN
1 2007-01-14 NaN
1 2007-01-15 18.0
1 2007-01-16 82.0
1 2007-01-17 55.0
1 2007-01-18 64.0
1 2007-01-19 89.0
1 2007-01-20 37.0
1 2007-01-21 45.0
1 2007-01-22 4.0
1 2007-01-23 34.0
1 2007-01-24 35.0
1 2007-01-25 90.0
1 2007-01-26 17.0
1 2007-01-27 29.0
1 2007-01-28 58.0
1 2007-01-29 7.0
1 2007-01-30 57.0
... ... ...
2015 3 2015-08-02 42.0
3 2015-08-03 0.0
3 2015-08-04 31.0
3 2015-08-05 39.0
3 2015-08-06 25.0
3 2015-08-07 1.0
3 2015-08-08 7.0
3 2015-08-09 97.0
3 2015-08-10 38.0
3 2015-08-11 59.0
3 2015-08-12 28.0
3 2015-08-13 84.0
3 2015-08-14 43.0
3 2015-08-15 63.0
3 2015-08-16 68.0
3 2015-08-17 0.0
3 2015-08-18 19.0
3 2015-08-19 61.0
3 2015-08-20 11.0
3 2015-08-21 84.0
3 2015-08-22 75.0
3 2015-08-23 37.0
3 2015-08-24 40.0
3 2015-08-25 66.0
3 2015-08-26 50.0
3 2015-08-27 74.0
3 2015-08-28 37.0
3 2015-08-29 19.0
3 2015-08-30 25.0
3 2015-08-31 15.0
[3106 rows x 2 columns]
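For Python 3 and a current pandas, the question's pipeline plus the answer's Index.isin filter can be run end to end. A sketch under these assumptions: a seeded NumPy Generator replaces np.random.randint so runs are repeatable, 'A' is created as float so it can hold NaN, and the missing rows are set with .loc instead of chained indexing:

```python
import numpy as np
import pandas as pd

def get_season(row):
    # Same season codes as the question: Dec-Feb '1', Mar-May '2', Jun-Aug '3', Sep-Nov '4'
    month = row['date'].month
    if 3 <= month <= 5:
        return '2'
    elif 6 <= month <= 8:
        return '3'
    elif 9 <= month <= 11:
        return '4'
    return '1'

# Seeded generator (assumption) so the run is repeatable; 'A' is float so it can hold NaN
rng = np.random.default_rng(0)
df = pd.DataFrame({'date': pd.date_range('20070101', periods=3200),
                   'A': rng.integers(0, 100, 3200).astype(float)})
df.loc[5:13, 'A'] = np.nan  # 2007-01-06 .. 2007-01-14 become missing

df['Season'] = df.apply(get_season, axis=1)
df['Year'] = df['date'].dt.year
df = df.set_index(['Year', 'Season'])

# Non-NaN values per (Year, Season); seasons with at least 75 valid days
count = df.groupby(level=[0, 1]).count()
complete = count[count['A'] >= 75].index

# The answer's approach: test the index labels, not the cell values
result = df[df.index.isin(complete)]
print(len(result))  # 3106
```

Only the positions of the NaN values decide which seasons survive, so the row count matches the answer's output whatever the random values are: the dropped groups are 2015 season '1' (January and February only, 59 days) and 2015 season '4' (the data ends on 2015-10-05, 35 days).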
Source: "python - Remove incomplete seasons from multi-index dataframe (Pandas)" on Stack Overflow: https://stackoverflow.com/questions/36361606/