Problem description
Let us take the Titanic dataset from Kaggle. I have a DataFrame with the columns "Pclass", "Sex" and "Age". I need to fill the NaN values in "Age" with the median of a specific group: for a woman in 1st class, I would like to use the median age of 1st-class women, not the median of the whole "Age" column.
The question is how to make this change on a specific slice of the DataFrame.
I tried:
data['Age'][(data['Sex'] == 'female')&(data['Pclass'] == 1)&(data['Age'].isnull())].fillna(median)
where median is my value, but nothing changes. Adding inplace=True didn't help.
Thanks a lot!
Recommended answer
I believe you need to filter by a boolean mask and assign the result back:
import numpy as np
import pandas as pd

data = pd.DataFrame({'a':list('aaaddd'),
                     'Sex':['female','female','male','female','female','male'],
                     'Pclass':[1,2,1,2,1,1],
                     'Age':[40,20,30,20,np.nan,np.nan]})
print (data)
Age Pclass Sex a
0 40.0 1 female a
1 20.0 2 female a
2 30.0 1 male a
3 20.0 2 female d
4 NaN 1 female d
5 NaN 1 male d
#boolean mask
mask1 = (data['Sex'] == 'female')&(data['Pclass'] == 1)
#get median by mask without NaNs
med = data.loc[mask1, 'Age'].median()
print (med)
40.0
#replace NaNs
data.loc[mask1, 'Age'] = data.loc[mask1, 'Age'].fillna(med)
print (data)
Age Pclass Sex a
0 40.0 1 female a
1 20.0 2 female a
2 30.0 1 male a
3 20.0 2 female d
4 40.0 1 female d
5 NaN 1 male d
This is the same as:
mask2 = mask1 &(data['Age'].isnull())
data.loc[mask2, 'Age'] = med
print (data)
Age Pclass Sex a
0 40.0 1 female a
1 20.0 2 female a
2 30.0 1 male a
3 20.0 2 female d
4 40.0 1 female d
5 NaN 1 male d
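As for why the attempt in the question changes nothing: boolean indexing on data['Age'] builds a new Series, and fillna then returns yet another new object that is never stored back into data. The snippet below is a minimal sketch of that behaviour on the same sample data, not part of the original answer; the variable names mask, med, filled and the two counters are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Sex': ['female', 'female', 'male', 'female', 'female', 'male'],
    'Pclass': [1, 2, 1, 2, 1, 1],
    'Age': [40, 20, 30, 20, np.nan, np.nan],
})

group = (data['Sex'] == 'female') & (data['Pclass'] == 1)
mask = group & data['Age'].isnull()
med = data.loc[group, 'Age'].median()  # median() skips NaN by default

# The attempt from the question: the filled values live only in `filled`,
# the original DataFrame is not modified.
filled = data['Age'][mask].fillna(med)
nan_count_after_attempt = int(data['Age'].isnull().sum())

# Assigning through .loc writes into the original DataFrame.
data.loc[mask, 'Age'] = med
nan_count_after_loc = int(data['Age'].isnull().sum())

print(nan_count_after_attempt, nan_count_after_loc)  # prints: 2 1
```

The one NaN that remains after the .loc assignment belongs to the 1st-class male in row 5, which the mask deliberately excludes.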
EDIT:
If you need to replace the NaNs in every group with that group's median:
#transform keeps the original index, so the result can be assigned back
data['Age'] = data.groupby(["Sex","Pclass"])["Age"].transform(lambda x: x.fillna(x.median()))
print (data)
Age Pclass Sex a
0 40.0 1 female a
1 20.0 2 female a
2 30.0 1 male a
3 20.0 2 female d
4 40.0 1 female d
5 30.0 1 male d
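The per-group fill can also be written without a lambda, which may be easier to reuse. The following is a sketch under stated assumptions: fill_age_by_group is a hypothetical helper name (it does not come from pandas or from the original answer), and it relies on GroupBy.transform('median') returning one median per row, aligned with the original index.

```python
import numpy as np
import pandas as pd


def fill_age_by_group(df, keys=('Sex', 'Pclass'), col='Age'):
    """Return a copy of df with NaNs in `col` replaced by the group median.

    Hypothetical helper for illustration; groups are defined by `keys`.
    """
    out = df.copy()
    # One median per row, broadcast from the row's group (NaNs are skipped).
    group_median = out.groupby(list(keys))[col].transform('median')
    # fillna with a Series aligns on the index, so each NaN gets its own
    # group's median.
    out[col] = out[col].fillna(group_median)
    return out


data = pd.DataFrame({
    'Sex': ['female', 'female', 'male', 'female', 'female', 'male'],
    'Pclass': [1, 2, 1, 2, 1, 1],
    'Age': [40, 20, 30, 20, np.nan, np.nan],
})

result = fill_age_by_group(data)
print(result['Age'].tolist())  # prints: [40.0, 20.0, 30.0, 20.0, 40.0, 30.0]
```

Note that a group with no observed ages at all has a NaN median, so its rows would stay NaN; a fallback such as the overall median would have to be applied separately.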