Problem description
I am trying to impute (fill) missing values using rows that share the same values in other columns.
For example, I have this dataframe:
one | two | three
1 1 10
1 1 nan
1 1 nan
1 2 nan
1 2 20
1 2 nan
1 3 nan
1 3 nan
I want to use columns one and two as keys. For rows that share the same key values, if column three is not entirely NaN within that group, the missing entries should be filled with the existing value of column three from a row with the same keys.
This is the result I want:
one | two | three
1 1 10
1 1 10
1 1 10
1 2 20
1 2 20
1 2 20
1 3 nan
1 3 nan
You can see that the group with keys one = 1 and two = 3 stays NaN, because it has no existing value to fill from.
I have tried using groupby + fillna():
df['three'] = df.groupby(['one','two'])['three'].fillna()
This gives me an error.
I have also tried a forward fill, which gave me a rather strange result: it forward-filled column two instead. This is the code I used for the forward fill:
df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill()
Recommended answer
If there is only one non-NaN value per group, use ffill (forward fill) followed by bfill (backward fill) within each group. Both steps have to run per group, so apply with a lambda is needed:
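# Parentheses allow the chained call to span two lines.
# group_keys=False keeps the original row index so the result aligns with df.
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.ffill().bfill()))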
print (df)
one two three
0 1 1 10.0
1 1 1 10.0
2 1 1 10.0
3 1 2 20.0
4 1 2 20.0
5 1 2 20.0
6 1 3 NaN
7 1 3 NaN
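As a hedged, self-contained sketch of the same idea, the snippet below rebuilds the question's frame and uses transform instead of apply, since transform returns a result aligned with the original index. The variable names (grouped, ffill_only, filled, filled_first) are illustrative and not part of the original answer. It also shows why a forward fill alone is not enough: a NaN that sits before the group's only value is never reached.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

grouped = df.groupby(['one', 'two'], sort=False)['three']

# Forward fill alone cannot reach a NaN that precedes the group's value
# (row 3 here), which is why a backward fill is needed as well.
ffill_only = grouped.ffill()

# transform returns a result aligned with the original index,
# so it can be assigned straight back to the column.
filled = grouped.transform(lambda x: x.ffill().bfill())

# Alternative: 'first' takes the first non-null value in each group
# and broadcasts it to every row of that group.
filled_first = grouped.transform('first')

df['three'] = filled
print(df)
```

When each group holds at most one distinct non-NaN value, both variants should give the same column; the group with no value at all stays NaN in either case.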
But if there are multiple values per group and the NaNs need to be replaced by a single constant per group, e.g. the group mean:
print (df)
one two three
0 1 1 10.0
1 1 1 40.0
2 1 1 NaN
3 1 2 NaN
4 1 2 20.0
5 1 2 NaN
6 1 3 NaN
7 1 3 NaN
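# Parentheses allow the chained call to span two lines.
# group_keys=False keeps the original row index so the result aligns with df.
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.fillna(x.mean())))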
print (df)
one two three
0 1 1 10.0
1 1 1 40.0
2 1 1 25.0
3 1 2 20.0
4 1 2 20.0
5 1 2 20.0
6 1 3 NaN
7 1 3 NaN
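The group-mean fill can also be sketched without apply: compute the per-row group mean with transform('mean') and pass it to fillna. This is an alternative formulation, not the original answer's code, and the name group_mean is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the second example frame (two values in the first group).
df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Mean of each group, broadcast to every row of that group.
# NaNs are skipped when averaging; an all-NaN group yields NaN.
group_mean = df.groupby(['one', 'two'], sort=False)['three'].transform('mean')

# fillna aligns on the index, so each NaN receives its own group's mean.
df['three'] = df['three'].fillna(group_mean)
print(df)
```

Existing values are left untouched; only the NaN entries are replaced.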