This article covers how to fill missing values in pandas using groupby.

Problem description

I am trying to impute/fill missing values using rows that share the same values in other columns.

For example, I have this dataframe:

one | two | three
1      1     10
1      1     nan
1      1     nan
1      2     nan
1      2     20
1      2     nan
1      3     nan
1      3     nan

I want to use columns one and two as the key. For rows that share the same key, if column three is not entirely NaN, the missing entries should be filled with the existing value of three from a row with that key.

This is the result I want:

one | two | three
1      1     10
1      1     10
1      1     10
1      2     20
1      2     20
1      2     20
1      3     nan
1      3     nan

You can see that the rows with key (1, 3) stay empty, because no existing value is available for that key.

I have tried using groupby + fillna():

df['three'] = df.groupby(['one','two'])['three'].fillna()

This gave me an error.

I have also tried forward fill, which gave me a rather strange result: it forward-filled column two instead. This is the code I used for the forward fill:

df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill()

Recommended answer

If there is only one non-NaN value per group, use ffill (forward fill) followed by bfill (backward fill) within each group, which requires apply with a lambda:

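# Parentheses allow the method chain to span several lines.
# group_keys=False keeps the original row index so the result aligns on assignment.
df['three'] = (
    df.groupby(['one', 'two'], sort=False, group_keys=False)['three']
      .apply(lambda x: x.ffill().bfill())
)
print(df)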
   one  two  three
0    1    1   10.0
1    1    1   10.0
2    1    1   10.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
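
The following is a minimal sketch, not part of the original answer, that rebuilds the question's dataframe and compares forward fill alone with forward plus backward fill. It assumes transform can stand in for apply here, since transform always returns a result aligned to the original index. The names grouped, ffill_only and filled are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "one": [1] * 8,
    "two": [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

grouped = df.groupby(["one", "two"], sort=False)["three"]

# Forward fill alone only propagates values downwards within a group,
# so a NaN that precedes the first value of its group is left untouched.
ffill_only = grouped.ffill()

# Forward fill followed by backward fill covers both directions.
# A group that is entirely NaN remains NaN.
filled = grouped.transform(lambda s: s.ffill().bfill())

print(pd.DataFrame({"ffill_only": ffill_only, "filled": filled}))
```

With forward fill alone, row 3 (key (1, 2)) stays NaN because the value 20 appears after it in the group, which is why the backward fill step is needed.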

But if a group holds multiple values and each NaN should be replaced by a single constant per group, for example the group mean:

print(df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1    NaN
3    1    2    NaN
4    1    2   20.0
5    1    2    NaN
6    1    3    NaN
7    1    3    NaN

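# Fill each NaN with the mean of its (one, two) group.
# group_keys=False keeps the original row index so the result aligns on assignment.
df['three'] = (
    df.groupby(['one', 'two'], sort=False, group_keys=False)['three']
      .apply(lambda x: x.fillna(x.mean()))
)
print(df)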
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1   25.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
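
As an alternative sketch under the same data, and not taken from the original answer, the group mean can be broadcast to every row with transform('mean') and then passed to fillna, which avoids the lambda. The name group_mean is illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the second example dataframe from the answer.
df = pd.DataFrame({
    "one": [1] * 8,
    "two": [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# transform('mean') returns one value per original row: the mean of that
# row's group, computed with NaN values skipped.
group_mean = df.groupby(["one", "two"])["three"].transform("mean")

# fillna with a Series fills each NaN from the entry with the same index.
# The all-NaN group has a NaN mean, so its rows stay NaN.
df["three"] = df["three"].fillna(group_mean)
print(df)
```

Existing values are left as they are; only the missing entries take the group mean.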

