问题描述
我有一个dataFrame,其中值"列缺少值.我想通过每个名称"组中的加权平均值来填充缺失值.关于如何用每组中的简单平均值而不是加权平均值来填充缺失值的文章.非常感谢!
I have a dataFrame where 'value'column has missing values. I'd like to filling missing values by weighted average within each 'name' group. There was post on how to fill the missing values by simple average in each group but not weighted average. Thanks a lot!
df = pd.DataFrame({'value': [1, np.nan, 3, 2, 3, 1, 3, np.nan, np.nan],'weight':[3,1,1,2,1,2,2,1,1], 'name': ['A','A', 'A','B','B','B', 'C','C','C']})
name value weight
0 A 1.0 3
1 A NaN 1
2 A 3.0 1
3 B 2.0 2
4 B 3.0 1
5 B 1.0 2
6 C 3.0 2
7 C NaN 1
8 C NaN 1
我想在每个名称"组中用加权值填写"NaN",即
I'd like to fill in "NaN" with weighted value in each "name" group, i.e.
name value weight
0 A 1.0 3
1 A 1.5 1
2 A 3.0 1
3 B 2.0 2
4 B 3.0 1
5 B 1.0 2
6 C 3.0 2
7 C 3.0 1
8 C 3.0 1
推荐答案
您可以按name
对数据帧进行分组,并使用fillna
方法用加权平均值填充缺失的值,可以使用np.average
用weights
参数:
You can group data frame by name
, and use fillna
method to fill the missing values with weighted average which can calculated with np.average
with weights
parameter:
df['value'] = (df.groupby('name', group_keys=False)
.apply(lambda g: g.value.fillna(np.average(g.dropna().value, weights=g.dropna().weight))))
df
#name value weight
#0 A 1.0 3
#1 A 1.5 1
#2 A 3.0 1
#3 B 2.0 2
#4 B 3.0 1
#5 B 1.0 2
#6 C 3.0 2
#7 C 3.0 1
#8 C 3.0 1
要减少混淆,请定义一个 fillValue 函数:
To make this less convoluted, define a fillValue function:
import numpy as np
import pandas as pd
def fillValue(g):
gNotNull = g.dropna()
wtAvg = np.average(gNotNull.value, weights=gNotNull.weight)
return g.value.fillna(wtAvg)
df['value'] = df.groupby('name', group_keys=False).apply(fillValue)
这篇关于 pandas :通过加权平均值填充每组中的缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!