我已经编写了一些代码,使用pandas中的pivot表计算加权平均值。但是,我不知道如何添加执行加权平均的实际列(添加一个新列,其中每一行包含“cumulative”/“count”值)。
数据看起来是这样的:
VALUE COUNT GRID agb
1 43 1476 1051
2 212 1476 2983
5 7 1477 890
4 1361 1477 2310
这是我的代码:
# Read input data
lup_df = pandas.DataFrame.from_csv(o_dir+LUP+'.csv',index_col=False)
# Insert a new column with area * variable
lup_df['cumulative'] = lup_df['COUNT']*lup_df['agb']
# Create and output pivot table
lup_pvt = pandas.pivot_table(lup_df, 'agb', rows=['GRID'])
# TODO: Add a new column where each row contains value of 'cumulative'/'COUNT'
lup_pvt.to_csv(o_dir+PIVOT+'.csv',index=True,header=True,sep=',')
我该怎么做?
最佳答案
因此,对于每一个值grid
,权重的值是agb
列中的值的count
列的加权平均值。如果这种解释是正确的,我想这就是“抄袭”的诀窍:
import numpy as np
import pandas as pd
np.random.seed(0)
n = 50
df = pd.DataFrame({'count': np.random.choice(np.arange(10)+1, n),
'grid': np.random.choice(np.arange(10)+50, n),
'value': np.random.randn(n) + 12})
df['prod'] = df['count'] * df['value']
grouped = df.groupby('grid').sum()
grouped['wtdavg'] = grouped['prod'] / grouped['count']
print grouped
count value prod wtdavg
grid
50 22 57.177042 243.814417 11.082474
51 27 58.801386 318.644085 11.801633
52 11 34.202619 135.127942 12.284358
53 24 59.340084 272.836636 11.368193
54 39 137.268317 482.954857 12.383458
55 47 79.468986 531.122652 11.300482
56 17 38.624369 214.188938 12.599349
57 22 38.572429 279.948202 12.724918
58 27 36.492929 327.315518 12.122797
59 34 60.851671 408.306429 12.009013
或者,如果你想有点圆滑,写一个加权平均值函数,你可以反复使用:
import numpy as np
import pandas as pd
np.random.seed(0)
n = 50
df = pd.DataFrame({'count': np.random.choice(np.arange(10)+1, n),
'grid': np.random.choice(np.arange(10)+50, n),
'value': np.random.randn(n) + 12})
def wavg(val_col_name, wt_col_name):
def inner(group):
return (group[val_col_name] * group[wt_col_name]).sum() / group[wt_col_name].sum()
inner.__name__ = 'wtd_avg'
return inner
slick = df.groupby('grid').apply(wavg('value', 'count'))
print slick
grid
50 11.082474
51 11.801633
52 12.284358
53 11.368193
54 12.383458
55 11.300482
56 12.599349
57 12.724918
58 12.122797
59 12.009013
dtype: float64