I have a DataFrame like this:
Out[14]:
impwealth indweight
16 180000 34.200
21 384000 37.800
26 342000 39.715
30 1154000 44.375
31 421300 44.375
32 1210000 45.295
33 1062500 45.295
34 1878000 46.653
35 876000 46.653
36 925000 53.476
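I want to compute the weighted median of the column `impwealth`, using the frequency weights in `indweight`. My pseudocode looks like this:
# Sort `impwealth` in ascending order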
df.sort_values('impwealth', inplace=True)
# Find the 50th percentile weight, P
P = df['indweight'].sum() * (.5)
# Search for the first occurrence of `indweight` that is greater than P
i = df.loc[df['indweight'] > P, 'indweight'].last_valid_index()
# The value of `impwealth` associated with this index will be the weighted median
w_median = df.loc[i, 'impwealth']
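This approach seems clunky, and I'm not sure it is correct. I couldn't find a built-in way to do this in the pandas reference. What is the best way to find the weighted median?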
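Best answer
If you want to do this in pure pandas, here is one way. It does not interpolate either. (@svenkatesh, your pseudocode is missing the cumulative sum.)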
df.sort_values('impwealth', inplace=True)
cumsum = df.indweight.cumsum()
cutoff = df.indweight.sum() / 2.0
median = df.impwealth[cumsum >= cutoff].iloc[0]
The median is 925000.
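The same steps can be wrapped in a small reusable function. This is only a sketch: the name `weighted_median` and its parameters are made up for illustration and are not a pandas built-in. It assumes non-negative weights and, like the snippet above, does no interpolation. It sorts a copy rather than sorting in place, so the original frame is left untouched.

```python
import pandas as pd


def weighted_median(df, value_col, weight_col):
    """Hypothetical helper: smallest value whose cumulative weight
    reaches half of the total weight (no interpolation)."""
    ordered = df.sort_values(value_col)       # sorted copy; df itself is not modified
    cumsum = ordered[weight_col].cumsum()     # running total of weights in value order
    cutoff = ordered[weight_col].sum() / 2.0  # half of the total weight
    return ordered.loc[cumsum >= cutoff, value_col].iloc[0]


# The data from the question
df = pd.DataFrame(
    {
        "impwealth": [180000, 384000, 342000, 1154000, 421300,
                      1210000, 1062500, 1878000, 876000, 925000],
        "indweight": [34.200, 37.800, 39.715, 44.375, 44.375,
                      45.295, 45.295, 46.653, 46.653, 53.476],
    },
    index=[16, 21, 26, 30, 31, 32, 33, 34, 35, 36],
)

result = weighted_median(df, "impwealth", "indweight")
print(result)  # 925000
```

With equal weights this reduces to an ordinary (lower) median, which is a quick sanity check on the logic.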
Regarding "python - weighted median algorithm with pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26102867/