Python: weighted median algorithm with pandas

Problem description

I have a dataframe that looks like this:

Out[14]:
    impwealth  indweight
16     180000     34.200
21     384000     37.800
26     342000     39.715
30    1154000     44.375
31     421300     44.375
32    1210000     45.295
33    1062500     45.295
34    1878000     46.653
35     876000     46.653
36     925000     53.476

I want to calculate the weighted median of the column impwealth using the frequency weights in indweight. My pseudo code looks like this:

# Sort by `impwealth` in ascending order
df = df.sort_values('impwealth')

# Find the 50th percentile weight, P
P = df['indweight'].sum() * 0.5

# Find the first row where the cumulative weight reaches P
cum_weight = df['indweight'].cumsum()
i = cum_weight[cum_weight >= P].index[0]

# The value of `impwealth` at this index is the weighted median
w_median = df.loc[i, 'impwealth']

This method seems clunky, and I'm not sure it's correct. I didn't find a built-in way to do this in the pandas reference. What is the best way to go about finding the weighted median?

Recommended answer

Have you tried the wquantiles package? I had never used it before, but it has a weighted median function that seems to give at least a reasonable answer (you'll probably want to double-check that it's using the approach you expect).

In [12]: import weighted

In [13]: weighted.median(df['impwealth'], df['indweight'])
Out[13]: 914662.0859091772
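If you'd rather not add a dependency, or you want to check which definition of the weighted median you're getting, here is a minimal sketch using only NumPy and pandas. The helper names `weighted_median_step` and `weighted_median_interp` are hypothetical, not part of any library. The first returns the smallest value whose cumulative weight reaches half the total weight, so the result is always one of the observed values. The second interpolates linearly between observations placed at the midpoint of their own weight interval. For this data the interpolated version lands on roughly the value shown above, which suggests (but does not confirm) that the package interpolates in a similar way. Both assume strictly positive weights.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame(
    {
        "impwealth": [180000, 384000, 342000, 1154000, 421300,
                      1210000, 1062500, 1878000, 876000, 925000],
        "indweight": [34.200, 37.800, 39.715, 44.375, 44.375,
                      45.295, 45.295, 46.653, 46.653, 53.476],
    },
    index=[16, 21, 26, 30, 31, 32, 33, 34, 35, 36],
)


def _sorted_by_value(values, weights):
    """Return values and weights as float arrays, sorted by value."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")
    return values[order], weights[order]


def weighted_median_step(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    values, weights = _sorted_by_value(values, weights)
    cum = np.cumsum(weights)
    # First position where the running weight is at least half the total
    pos = np.searchsorted(cum, 0.5 * cum[-1], side="left")
    return values[pos]


def weighted_median_interp(values, weights):
    """Interpolated weighted median (assumed scheme, see the note above)."""
    values, weights = _sorted_by_value(values, weights)
    cum = np.cumsum(weights)
    # Place each observation at the midpoint of its own weight interval,
    # expressed as a fraction of the total weight
    midpoints = (cum - 0.5 * weights) / cum[-1]
    return np.interp(0.5, midpoints, values)


print(weighted_median_step(df["impwealth"], df["indweight"]))    # 925000.0
print(weighted_median_interp(df["impwealth"], df["indweight"]))  # about 914662.09
```

The two numbers differ because they answer slightly different questions: the step version picks an actual row from the data, while the interpolated version can return a value between two rows. Which one you want depends on how you intend to report the median.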

10-11 22:38