This article covers how to compute a percentile rank over a rolling window in pandas.
Question
I am trying to calculate the percentile rank of data by column within a rolling window.
import numpy as np
import pandas as pd
test = pd.DataFrame(np.random.randn(20, 3), pd.date_range('1/1/2000', periods=20), ['A', 'B', 'C'])
test
Out[111]:
                   A         B         C
2000-01-01 -0.566992 -1.494799  0.462330
2000-01-02 -0.550769 -0.699104  0.767778
2000-01-03 -0.270597  0.060836  0.057195
2000-01-04 -0.583784 -0.546418 -0.557850
2000-01-05  0.294073 -2.326211  0.262098
2000-01-06 -1.122543 -0.116279 -0.003088
2000-01-07  0.121387  0.763100  3.503757
2000-01-08  0.335564  0.076304  2.021757
2000-01-09  0.403170  0.108256  0.680739
2000-01-10 -0.254558 -0.497909 -0.454181
2000-01-11  0.167347  0.459264 -1.247459
2000-01-12 -1.243778  0.858444  0.338056
2000-01-13 -1.070655  0.924808  0.080867
2000-01-14 -1.175651 -0.559712 -0.372584
2000-01-15 -0.216708 -0.116188  0.511223
2000-01-16  0.597171  0.205529 -0.728783
2000-01-17 -0.624469  0.592436  0.832100
2000-01-18  0.259269  0.665585  0.126534
2000-01-19  1.150804  0.575759 -1.335835
2000-01-20 -0.909525  0.500366  2.120933
I tried to use .rolling with .apply, but I am missing something.
pctrank = lambda x: x.rank(pct=True)
rollingrank = test.rolling(window=10, center=False).apply(pctrank)
For column A, the final value would be the percentile rank of -0.909525 within the length-10 window from 2000-01-11 to 2000-01-20. Any ideas?
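Recommended answer
Your lambda receives a NumPy array, which does not have a .rank method; only pandas's Series and DataFrame have it. (In recent pandas versions, where raw=False is the default, the window arrives as a Series instead.) Either way, the function passed to .apply must return a single value per window, so rank the window and keep only the rank of its last element. You can thus change it to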
pctrank = lambda x: pd.Series(x).rank(pct=True).iloc[-1]
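As a minimal end-to-end sketch of this fix: the seed, the name rolling_rank, and the explicit raw=False are assumptions added for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the question's random data (the seed is an assumption).
rng = np.random.default_rng(0)
test = pd.DataFrame(
    rng.standard_normal((20, 3)),
    index=pd.date_range("2000-01-01", periods=20),
    columns=["A", "B", "C"],
)

# Rank every value in the window, then keep the rank of the last (current) row.
pctrank = lambda x: pd.Series(x).rank(pct=True).iloc[-1]

# raw=False passes each window as a Series; wrapping in pd.Series also
# handles the case where the window arrives as a NumPy array (raw=True).
rolling_rank = test.rolling(window=10).apply(pctrank, raw=False)

# Cross-check the final value of column A against a direct computation
# over the last ten rows.
last_window = test["A"].iloc[-10:]
expected = last_window.rank(pct=True).iloc[-1]
```

The first nine rows of rolling_rank are NaN because a window of 10 is not yet full there; each later row holds the percentile rank of that row's value within its own trailing window.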
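Alternatively, you can follow this SO answer:
def pctrank(x):
    n = len(x)
    temp = x.argsort()  # indices that would sort the window
    ranks = np.empty(n)
    ranks[temp] = (np.arange(n) + 1) / n  # i-th smallest value gets rank (i + 1) / n
    return ranks[-1]  # percentile rank of the window's last element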
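A quick way to check the argsort-based version is to run it next to the pandas-based one on the same data. The sketch below assumes raw=True (so each window is a NumPy array) and uses made-up random data; the names fast and slow are illustrative only.

```python
import numpy as np
import pandas as pd

def pctrank(x):
    n = len(x)
    temp = x.argsort()                    # indices that would sort the window
    ranks = np.empty(n)
    ranks[temp] = (np.arange(n) + 1) / n  # ordinal rank divided by window size
    return ranks[-1]                      # rank of the last element

# Illustrative data only (seed chosen arbitrarily).
rng = np.random.default_rng(1)
s = pd.Series(rng.standard_normal(50))

# raw=True hands each window to the function as a NumPy array.
fast = s.rolling(10).apply(pctrank, raw=True)
slow = s.rolling(10).apply(
    lambda x: pd.Series(x).rank(pct=True).iloc[-1], raw=True
)
```

On data without repeated values the two results agree. They can differ when a window contains ties: argsort assigns each tied value a distinct ordinal rank, whereas Series.rank by default averages the ranks of tied values.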