Problem description
As a follow-up to this question, I'd like to calculate the CAGR (compound annual growth rate) from a pandas DataFrame such as the one below, which has some missing values:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '3', '7'],
                   'B': [7, 6, np.nan, 4],
                   'C': [5, 6, 7, 1],
                   'D': [np.nan, 9, 9, 8]})
df = df.set_index('A')
df
     B  C    D
A
1    7  5  NaN
2    6  6    9
3  NaN  7    9
7    4  1    8
Thanks in advance!
Recommended answer
When calculating returns from levels, it's fine to use the most recent available value. For example, the CAGR for row 1 should be (5/7) ^ (1/2) - 1, and for row 3 it should be (9/7) ^ (1/2) - 1. This assumes we annualize across all the years looked at: the three columns B, C and D span two periods, which is the len(df.columns) - 1 exponent used in the code.
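Because the product of the period growth factors telescopes, the CAGR of a row depends only on its first and last available values. A minimal plain-Python sketch of that formula, assuming NaN marks a missing level; the helper name `cagr` and its `periods` argument are hypothetical, introduced only for illustration:

```python
import math


def cagr(values, periods=None):
    """Hypothetical helper: CAGR between the first and last non-NaN values.

    If periods is None, annualize over the full width of the row
    (len(values) - 1 periods), as in the first assumption above.
    """
    valid = [v for v in values if not math.isnan(v)]
    first, last = valid[0], valid[-1]
    if periods is None:
        periods = len(values) - 1
    return (last / first) ** (1.0 / periods) - 1


nan = float("nan")
print(round(cagr([7, 5, nan]), 4))   # row 1: (5/7) ** (1/2) - 1 -> -0.1548
print(round(cagr([nan, 7, 9]), 4))   # row 3: (9/7) ** (1/2) - 1 -> 0.1339
```

Passing `periods` explicitly lets the same helper express a different annualization choice.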
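Based on these assumptions, first fill the gaps in each row: bfill(axis=1) copies the next available value into a leading NaN, and ffill(axis=1) then carries the last available value into a trailing NaN.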
df = df.bfill(axis=1).ffill(axis=1)
Then apply the solution from the linked question:
df['CAGR'] = df.T.pct_change().add(1).prod().pow(1./(len(df.columns) - 1)).sub(1)
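A self-contained sketch of the fill-then-compound approach on the question's data. It writes the filled values to a separate frame (the name `filled` is introduced here) so that `df` keeps its original NaNs; otherwise it uses the same pandas calls as the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '3', '7'],
                   'B': [7, 6, np.nan, 4],
                   'C': [5, 6, 7, 1],
                   'D': [np.nan, 9, 9, 8]}).set_index('A')

# Fill gaps row-wise: leading NaNs take the next value, trailing NaNs the previous one.
filled = df.bfill(axis=1).ffill(axis=1)

# Per-row product of (1 + period return); the leading NaN from pct_change is
# skipped by prod(), and the product telescopes to last / first.
growth = filled.T.pct_change().add(1).prod()

# Annualize over every period in the frame: three columns = two periods.
n_periods = len(df.columns) - 1
df['CAGR'] = growth.pow(1.0 / n_periods).sub(1)
print(df['CAGR'].round(4))
```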
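Without this assumption, the only other reasonable choice would be to annualize by the number of non-NaN observations in each row (the exponent becomes 1/(count - 1), one period fewer than the number of observations). That count has to be recorded before filling: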
notnull = df.notnull().sum(axis=1)
df = df.bfill(axis=1).ffill(axis=1)
df['CAGR'] = df.T.pct_change().add(1).prod().pow(1./(notnull.sub(1))).sub(1)
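A runnable sketch of this count-based variant on the same data, again keeping the filled values in a separate `filled` frame (a name introduced for illustration). Rows 1 and 3 have only two observations, so their growth is not annualized further (exponent 1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '3', '7'],
                   'B': [7, 6, np.nan, 4],
                   'C': [5, 6, 7, 1],
                   'D': [np.nan, 9, 9, 8]}).set_index('A')

# Count observations per row *before* filling; after filling every row would be full.
notnull = df.notnull().sum(axis=1)

filled = df.bfill(axis=1).ffill(axis=1)
growth = filled.T.pct_change().add(1).prod()

# Exponent per row is 1 / (observations - 1); Series.pow aligns on the 'A' index.
df['CAGR'] = growth.pow(1.0 / notnull.sub(1)).sub(1)
print(df['CAGR'].round(4))
```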
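In fact, this is the more general solution, because it also works when there are no nulls: every row then has len(df.columns) non-null values, so the exponent reduces to 1/(len(df.columns) - 1), exactly as in the first approach.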
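A quick check of that claim, using a small hypothetical NaN-free frame (the values and the names `full_span` and `by_count` are made up for illustration): both exponents coincide, so the two approaches give the same CAGR.

```python
import pandas as pd

# Hypothetical data with no missing values.
df = pd.DataFrame({'B': [7, 6], 'C': [5, 6], 'D': [6, 9]}, index=['x', 'y'])

growth = df.bfill(axis=1).ffill(axis=1).T.pct_change().add(1).prod()

# First approach: annualize over all columns.
full_span = growth.pow(1.0 / (len(df.columns) - 1)).sub(1)

# Count-based approach: every row has len(df.columns) observations here.
notnull = df.notnull().sum(axis=1)
by_count = growth.pow(1.0 / notnull.sub(1)).sub(1)

print((full_span - by_count).abs().max())
```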