计算与 pandas 的滚动相关性

本文介绍了计算与 pandas 的滚动相关性的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我列出了由PERMNO区分的10只股票.我想将这些股票按PERMNO分组，并计算每个PERMNO的股票收益(RET)与市场收益(vwretd)之间的滚动相关性.我正在尝试的代码如下.

I have a list of 10 stocks differentiated by PERMNO. I would like to group those stocks by PERMNO and calculate the rolling correlation between the stock return (RET) for each PERMNO with the market return (vwretd). The code I am trying is below.

CRSP['rollingcorr'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['RET'],CRSP['vwretd'],10)

我得到的错误如下.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-32-c18e1ce01302> in <module>()
      1 #CRSP['rollingcorr'] = CRSP.rolling_corr(CRSP['vwretd'],CRSP['RET'],120)
----> 2 CRSP['rollingmean'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['vwretd'],10)
      3 CRSP.head(20)

C:\Users\rebortz\Anaconda\lib\site-packages\pandas\core\groupby.pyc in __getattr__(self, attr)
    296
    297         raise AttributeError("%r object has no attribute %r" %
--> 298                              (type(self).__name__, attr))
    299
    300     def __getitem__(self, key):

AttributeError: 'DataFrameGroupBy' object has no attribute 'rolling_corr'

请帮助！

谢谢

推荐答案

使用pandas.rolling_corr，而不是DataFrame.rolling_corr.此外，groupby返回一个生成器.参见下面的代码.

Use pandas.rolling_corr, not DataFrame.rolling_corr. Besides, groupby returns a generator. See below code.

代码:

import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")

for key, value in df_gen:
    print "key: {}".format(key)
    print value.rolling_corr(value["Value1"],value["Value2"], 3)

输出:

key: Blue
1          NaN
3          NaN
6     0.931673
8     0.865066
10    0.089304
12   -0.998656
15   -0.971373
17   -0.667316
dtype: float64
key: Red
0          NaN
2          NaN
5    -0.911357
9    -0.152221
11   -0.971153
14    0.438697
18   -0.550727
dtype: float64
key: Yellow
4          NaN
7          NaN
13   -0.040330
16    0.879371
dtype: float64

您可以将循环部分更改为以下内容，以查看原始数据帧的后分组以及新的列.

You can change the loop part to the following to view the original dataframe post-grouping with a new column as well.

for key, value in df_gen:
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"],value["Value2"], 3)
    print value

输出:

   Color    Value1    Value2  ROLL_CORR
1   Blue  0.951227  0.514999        NaN
3   Blue  0.649112  0.513052        NaN
6   Blue  0.148165  0.342205   0.931673
8   Blue  0.626883  0.421530   0.865066
10  Blue  0.286738  0.583811   0.089304
12  Blue  0.966779  0.227340  -0.998656
15  Blue  0.065493  0.887640  -0.971373
17  Blue  0.757932  0.900103  -0.667316
key: Red
   Color    Value1    Value2  ROLL_CORR
0    Red  0.201435  0.981871        NaN
2    Red  0.522955  0.357239        NaN
5    Red  0.806326  0.310039  -0.911357
9    Red  0.656126  0.678047  -0.152221
11   Red  0.435898  0.908388  -0.971153
14   Red  0.116419  0.555821   0.438697
18   Red  0.793102  0.168033  -0.550727
key: Yellow
     Color    Value1    Value2  ROLL_CORR
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371

如果要在处理后将它们全部结合在一起(顺便说一句，这可能会使其他人困惑)，只需在处理组后使用concat即可.

If you want to join them all together after processing (this might be confusing to others, by the way), just use concat after processing groups.

import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")

dfs = [] # Container for dataframes.

for key, value in df_gen:
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"],value["Value2"], 3)
    print value
    dfs.append(value)

df_final = pd.concat(dfs)
print df_final

输出:

     Color    Value1    Value2  ROLL_CORR
1     Blue  0.951227  0.514999        NaN
3     Blue  0.649112  0.513052        NaN
6     Blue  0.148165  0.342205   0.931673
8     Blue  0.626883  0.421530   0.865066
10    Blue  0.286738  0.583811   0.089304
12    Blue  0.966779  0.227340  -0.998656
15    Blue  0.065493  0.887640  -0.971373
17    Blue  0.757932  0.900103  -0.667316
0      Red  0.201435  0.981871        NaN
2      Red  0.522955  0.357239        NaN
5      Red  0.806326  0.310039  -0.911357
9      Red  0.656126  0.678047  -0.152221
11     Red  0.435898  0.908388  -0.971153
14     Red  0.116419  0.555821   0.438697
18     Red  0.793102  0.168033  -0.550727
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371

希望这会有所帮助.

这篇关于计算与 pandas 的滚动相关性的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！