问题描述
我列出了由PERMNO区分的10只股票.我想将这些股票按PERMNO分组,并计算每个PERMNO的股票收益(RET)与市场收益(vwretd)之间的滚动相关性.我正在尝试的代码如下.
I have a list of 10 stocks differentiated by PERMNO. I would like to group those stocks by PERMNO and calculate the rolling correlation between the stock return (RET) for each PERMNO with the market return (vwretd). The code I am trying is below.
CRSP['rollingcorr'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['RET'],CRSP['vwretd'],10)
我得到的错误如下.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-32-c18e1ce01302> in <module>()
1 #CRSP['rollingcorr'] = CRSP.rolling_corr(CRSP['vwretd'],CRSP['RET'],120)
----> 2 CRSP['rollingmean'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['vwretd'],10)
3 CRSP.head(20)
C:\Users\rebortz\Anaconda\lib\site-packages\pandas\core\groupby.pyc in __getattr__(self, attr)
296
297 raise AttributeError("%r object has no attribute %r" %
--> 298 (type(self).__name__, attr))
299
300 def __getitem__(self, key):
AttributeError: 'DataFrameGroupBy' object has no attribute 'rolling_corr'
请帮助!
谢谢
推荐答案
使用pandas.rolling_corr
,而不是DataFrame.rolling_corr
.此外,groupby
返回一个生成器.参见下面的代码.
Use pandas.rolling_corr
, not DataFrame.rolling_corr
. Besides, groupby
returns a generator. See below code.
代码:
import pandas as pd
df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")
for key, value in df_gen:
print "key: {}".format(key)
print value.rolling_corr(value["Value1"],value["Value2"], 3)
输出:
key: Blue
1 NaN
3 NaN
6 0.931673
8 0.865066
10 0.089304
12 -0.998656
15 -0.971373
17 -0.667316
dtype: float64
key: Red
0 NaN
2 NaN
5 -0.911357
9 -0.152221
11 -0.971153
14 0.438697
18 -0.550727
dtype: float64
key: Yellow
4 NaN
7 NaN
13 -0.040330
16 0.879371
dtype: float64
您可以将循环部分更改为以下内容,以查看原始数据帧的后分组以及新的列.
You can change the loop part to the following to view the original dataframe post-grouping with a new column as well.
for key, value in df_gen:
value["ROLL_CORR"] = pd.rolling_corr(value["Value1"],value["Value2"], 3)
print value
输出:
Color Value1 Value2 ROLL_CORR
1 Blue 0.951227 0.514999 NaN
3 Blue 0.649112 0.513052 NaN
6 Blue 0.148165 0.342205 0.931673
8 Blue 0.626883 0.421530 0.865066
10 Blue 0.286738 0.583811 0.089304
12 Blue 0.966779 0.227340 -0.998656
15 Blue 0.065493 0.887640 -0.971373
17 Blue 0.757932 0.900103 -0.667316
key: Red
Color Value1 Value2 ROLL_CORR
0 Red 0.201435 0.981871 NaN
2 Red 0.522955 0.357239 NaN
5 Red 0.806326 0.310039 -0.911357
9 Red 0.656126 0.678047 -0.152221
11 Red 0.435898 0.908388 -0.971153
14 Red 0.116419 0.555821 0.438697
18 Red 0.793102 0.168033 -0.550727
key: Yellow
Color Value1 Value2 ROLL_CORR
4 Yellow 0.099474 0.143293 NaN
7 Yellow 0.073128 0.749297 NaN
13 Yellow 0.006777 0.318383 -0.040330
16 Yellow 0.345647 0.993382 0.879371
如果要在处理后将它们全部结合在一起(顺便说一句,这可能会使其他人困惑),只需在处理组后使用concat
即可.
If you want to join them all together after processing (this might be confusing to others, by the way), just use concat
after processing groups.
import pandas as pd
df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")
dfs = [] # Container for dataframes.
for key, value in df_gen:
value["ROLL_CORR"] = pd.rolling_corr(value["Value1"],value["Value2"], 3)
print value
dfs.append(value)
df_final = pd.concat(dfs)
print df_final
输出:
Color Value1 Value2 ROLL_CORR
1 Blue 0.951227 0.514999 NaN
3 Blue 0.649112 0.513052 NaN
6 Blue 0.148165 0.342205 0.931673
8 Blue 0.626883 0.421530 0.865066
10 Blue 0.286738 0.583811 0.089304
12 Blue 0.966779 0.227340 -0.998656
15 Blue 0.065493 0.887640 -0.971373
17 Blue 0.757932 0.900103 -0.667316
0 Red 0.201435 0.981871 NaN
2 Red 0.522955 0.357239 NaN
5 Red 0.806326 0.310039 -0.911357
9 Red 0.656126 0.678047 -0.152221
11 Red 0.435898 0.908388 -0.971153
14 Red 0.116419 0.555821 0.438697
18 Red 0.793102 0.168033 -0.550727
4 Yellow 0.099474 0.143293 NaN
7 Yellow 0.073128 0.749297 NaN
13 Yellow 0.006777 0.318383 -0.040330
16 Yellow 0.345647 0.993382 0.879371
希望这会有所帮助.
这篇关于计算与 pandas 的滚动相关性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!