I am trying to run a multivariate regression and I get this error:

"ValueError: endog and exog matrices are different sizes"

My code snippet is below:

df_raw = pd.DataFrame(data=df_raw)

y = (df_raw['daily pct return']).astype(float)
x1 = (df_raw['Excess daily return']).astype(float)
x2 = (df_raw['Excess weekly return']).astype(float)
x3 = (df_raw['Excess monthly return']).astype(float)
x4 = (df_raw['Trading vol / mkt cap']).astype(float)
x5 = (df_raw['Std dev']).astype(float)
x6 = (df_raw['Residual risk']).astype(float)

y = y.replace([np.inf, -np.inf],np.nan).dropna()

print(y.shape)
print(x1.shape)
print(x2.shape)
print(x3.shape)
print(x4.shape)
print(x5.shape)
print(x6.shape)


df_raw.to_csv('Raw_final.csv', header=True)

result = smf.OLS(exog=y, endog=[x1, x2, x3, x4, x5, x6]).fit()
print(result.params)
print(result.summary())


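As you can see from the code, I am printing the shape of each variable. I get the following output, which suggests the error occurs because the y variable has only 48392 values while all the other variables have 48393: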

(48392,)
(48393,)
(48393,)
(48393,)
(48393,)
(48393,)
(48393,)

My dataframe looks like this:

  daily pct return | Excess daily return | weekly pct return | index weekly pct return | Excess weekly return | monthly pct return | index monthly pct return | Excess monthly return | Trading vol / mkt cap |   Std dev
 ------------------|---------------------|-------------------|-------------------------|----------------------|--------------------|--------------------------|-----------------------|-----------------------|-------------
                   |                     |                   |                         |                      |                    |                          |                       |           0.207582827 |
       0.262658228 |         0.322397801 |                   |                         |                      |                    |                          |                       |           0.285585677 |
       0.072681704 |         0.126445534 |                   |                         |                      |                    |                          |                       |           0.272920624 |
       0.135514019 |         0.068778682 |                   |                         |                      |                    |                          |                       |           0.213149083 |
      -0.115226337 |        -0.173681889 |                   |                         |                      |                    |                          |                       |           0.155653699 |
      -0.165116279 |        -0.176569405 |                   |                         |                      |                    |                          |                       |           0.033925024 |
       0.125348189 |         0.079889239 |                   |                         |                      |                    |                          |                       |           0.030968484 | 0.544133212
       0.022277228 |        -0.044949678 |                   |                         |                      |                    |                          |                       |           0.020735381 | 0.385659608
       0.150121065 |         0.102119782 |                   |                         |                      |                    |                          |                       |           0.063563881 | 0.430868447
       0.336842105 |         0.333590483 |                   |                         |                      |                    |                          |                       |           0.210193049 | 0.893734807
       0.011023622 |        -0.011860658 |       0.320987654 |            -0.657089012 |          0.978076666 |                    |                          |                       |           0.100468109 | 1.137976483
        0.37694704 |         0.308505907 |                   |                         |                      |                    |                          |                       |           0.135828281 | 1.867394416


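Does anyone have an elegant way to align the sizes of the matrices so that I no longer get this error? I think I need to drop the first row of values from every variable apart from y ("daily pct return"), but I am not sure how to do that.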

Thanks in advance!!

Best answer

I assume you want to discard every row of data whose y value is infinite.

df_raw = pd.DataFrame(data=df_raw)

# Mark infinite returns as missing, then drop every row that has a missing value
df_raw['daily pct return'] = df_raw['daily pct return'].astype(float).replace([np.inf, -np.inf], np.nan)
df_raw = df_raw.dropna()
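
One thing to watch with a bare dropna(): it removes a row if any column in it is missing. Judging from the sample frame in the question, columns such as 'weekly pct return' are empty on most rows, so this may discard far more than intended. Below is a minimal sketch that restricts the check to the columns used in the model. The four-row frame and its numbers are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of df_raw: column names are from the question,
# the values are invented for this illustration.
df_raw = pd.DataFrame({
    "daily pct return": [0.26, np.inf, 0.07, -0.11],
    "Excess daily return": [0.32, 0.12, 0.06, -0.17],
    "weekly pct return": [np.nan, np.nan, np.nan, 0.32],  # sparse, not a regressor here
})

# Columns that actually enter the regression (assumed subset for the example).
model_cols = ["daily pct return", "Excess daily return"]

# Treat +/-inf as missing everywhere.
no_inf = df_raw.replace([np.inf, -np.inf], np.nan)

# Drop a row only if one of the modelling columns is missing.
clean = no_inf.dropna(subset=model_cols)

# For comparison: drop a row if ANY column is missing.
strict = no_inf.dropna()

print(clean.shape)   # rows kept when only model_cols are checked
print(strict.shape)  # rows kept when every column is checked
```

Because y and every regressor are then taken from the same filtered frame, they have the same number of rows by construction.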


Then run the regression as needed.

07-24 09:51