statsmodels: calculating fitted values and R-squared

Problem description

I am running a regression as follows (df is a pandas DataFrame):

import statsmodels.api as sm
est = sm.OLS(df['p'], df[['e', 'varA', 'meanM', 'varM', 'covAM']]).fit()
est.summary()

Among other things, this gave me an R-squared of 0.942. I then wanted to plot the original y-values against the fitted values. For this, I sorted the original values:

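import numpy as np
import matplotlib.pyplot as plt

orig = df['p'].values
fitted = est.fittedvalues.values
args = np.argsort(orig)       # indices that sort the observed values
plt.plot(orig[args], 'bo')    # observed values, ascending
plt.plot(fitted[args], 'ro')  # fitted values in the same order (equal to orig - resid)
plt.show()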

This, however, gave me a graph in which the values were completely off, with nothing to suggest an R-squared of 0.9. I therefore tried to calculate it manually:

yBar = df['p'].mean()
SSTot = df['p'].apply(lambda x: (x-yBar)**2).sum()
SSReg = ((est.fittedvalues - yBar)**2).sum()
1 - SSReg/SSTot
Out[79]: 0.2618159806908984

Am I doing something wrong? Or is there a reason why my computation is so far from what statsmodels reports? SSTot and SSReg have values of 48084 and 35495, respectively.

Recommended answer

If you do not include an intercept (a constant explanatory variable) in your model, statsmodels computes R-squared from the un-centred total sum of squares, i.e.

tss = (ys ** 2).sum()  # un-centred total sum of squares

as opposed to

tss = ((ys - ys.mean())**2).sum()  # centred total sum of squares

As a result, the reported R-squared is higher.
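To make the difference concrete, here is a minimal sketch using only NumPy on synthetic data (the data, the seed, and all variable names are assumptions for illustration, not taken from the question). It fits a model without an intercept and computes R-squared against both totals:

```python
import numpy as np

# Synthetic data with a clearly non-zero mean (illustrative assumption).
rng = np.random.default_rng(0)
n = 200
xs = rng.normal(size=n)
ys = 5.0 + 2.0 * xs + rng.normal(size=n)

# Least-squares fit WITHOUT an intercept: ys ~ beta * xs
X = xs.reshape(-1, 1)
beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
resid = ys - X @ beta
ssr = (resid ** 2).sum()  # residual sum of squares

tss_uncentred = (ys ** 2).sum()              # un-centred total sum of squares
tss_centred = ((ys - ys.mean()) ** 2).sum()  # centred total sum of squares

r2_uncentred = 1 - ssr / tss_uncentred
r2_centred = 1 - ssr / tss_centred

print("un-centred R-squared:", r2_uncentred)
print("centred R-squared:   ", r2_centred)
```

The residual sum of squares is the same in both cases; only the denominator changes. Since the un-centred total exceeds the centred one by n * ys.mean()**2, the un-centred R-squared can never be the smaller of the two.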

This is mathematically correct, because R-squared should indicate how much of the variation is explained by the full model compared with the reduced model. If you define your model as:

ys = beta1 . xs + beta0 + noise

then the reduced model can be ys = beta0 + noise, where the estimate for beta0 is the sample average, so that noise = ys - ys.mean(). That is where de-meaning comes from in a model with an intercept.
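The claim that the intercept-only model is estimated by the sample average can be checked numerically. The following is a small sketch on made-up data (the values and names are illustrative assumptions), regressing ys on a column of ones:

```python
import numpy as np

# Made-up sample (illustrative assumption).
rng = np.random.default_rng(1)
ys = 3.0 + rng.normal(size=100)

# Reduced model ys = beta0 + noise: regress ys on a constant column.
ones = np.ones((ys.size, 1))
beta0, *_ = np.linalg.lstsq(ones, ys, rcond=None)

# Residuals of the reduced model, and the centred total sum of squares.
resid_reduced = ys - beta0[0]
ssr_reduced = (resid_reduced ** 2).sum()
tss_centred = ((ys - ys.mean()) ** 2).sum()

print("beta0 estimate:", beta0[0], "sample mean:", ys.mean())
print("reduced-model SSR:", ssr_reduced, "centred TSS:", tss_centred)
```

The residual sum of squares of the reduced model coincides with the centred total sum of squares, which is why that quantity serves as the baseline for R-squared when an intercept is present.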

But with a model like:

ys = beta . xs + noise

you can only reduce to ys = noise. Since noise is assumed to be zero-mean, you may not de-mean ys. Therefore, the unexplained variation in the reduced model is the un-centred total sum of squares.

This is documented in the statsmodels documentation under the rsquared item. Set yBar equal to zero, and I would expect you to get the same number.
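As a rough check of the yBar = 0 suggestion, the sketch below redoes the manual calculation on synthetic data with NumPy only (the data and names are assumptions; the fit uses np.linalg.lstsq as a stand-in for an sm.OLS model without a constant). Note that, if I read the question's code correctly, its expression 1 - SSReg/SSTot would give the complement of R-squared; the usual forms are SSReg/SSTot or 1 - SSRes/SSTot.

```python
import numpy as np

# Synthetic regressors and response (illustrative assumption).
rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 3))
ys = 4.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Least-squares fit without an intercept column.
beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
fitted = X @ beta
resid = ys - fitted

y_bar = 0.0  # no de-meaning, as suggested for a model without intercept
ss_tot = ((ys - y_bar) ** 2).sum()      # un-centred total sum of squares
ss_reg = ((fitted - y_bar) ** 2).sum()  # explained sum of squares
ss_res = (resid ** 2).sum()             # residual sum of squares

r2_from_reg = ss_reg / ss_tot
r2_from_res = 1 - ss_res / ss_tot

print(r2_from_reg, r2_from_res)
```

With y_bar set to zero the total splits exactly into explained plus residual parts, so both expressions agree. With the sample mean in place of zero, that split generally fails for a model without an intercept, which is one way a hand calculation can drift away from the reported figure.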

