Problem description
PLS regression with sklearn gives me very poor predictions. Once the model is fitted, I cannot find any way to get the "intercept". Perhaps this is what affects the model's predictions? The score and loading matrices look fine, and so does the array of coefficients. In any case, how do I get the intercept from the attributes the model already exposes?
This code returns the coefficients of the variables.
from pandas import DataFrame
from sklearn.cross_decomposition import PLSRegression

X = DataFrame({
    'x1': [0.0, 1.0, 2.0, 2.0],
    'x2': [0.0, 0.0, 2.0, 5.0],
    'x3': [1.0, 0.0, 2.0, 4.0],
}, columns=['x1', 'x2', 'x3'])
Y = DataFrame({
    'y': [-0.2, 1.1, 5.9, 12.3],
}, columns=['y'])

def regPLS1(X, Y):
    _COMPS_ = len(X.columns)  # all latent variables
    model = PLSRegression(_COMPS_).fit(X, Y)
    return model.coef_
The result is:
regPLS1(X,Y)
>>> array([[ 0.84], [ 2.44], [-0.46]])
In addition to these coefficients, the intercept should be 0.26. What am I doing wrong?
EDIT: The correct predicted (evaluated) response Y_hat is exactly the same as the observed Y:
Y_hat = [-0.2 1.1 5.9 12.3]
Recommended answer
To calculate the intercept, use the following:
import numpy
plsModel = PLSRegression(_COMPS_).fit(X, Y)
# Intercept = mean of Y minus (mean of X) times the coefficient matrix
y_intercept = plsModel.y_mean_ - numpy.dot(plsModel.x_mean_, plsModel.coef_)
I took the formula directly from the R "pls" package:
BInt[1,,i] <- object$Ymeans - object$Xmeans %*% B[,,i]
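As a short sketch of why that formula works, assume the model predicts from centered data, with the coefficient matrix B expressed in the units of the original X. The symbols \bar{x}, \bar{y}, B and b_0 below are notation introduced here for illustration, not names from either library. Expanding the centered prediction separates out the constant term:

\[
\hat{y} \;=\; \bar{y} + (x - \bar{x})\,B \;=\; \underbrace{\left(\bar{y} - \bar{x}\,B\right)}_{b_0} \;+\; x\,B
\]

so b_0 is the mean of Y minus the mean of X times the coefficients, which matches the R expression above term by term. If the coefficients instead refer to standardized X, the same algebra applies after rescaling them to original units.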
I tested this in both R 'pls' and scikit-learn and got the same intercept from each.