Using statsmodels estimation with scikit-learn cross-validation: is it possible?

Question

I posted this question to the Cross Validated forum and later realized that it might find a more appropriate audience on Stack Overflow instead.

I am looking for a way to take the fit object (result) obtained from Python statsmodels and feed it into cross_val_score from scikit-learn's cross_validation module. The attached link suggests that this may be possible, but I have not succeeded.

I am getting an error when I try this.

Recommended answer

Indeed, you cannot use cross_val_score directly on statsmodels objects, because the interface is different. In statsmodels:


  • training data is passed directly into the constructor

  • a separate object contains the results of model estimation

However, you can write a simple wrapper to make statsmodels objects look like sklearn estimators:

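import statsmodels.api as sm
from sklearn.base import BaseEstimator, RegressorMixin

# Mixins are listed before BaseEstimator, as scikit-learn recommends
class SMWrapper(RegressorMixin, BaseEstimator):
    """ A universal sklearn-style wrapper for statsmodels regressors """
    def __init__(self, model_class, fit_intercept=True):
        # sklearn convention: __init__ only stores hyperparameters
        self.model_class = model_class
        self.fit_intercept = fit_intercept
    def fit(self, X, y):
        if self.fit_intercept:
            # has_constant='add' always prepends the intercept column
            X = sm.add_constant(X, has_constant='add')
        # statsmodels takes the data in the constructor: endog (y) first, then exog (X)
        self.model_ = self.model_class(y, X)
        self.results_ = self.model_.fit()
        return self  # sklearn estimators return self from fit()
    def predict(self, X):
        if self.fit_intercept:
            X = sm.add_constant(X, has_constant='add')
        return self.results_.predict(X)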

This class provides the fit and predict methods that sklearn expects, so it can be used with sklearn tools, e.g. cross-validated or included in a pipeline. For example:

from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

X, y = make_regression(random_state=1, n_samples=300, noise=100)

print(cross_val_score(SMWrapper(sm.OLS), X, y, scoring='r2'))
print(cross_val_score(LinearRegression(), X, y, scoring='r2'))

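You can see that the output of the two models is identical, because both are OLS models cross-validated in the same way. Three scores are printed because this output comes from 3-fold cross-validation; scikit-learn 0.22 and later default to 5 folds, so the same code prints five scores there.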

[0.28592315 0.37367557 0.47972639]
[0.28592315 0.37367557 0.47972639]
