(Python - sklearn) How to pass parameters to a custom ModelTransformer class through GridSearchCV

Problem description

Below is my pipeline. It seems that I can't pass parameters to my models through the ModelTransformer class, which I took from this link (http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html)

The error message makes sense to me, but I don't know how to fix it. Any ideas? Thanks.

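# imports used by the snippet below (module paths from current scikit-learn;
# EnsembleClassifier1 is the asker's own class and is not shown)
from pandas import DataFrame
from sklearn import preprocessing
from sklearn.base import TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# define a pipeline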
pipeline = Pipeline([
('vect', DictVectorizer(sparse=False)),
('scale', preprocessing.MinMaxScaler()),
('ess', FeatureUnion(n_jobs=-1,
                     transformer_list=[
     ('rfc', ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1,  n_estimators=100))),
     ('svc', ModelTransformer(SVC(random_state=1))),],
                     transformer_weights=None)),
('es', EnsembleClassifier1()),
])

# define the parameters for the pipeline
parameters = {
'ess__rfc__n_estimators': (100, 200),
}

# ModelTransformer class, taken from the link
# (http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html)
class ModelTransformer(TransformerMixin):
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict(X))

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, refit=True)

Error message: ValueError: Invalid parameter n_estimators for estimator ModelTransformer.

Recommended answer

GridSearchCV has a special naming convention for nested objects. In your case, ess__rfc__n_estimators stands for ess.rfc.n_estimators, and, according to the definition of the pipeline, it points to the property n_estimators of

ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))

Obviously, ModelTransformer instances don't have such a property.
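The double-underscore convention can be tried on a plain pipeline first. The following is a minimal sketch, not the asker's code: the two-step pipeline and the step names "scale" and "svc" are made up for illustration. It shows that a name resolves step by step, and that a name ending in a parameter the target object does not own is rejected with the same kind of ValueError as in the question.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Hypothetical two-step pipeline, used only to illustrate the naming rule.
pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("svc", SVC(C=1.0)),
])

# "svc__C" means: find the step named "svc", then set its parameter "C".
pipe.set_params(svc__C=10.0)
c_value = pipe.named_steps["svc"].C

# SVC has no n_estimators parameter, so this name cannot be resolved.
try:
    pipe.set_params(svc__n_estimators=200)
    bad_name_rejected = False
except ValueError:
    bad_name_rejected = True

print(c_value, bad_name_rejected)
```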

The fix is easy: to reach the object wrapped by ModelTransformer, go through its model field. The grid parameters become

parameters = {
  'ess__rfc__model__n_estimators': (100, 200),
}
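One way to check a parameter name before running the search is to list what the pipeline itself reports through get_params(). The sketch below rests on two assumptions that are not in the question's code: ModelTransformer also inherits from BaseEstimator (without it the wrapper has no get_params, so nested names cannot be listed), and transform returns a NumPy column instead of a DataFrame. The pipeline is cut down to the FeatureUnion step.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC


class ModelTransformer(TransformerMixin, BaseEstimator):
    """Wraps a model so that its predictions become a feature column."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None):
        self.model.fit(X, y)
        return self

    def transform(self, X):
        # Assumption: a 2-D NumPy column replaces the original DataFrame.
        return self.model.predict(X).reshape(-1, 1)


# Reduced version of the question's pipeline: only the 'ess' step is kept.
pipeline = Pipeline([
    ("ess", FeatureUnion([
        ("rfc", ModelTransformer(
            RandomForestClassifier(n_estimators=100, random_state=1))),
        ("svc", ModelTransformer(SVC(random_state=1))),
    ])),
])

names = pipeline.get_params().keys()
has_fixed_name = "ess__rfc__model__n_estimators" in names
has_original_name = "ess__rfc__n_estimators" in names

print(has_fixed_name, has_original_name)
```

Any key that get_params() returns is a name GridSearchCV can set, so this check works for other nested parameters too.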

P.S. This is not the only problem with your code. To use multiple jobs in GridSearchCV, every object you use must be copyable. This is achieved by implementing the get_params and set_params methods, which you can inherit from BaseEstimator.
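A minimal sketch of that suggestion follows, under stated assumptions: the data comes from make_classification, LogisticRegression stands in for the asker's EnsembleClassifier1 (which is not shown), transform returns a NumPy column instead of a DataFrame, and the vectorizing, scaling and n_jobs settings are left out to keep the run small. With BaseEstimator in the class bases, the wrapper can be cloned and the corrected parameter name goes through the grid search.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC


class ModelTransformer(TransformerMixin, BaseEstimator):
    """Model wrapper; BaseEstimator supplies get_params and set_params."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None):
        self.model.fit(X, y)
        return self

    def transform(self, X):
        # Assumption: a 2-D NumPy column replaces the original DataFrame.
        return self.model.predict(X).reshape(-1, 1)


# clone() is how GridSearchCV copies estimators; it relies on get_params.
original = ModelTransformer(SVC(random_state=1))
cloned = clone(original)

# Synthetic data, used only to make the sketch runnable.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

pipeline = Pipeline([
    ("ess", FeatureUnion([
        ("rfc", ModelTransformer(
            RandomForestClassifier(n_estimators=5, random_state=1))),
        ("svc", ModelTransformer(SVC(random_state=1))),
    ])),
    ("es", LogisticRegression()),  # stand-in for EnsembleClassifier1
])

parameters = {"ess__rfc__model__n_estimators": (5, 10)}
grid_search = GridSearchCV(pipeline, parameters, cv=3, refit=True)
grid_search.fit(X, y)
best = grid_search.best_params_["ess__rfc__model__n_estimators"]

print(best)
```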


06-22 00:58