This article covers how to use a Pipeline with GridSearchCV.

Problem description

Suppose I have this Pipeline object:

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# my_transform is the asker's own transformer class (not shown here)
pipe = Pipeline([
    ('my_transform', my_transform()),
    ('estimator', SVC())
])

To pass hyperparameters to my support vector classifier (SVC), I can do something like this:

pipe_parameters = {
    'estimator__gamma': (0.1, 1),
    'estimator__kernel': ['rbf']
}
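
The `estimator__gamma` style keys follow scikit-learn's `<step name>__<parameter>` convention for nested parameters. A minimal sketch of how one might check which names a pipeline accepts, assuming `StandardScaler` as a stand-in for `my_transform` (which is not shown in the question):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# StandardScaler is only a placeholder for the question's my_transform().
pipe = Pipeline([
    ('my_transform', StandardScaler()),
    ('estimator', SVC()),
])

# get_params() lists every settable name; nested ones are joined with '__'.
svc_param_names = sorted(
    name for name in pipe.get_params() if name.startswith('estimator__')
)
print(svc_param_names)

# The same names work with set_params on the pipeline itself.
pipe.set_params(estimator__gamma=0.1, estimator__kernel='rbf')
print(pipe.named_steps['estimator'].gamma)
```

Any key used in a parameter grid has to be one of the names reported by `get_params()`, so printing them is a quick way to catch a misspelled step name.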

Then I can use GridSearchCV:

from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(pipe, pipe_parameters)
grid.fit(X_train, y_train)

We know that the linear kernel does not use gamma as a hyperparameter. So how can I include the linear kernel in this grid search?

For example, in a plain grid search (without a pipeline), I can write:

param_grid = [
    {'C': [0.1, 1, 10, 100, 1000],
     'gamma': [0.0001, 0.001, 0.01, 0.1, 1],
     'kernel': ['rbf']},
    {'C': [0.1, 1, 10, 100, 1000],
     'kernel': ['linear']},
    {'C': [0.1, 1, 10, 100, 1000],
     'gamma': [0.0001, 0.001, 0.01, 0.1, 1],
     'degree': [2, 3],
     'kernel': ['poly']}
]
grid = GridSearchCV(SVC(), param_grid)

So I need a working version of code like this:

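# Not valid as written: a dict cannot hold 'estimator__kernel' twice,
# so the second entry silently overwrites the first
pipe_parameters = {
    'bag_of_words__max_features': (None, 1500),
    'estimator__kernel': ['rbf'],
    'estimator__gamma': (0.1, 1),
    'estimator__kernel': ['linear'],
    'estimator__C': (0.1, 1),
}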

That is, I want to search over the following hyperparameter combinations:

kernel = rbf, gamma = 0.1
kernel = rbf, gamma = 1
kernel = linear, C = 0.1
kernel = linear, C = 1

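Recommended answer

You are almost there. Just as you created a list of dictionaries for the bare SVC model, create a list of dictionaries for the pipeline.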
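
The list-of-dictionaries idea can be checked without fitting anything. The sketch below uses `ParameterGrid` from `sklearn.model_selection` to enumerate the candidates for the four combinations listed in the question; the step name `estimator` is taken from the question's pipeline, and the grid is trimmed to the SVC parameters for brevity:

```python
from sklearn.model_selection import ParameterGrid

# One dict per kernel, so gamma is only ever paired with 'rbf'.
pipe_parameters = [
    {'estimator__kernel': ['rbf'], 'estimator__gamma': [0.1, 1]},
    {'estimator__kernel': ['linear'], 'estimator__C': [0.1, 1]},
]

# ParameterGrid expands each dict separately and concatenates the results.
candidates = list(ParameterGrid(pipe_parameters))
for params in candidates:
    print(params)
```

Each dictionary is expanded on its own, so no candidate combines the linear kernel with a gamma value.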

Try this example:

from sklearn.datasets import fetch_20newsgroups
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

categories = [
    'alt.atheism',
    'talk.religion.misc',
    'comp.graphics',
    'sci.space',
]
remove = ('headers', 'footers', 'quotes')

data_train = fetch_20newsgroups(subset='train', categories=categories,
                                shuffle=True, random_state=42,
                                remove=remove)

pipe = Pipeline([
    ('bag_of_words', CountVectorizer()),
    ('estimator', SVC())])
pipe_parameters = [
    {'bag_of_words__max_features': (None, 1500),
     'estimator__C': [0.1],
     'estimator__gamma': [0.0001, 1],
     'estimator__kernel': ['rbf']},
    {'bag_of_words__max_features': (None, 1500),
     'estimator__C': [0.1, 1],
     'estimator__kernel': ['linear']}
]
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(pipe, pipe_parameters, cv=2)
grid.fit(data_train.data, data_train.target)

grid.best_params_
# {'bag_of_words__max_features': None,
#  'estimator__C': 0.1,
#  'estimator__kernel': 'linear'}
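
To see every candidate that was evaluated, not just the best one, `cv_results_` can be inspected after fitting. This is a small sketch under stated assumptions: it swaps in the bundled iris dataset and a `StandardScaler` step (neither is part of the answer above) so that it runs without downloading 20 newsgroups.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Iris and StandardScaler are stand-ins chosen so the sketch runs offline.
X, y = load_iris(return_X_y=True)
pipe = Pipeline([('scale', StandardScaler()), ('estimator', SVC())])
pipe_parameters = [
    {'estimator__kernel': ['rbf'], 'estimator__gamma': [0.1, 1]},
    {'estimator__kernel': ['linear'], 'estimator__C': [0.1, 1]},
]
grid = GridSearchCV(pipe, pipe_parameters, cv=2)
grid.fit(X, y)

# cv_results_['params'] holds one dict per candidate that was evaluated.
results = grid.cv_results_
for params, score in zip(results['params'], results['mean_test_score']):
    print(f'{score:.3f}  {params}')
```

The printed list should contain only the rbf/gamma and linear/C pairs, which confirms that the two dictionaries were searched independently.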

