Problem description
I would like to check the prediction error of a new method through cross-validation. I would like to know whether I can pass my method to sklearn's cross-validation function and, if so, how.
I would like something like sklearn.cross_validation(cv=10).mymethod.
I also need to know how to define mymethod: should it be a function, and what should its inputs and outputs be?
For example, we can take mymethod to be an implementation of the least squares estimator (not one of the ones in sklearn, of course).
I found this tutorial, but it is not very clear to me.
In the documentation they use:
>>> import numpy as np
>>> from sklearn import cross_validation
>>> from sklearn import datasets
>>> from sklearn import svm
>>> iris = datasets.load_iris()
>>> iris.data.shape, iris.target.shape
((150, 4), (150,))
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_validation.cross_val_score(
... clf, iris.data, iris.target, cv=5)
...
>>> scores
But the problem is that the estimator they use, clf, is obtained from a function built into sklearn. How should I define my own estimator so that I can pass it to the cross_validation.cross_val_score function?
For example, suppose a simple estimator that uses a linear model $y=x\beta$, where beta is estimated as X[1,:]+alpha and alpha is a parameter. How should I complete the code?
class my_estimator():
    def fit(X,y):
        beta=X[1,:]+alpha #where can I pass alpha to the function?
        return beta
    def scorer(estimator, X, y): #what should the scorer function compute?
        return ?????
With the following code I received an error:
class my_estimator():
    def fit(X, y, **kwargs):
        #alpha = kwargs['alpha']
        beta=X[1,:]#+alpha
        return beta
>>> cv=cross_validation.cross_val_score(my_estimator,x,y,scoring="mean_squared_error")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in cross_val_score
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\externals\joblib\parallel.py", line 516, in __call__
    for function, args, kwargs in iterable:
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in <genexpr>
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\base.py", line 43, in clone
    % (repr(estimator), type(estimator)))
TypeError: Cannot clone object '<class __main__.my_estimator at 0x05ACACA8>' (type <type 'classobj'>): it does not seem to be a scikit-learn estimator a it does not implement a 'get_params' methods.
>>>
The answer also lies in sklearn's documentation.
You need to define two things:
- an estimator that implements the fit(X, y) function, X being the matrix of inputs and y being the vector of outputs
- a scorer function, or callable object, that can be called as scorer(estimator, X, y) and returns the score of the given model
Referring to your example: first of all, scorer shouldn't be a method of the estimator; it's a different notion. Just create a callable:
def scorer(estimator, X, y):
    # Compute whatever you want; it is up to you to define
    # what it means for the given estimator to be "good" or "bad".
    return ?????
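As one possible way to fill in the placeholder above, the sketch below scores a model by its negative mean squared error, so that larger values mean a better fit. It assumes the estimator exposes a predict(X) method; the names neg_mse_scorer and ConstantPredictor are hypothetical and exist only for this illustration.

```python
import numpy as np


def neg_mse_scorer(estimator, X, y):
    """Return the negative mean squared error of estimator on (X, y)."""
    # Assumption: the estimator has a predict(X) method.
    y_pred = np.asarray(estimator.predict(X), dtype=float).ravel()
    y_true = np.asarray(y, dtype=float).ravel()
    # Negated so that a larger score means a better model.
    return -float(np.mean((y_true - y_pred) ** 2))


class ConstantPredictor:
    """Hypothetical stand-in model that always predicts one value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)


# Call the scorer directly, the same way cross-validation would call it.
score = neg_mse_scorer(ConstantPredictor(1.0), [[0], [1], [2]], [1.0, 1.0, 3.0])
```

A callable with this signature can be passed as the scoring argument of cross_val_score.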
Or, even simpler: you can pass a string such as 'mean_squared_error' or 'accuracy' (the full list is available in this part of the documentation) to the cross_val_score function to use a predefined scorer.
Another possibility is to use the make_scorer factory function.
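A minimal sketch of the make_scorer route, written against a more recent scikit-learn than the 0.14.1 release in the question (cross_val_score is imported from sklearn.model_selection here). MeanRegressor is a hypothetical toy model, and the data is made up for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical toy model: always predicts the training mean of y."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


# greater_is_better=False makes the scorer return the negated error,
# so that higher scores still mean better models.
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)

# Made-up data: y = 2 * x for x = 0..11.
X = np.arange(12, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()

scores = cross_val_score(MeanRegressor(), X, y, cv=3, scoring=mse_scorer)
```

The result holds one score per fold; because the error is negated, every entry is at most zero.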
As for the second thing, you can pass parameters to your model through the fit_params dict parameter of the cross_val_score function (as mentioned in the documentation). These parameters will be passed to the fit function.
class my_estimator():
    def fit(self, X, y, **kwargs):
        alpha = kwargs['alpha']  # supplied through fit_params
        self.beta = X[1,:]+alpha  # keep the estimate on the instance
        return self
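A different technique from fit_params, and the convention scikit-learn follows for its own hyperparameters, is to take alpha in the constructor. The sketch below does that and inherits from BaseEstimator, which supplies the get_params method that the TypeError in the question complains about. MyEstimator and the predict rule are assumptions made for illustration; the fit rule is the toy one from the question.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone


class MyEstimator(RegressorMixin, BaseEstimator):
    def __init__(self, alpha=0.0):
        # Stored under the same name as the argument so get_params() finds it.
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Toy rule from the question: beta is the second row of X plus alpha.
        self.beta_ = X[1, :] + self.alpha
        return self

    def predict(self, X):
        # Assumed linear model y = x . beta
        return np.asarray(X, dtype=float) @ self.beta_


est = MyEstimator(alpha=0.5)
params = est.get_params()  # inherited from BaseEstimator
fresh = clone(est)         # what cross_val_score does before each fit
beta = est.fit(np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([0.0, 1.0])).beta_
```

Because clone succeeds, an instance such as MyEstimator(alpha=0.5) can be handed to cross_val_score without passing anything extra to fit.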
After reading all the error messages, which give quite a clear idea of what's missing, here is a simple example:
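import numpy as np
# sklearn 0.14-era import; later releases moved this to sklearn.model_selection
from sklearn.cross_validation import cross_val_score

class RegularizedRegressor:
    def __init__(self, l = 0.01):
        self.l = l

    def combine(self, inputs):
        # Weighted sum of one sample's features; fit() estimates no
        # intercept, so none is added here.
        return sum([i*w for (i,w) in zip(inputs, self.weights)])

    def predict(self, X):
        return [self.combine(x) for x in np.asarray(X)]

    def classify(self, inputs):
        return np.sign(self.predict(inputs))

    def fit(self, X, y, **kwargs):
        # l arrives through fit_params; it is stored but not used by the
        # solve below, which is plain least squares.
        self.l = kwargs['l']
        X = np.matrix(X)
        y = np.matrix(y)
        W = (X.transpose() * X).getI() * X.transpose() * y
        self.weights = [w[0] for w in W.tolist()]
        return self

    def get_params(self, deep = False):
        # Required so that cross_val_score can clone the estimator.
        return {'l':self.l}

X = np.matrix([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.matrix([0, 1, 1, 0]).transpose()

# 'mean_squared_error' is the 0.14-era name; later releases use
# 'neg_mean_squared_error'.
print(cross_val_score(RegularizedRegressor(),
                      X,
                      y,
                      fit_params={'l':0.1},
                      scoring = 'mean_squared_error'))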
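For readers on a more recent scikit-learn, here is a hedged sketch of the same idea using sklearn.model_selection and the 'neg_mean_squared_error' scoring string. It assumes that l was meant as a ridge penalty, which the example above does not actually apply; the class name RidgeSketch and the choice of cv=2 are mine, not the answer's.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score


class RidgeSketch(RegressorMixin, BaseEstimator):
    def __init__(self, l=0.01):
        self.l = l  # assumed to be a ridge penalty strength

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).ravel()
        # Closed-form ridge solution: (X^T X + l I)^{-1} X^T y
        A = X.T @ X + self.l * np.eye(X.shape[1])
        self.weights_ = np.linalg.solve(A, X.T @ y)
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.weights_


X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# One negated mean squared error per fold; closer to zero is better.
scores = cross_val_score(RidgeSketch(l=0.1), X, y, cv=2,
                         scoring="neg_mean_squared_error")
```

Taking l in the constructor lets BaseEstimator provide get_params, so no hand-written get_params or fit_params argument is needed.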