This article explains how to obtain both the MSE and R2 scores from sklearn GridSearchCV for the same model; the question and recommended answer below may serve as a useful reference.

Problem description

I can use a GridSearchCV on a pipeline and specify scoring to be either 'MSE' or 'R2'. I can then access gridsearchcv.best_score_ to recover the one I specified. How do I also get the other score for the solution found by GridSearchCV?
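
For concreteness, a minimal sketch of the kind of setup the question describes; the pipeline steps, parameter grid, and data here are made up, and the scoring string follows current scikit-learn naming:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X = np.random.randn(50, 10)
y = np.random.randn(50)

# Hypothetical pipeline and grid; scoring='r2' is the metric GridSearchCV optimizes.
pipe = Pipeline([('scale', StandardScaler()), ('ridge', Ridge())])
grid = GridSearchCV(pipe, param_grid={'ridge__alpha': [0.1, 1.0, 10.0]},
                    scoring='r2', cv=5)
grid.fit(X, y)

print(grid.best_score_)   # the R2 of the best candidate
# ...but how to also get the MSE of that same candidate?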

If I run GridSearchCV again with the other scoring parameter, it might not find the same solution, and so the score it reports might not correspond to the model for which we have the first value.

Maybe I can extract the parameters and supply them to a new pipeline, and then run cross_val_score with the new pipeline? Is there a better way? Thanks.
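
That idea can work; a hedged sketch, reusing the hypothetical grid, X and y from the snippet above: clone the winning estimator and run cross_val_score once per metric on a fixed set of CV splits, so both scores refer to the same model.

from sklearn.base import clone
from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # same splits for both metrics
best = clone(grid.best_estimator_)  # unfitted copy carrying the best parameters

r2 = cross_val_score(best, X, y, scoring='r2', cv=cv)
neg_mse = cross_val_score(best, X, y, scoring='neg_mean_squared_error', cv=cv)
print(r2.mean(), -neg_mse.mean())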

Recommended answer

This is unfortunately not straightforward right now with GridSearchCV, or with any built-in sklearn method/object.

Although there is talk of adding multiple scorer outputs, this feature will probably not come soon.

So you will have to do it yourself; there are several ways:

1) You can take a look at the code of cross_val_score and perform the cross-validation loop yourself, calling the scorers of interest once each fold is done (a rough sketch of this appears after the list).

2) [not recommended] You can also build your own scorer out of the scorers you are interested in and have it output the scores as an array. You will then run into the problem explained here: sklearn - Cross validation with multiple scores.

3) Since you can code your own scorers, you could write a scorer that outputs one of your scores (the one by which you want GridSearchCV to make decisions) and stores all the other scores you are interested in in a separate place, which may be a static/global variable, or even a file.
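
For option 1, a rough sketch of such a hand-rolled loop (the estimator, data, and KFold settings are placeholders, not part of the original answer):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import KFold

X = np.random.randn(20, 10)
y = np.random.randn(20)

r2s, mses = [], []
for train_idx, test_idx in KFold(n_splits=5).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    # Evaluate every scorer of interest on the same fold predictions.
    r2s.append(r2_score(y[test_idx], preds))
    mses.append(mean_squared_error(y[test_idx], preds))

print(np.mean(r2s), np.mean(mses))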

Number 3 seems the least tedious and most promising:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation in old releases

# Side channel where the scorer stashes the MSE of every fold it sees.
secret_mses = []

def r2_secret_mse(estimator, X_test, y_test):
    # Return R2 (the score cross_val_score/GridSearchCV optimizes on) and
    # record the MSE of the same predictions on the side.
    predictions = estimator.predict(X_test)
    secret_mses.append(mean_squared_error(y_test, predictions))
    return r2_score(y_test, predictions)

X = np.random.randn(20, 10)
y = np.random.randn(20)

r2_scores = cross_val_score(Ridge(), X, y, scoring=r2_secret_mse, cv=5)

You will find the R2 scores in r2_scores and the corresponding MSEs in secret_mses.
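
Since the question is about GridSearchCV, the same scorer can be passed to it as well; a hedged sketch (the parameter grid is made up):

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(Ridge(), param_grid={'alpha': [0.1, 1.0, 10.0]},
                    scoring=r2_secret_mse, cv=5)
grid.fit(X, y)

# grid.best_score_ is an R2; secret_mses now holds one MSE per fold per
# candidate, so mapping entries back to parameter settings is up to you.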

Note that this can become messy if you run in parallel (n_jobs > 1), since each worker process then appends to its own copy of the global list. In that case you would need to write the scores to a specific place, in a memmap for example.

This concludes the discussion of how to get both MSE and R2 from sklearn GridSearchCV; hopefully the recommended answer above is helpful.
