Problem description
I wish to implement early stopping with Keras and sklearn's `GridSearchCV`.
The working code example below is modified from "How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras". The data set may be downloaded from here.
The modification adds the Keras `EarlyStopping` callback class to prevent over-fitting. For this to be effective, it requires the `monitor='val_acc'` argument for monitoring validation accuracy. For `val_acc` to be available, `KerasClassifier` requires `validation_split=0.1` to generate validation accuracy; otherwise `EarlyStopping` raises `RuntimeWarning: Early stopping requires val_acc available!`. Note the `FIXME:` code comment!
Note that we could replace `val_acc` by `val_loss`!
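The stopping rule behind `monitor` and `patience` can be illustrated without Keras. The sketch below is a simplified re-implementation under stated assumptions (`min_delta=0`, no baseline, no weight restoring), not Keras's actual code; the function name `early_stopping_epoch` and the score curves are hypothetical. Accuracy-like monitors such as `val_acc` need the "higher is better" direction, loss-like monitors such as `val_loss` the "lower is better" one (Keras's default `mode='auto'` infers this from the monitor's name).

```python
def early_stopping_epoch(scores, patience, mode="max"):
    """Return how many epochs would run under patience-based early stopping.

    Simplified sketch (min_delta=0): `scores` holds the monitored value after
    each epoch, and training stops once the score has failed to improve on its
    best value for `patience` consecutive epochs. Use mode="max" for accuracy,
    mode="min" for loss.
    """
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    best = None
    wait = 0
    for epoch, score in enumerate(scores, start=1):
        if best is None or (score > best if mode == "max" else score < best):
            best, wait = score, 0  # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch  # early stop triggered after this epoch
    return len(scores)  # epoch budget exhausted without triggering


# Hypothetical validation curves, one value per epoch.
val_acc = [0.60, 0.65, 0.70, 0.69, 0.71, 0.70, 0.70, 0.70, 0.72]
print(early_stopping_epoch(val_acc, patience=3))  # 8
print(early_stopping_epoch(val_acc, patience=5))  # 9

val_loss = [0.90, 0.80, 0.85, 0.86, 0.87]
print(early_stopping_epoch(val_loss, patience=2, mode="min"))  # 4
```

Note that the same accuracy curve stops at epoch 8 or runs to epoch 9 depending only on `patience`.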
Question: How can I use the cross-validation data set generated by the `GridSearchCV` k-fold algorithm instead of wasting 10% of the training data on an early stopping validation set?
```python
# Use scikit-learn to grid search the learning rate and momentum
import numpy
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping

# Function to create model, required for KerasClassifier
def create_model(learn_rate=0.01, momentum=0):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = SGD(lr=learn_rate, momentum=momentum)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# Early stopping on validation accuracy
# (Keras 2.3 and later, and tf.keras, log this metric as 'val_accuracy' rather than 'val_acc')
stopper = EarlyStopping(monitor='val_acc', patience=3, verbose=1)

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]

# create model
model = KerasClassifier(
    build_fn=create_model,
    epochs=100, batch_size=10,
    validation_split=0.1,  # FIXME: Instead use GridSearchCV k-fold validation data.
    verbose=2)

# define the grid search parameters
learn_rate = [0.01, 0.1]
momentum = [0.2, 0.4]
param_grid = dict(learn_rate=learn_rate, momentum=momentum)
grid = GridSearchCV(estimator=model, param_grid=param_grid, verbose=2, n_jobs=1)

# Fitting parameters: GridSearchCV passes these on to the estimator's fit() for every CV fit
fit_params = dict(callbacks=[stopper])

# Grid search
grid_result = grid.fit(X, Y, **fit_params)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
```
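To put the question's 10% into numbers, here is a stdlib sketch of how many rows `validation_split=0.1` takes away from each CV training fold. Assumptions: the Pima data set's 768 rows, 5 folds (scikit-learn's default since 0.22; older releases used 3), plain k-fold sizing (if scikit-learn treats the estimator as a classifier it uses stratified folds, whose sizes can differ by a row), and Keras 2.x's rule of training on the first `int(n * (1 - validation_split))` samples and validating on the rest. The helper names are hypothetical.

```python
def fold_sizes(n_samples, n_splits):
    """Held-out fold sizes for plain k-fold splitting (the first n % k folds get one extra row)."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


def split_training_fold(n_train, validation_split):
    """Return (fit_rows, validation_rows), assuming the Keras 2.x rule
    split_at = int(n_train * (1 - validation_split)), validation being the tail."""
    split_at = int(n_train * (1.0 - validation_split))
    return split_at, n_train - split_at


N_SAMPLES = 768  # rows in pima-indians-diabetes.csv
for held_out in fold_sizes(N_SAMPLES, n_splits=5):
    n_train = N_SAMPLES - held_out
    fit_rows, val_rows = split_training_fold(n_train, validation_split=0.1)
    print(f"training fold {n_train}: fit on {fit_rows}, "
          f"early-stopping validation {val_rows}, CV score on {held_out}")
```

Under these assumptions, about 62 rows of each roughly 614-row training fold never update the weights; they only decide when to stop.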
[Answer after the question was edited & clarified:]
Before rushing into implementation issues, it is always good practice to take some time to think about the methodology and the task itself; arguably, intermingling early stopping with the cross-validation procedure is not a good idea.
Let's make up an example to highlight the argument.
Suppose that you indeed use early stopping with a maximum of 100 epochs, and 5-fold cross-validation (CV) for hyperparameter selection. Suppose also that you end up with a hyperparameter set X giving the best performance, say 89.3% binary classification accuracy.
Now suppose that your second-best hyperparameter set, Y, gives 89.2% accuracy. Examining the individual CV folds closely, you see that, for your best case X, 3 out of the 5 CV folds exhausted the maximum of 100 epochs, while in the other 2 early stopping kicked in, say at 89 and 93 epochs respectively.
Now imagine that, examining your second-best set Y, you see that 4 out of the 5 CV folds exhausted the 100 epochs, while the 5th stopped early at ~80 epochs.
What would be your conclusion from such an experiment?
Arguably, you would find yourself in an inconclusive situation; further experiments might reveal which hyperparameter set is actually the best, provided of course that you thought to look into these details of the results in the first place. And needless to say, if all this were automated through a callback, you might miss your best model even though you had actually tried it.
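The made-up numbers in this example can be tabulated to show why the comparison is murky: the two candidate sets were not trained under the same epoch budget in every fold. The sketch below uses only the figures from the example (the fold order is arbitrary), and the helper name `summarize_budget` is hypothetical.

```python
MAX_EPOCHS = 100

# Epochs actually run in each of the 5 CV folds (hypothetical figures from the example).
epochs_per_fold = {
    "X (89.3% mean CV accuracy)": [100, 100, 100, 89, 93],
    "Y (89.2% mean CV accuracy)": [100, 100, 100, 100, 80],
}


def summarize_budget(epochs, max_epochs=MAX_EPOCHS):
    """Return (folds stopped early, mean epochs run, fewest epochs run)."""
    stopped_early = sum(1 for e in epochs if e < max_epochs)
    return stopped_early, sum(epochs) / len(epochs), min(epochs)


for name, epochs in epochs_per_fold.items():
    stopped, mean_epochs, fewest = summarize_budget(epochs)
    print(f"{name}: {stopped} fold(s) stopped early, "
          f"mean {mean_epochs:.1f} epochs, fewest {fewest}")
```

The 0.1-point accuracy gap rests on per-fold training runs ranging from 80 to 100 epochs, so the two scores were not obtained with everything else held equal.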
The whole CV idea is implicitly based on the "all else being equal" argument (which of course is never true in practice, only approximated in the best possible way). If you feel that the number of epochs should be a hyperparameter, just include it explicitly in your CV as such, rather than inserting it through the back door of early stopping, thus possibly compromising the whole process (not to mention that early stopping itself has a hyperparameter, `patience`).
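Making the number of epochs an explicit hyperparameter simply adds another axis to the grid, so every candidate is trained for a fixed, known budget in every fold. The stdlib sketch below enumerates the resulting candidates the way scikit-learn's `ParameterGrid` does (sorted keys, Cartesian product); the epoch values 50/100/150 are arbitrary assumptions and `expand_grid` is a hypothetical helper. With the question's `KerasClassifier`, a dictionary like this could be passed as `param_grid`, since the wrapper accepts `epochs` as a parameter, and the `callbacks=[stopper]` fit parameter would then be dropped.

```python
from itertools import product

# Axes from the question, plus the number of epochs as an explicit axis.
param_grid = {
    "learn_rate": [0.01, 0.1],
    "momentum": [0.2, 0.4],
    "epochs": [50, 100, 150],  # arbitrary example budgets
}


def expand_grid(grid):
    """List every combination of grid values as a dict (like scikit-learn's ParameterGrid)."""
    keys = sorted(grid)  # deterministic key order
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


candidates = expand_grid(param_grid)
print(len(candidates))  # 12
print(candidates[0])    # {'epochs': 50, 'learn_rate': 0.01, 'momentum': 0.2}
```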
Not intermingling these two techniques doesn't mean, of course, that you cannot use them sequentially: once you have obtained your best hyperparameters through CV, you can always employ early stopping when fitting the model on your whole training set (provided, of course, that you do have a separate validation set).
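The sequential workflow can be sketched end to end without any ML library. Everything here is a hypothetical stand-in: toy one-feature data, a hand-written logistic regression trained by full-batch gradient descent, and arbitrary epoch budgets. Step 1 selects the learning rate by 5-fold CV with a fixed epoch budget and no early stopping; step 2 refits on the whole training set with early stopping on a separate validation set that CV never saw (in Keras this would roughly correspond to `fit(..., validation_data=..., callbacks=[EarlyStopping(...)])`).

```python
import math
import random

CV_EPOCHS = 50    # fixed budget for every candidate during CV (arbitrary)
MAX_EPOCHS = 200  # upper bound for the final, early-stopped fit (arbitrary)
PATIENCE = 3


def make_data(n, rng):
    """Hypothetical toy data: one feature, label 1 centred at +1, label 0 at -1."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        data.append((rng.gauss(1.0 if y else -1.0, 1.0), y))
    return data


def accuracy(w, b, data):
    return sum((w * x + b > 0) == (y == 1) for x, y in data) / len(data)


def train(data, lr, epochs, val=None, patience=PATIENCE):
    """Full-batch gradient descent on 1-D logistic regression.

    Without `val` it runs exactly `epochs` epochs; with `val` it also stops once
    validation accuracy has not improved for `patience` epochs in a row.
    Returns ((w, b), epochs_run).
    """
    w = b = 0.0
    best, wait = -1.0, 0
    for epoch in range(1, epochs + 1):
        gw = gb = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
        if val is not None:
            score = accuracy(w, b, val)
            if score > best:
                best, wait = score, 0
            else:
                wait += 1
                if wait >= patience:
                    return (w, b), epoch
    return (w, b), epochs


def cv_score(data, lr, k=5):
    """Mean k-fold CV accuracy with a fixed epoch budget and no early stopping."""
    size = len(data) // k
    scores = []
    for i in range(k):
        held_out = data[i * size:(i + 1) * size]
        rest = data[:i * size] + data[(i + 1) * size:]
        (w, b), _ = train(rest, lr, CV_EPOCHS)
        scores.append(accuracy(w, b, held_out))
    return sum(scores) / k


rng = random.Random(0)
train_set = make_data(400, rng)
val_set = make_data(100, rng)  # separate validation set, never used during CV

# Step 1: hyperparameter selection by CV alone (every candidate gets the same budget).
best_lr = max([0.01, 0.1, 1.0], key=lambda lr: cv_score(train_set, lr))

# Step 2: refit on the whole training set, early stopping on the separate validation set.
(w, b), epochs_run = train(train_set, best_lr, MAX_EPOCHS, val=val_set)
# This accuracy is optimistic, since the same set also decided when to stop.
print(f"lr={best_lr}, stopped after {epochs_run} epochs, "
      f"val accuracy {accuracy(w, b, val_set):.2f}")
```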
The field of deep neural nets is still (very) young, and it is true that it has yet to establish its "best practice" guidelines; add the fact that, thanks to an amazing community, there are all sorts of tools available in open-source implementations, and you can easily find yourself in the (admittedly tempting) position of mixing things up just because they happen to be available. I am not necessarily saying that this is what you are attempting to do here; I am just urging more caution when combining ideas that may not have been designed to work together...