I have the following neural network, written in Keras with TensorFlow as the backend, running on Python 3.5 (Anaconda) on Windows 10:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(100, input_dim=283, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(150, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(4, init='normal', activation='sigmoid'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
I am training on my GPU. During training (10000 epochs), the accuracy of the network rises steadily from 0.25 to somewhere between 0.7 and 0.9, then suddenly drops and stays at 0.25 (see the log below).
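For context, here is a minimal sketch of how such a run might be launched; the question does not show the fit call, so the array names, the batch size, and the verbosity setting are assumptions:

# Hypothetical training call (not shown in the question).
# X_train: shape (6120, 283); y_train: one-hot encoded, shape (6120, 4).
model.fit(X_train, y_train,
          nb_epoch=10000,   # Keras 1.x argument name (epochs= in Keras 2)
          batch_size=32,    # assumed; the question does not state the batch size
          verbose=1)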
Epoch 1/10000
6120/6120 [==============================] - 1s - loss: 1.5329 - acc: 0.2665
Epoch 2/10000
6120/6120 [==============================] - 1s - loss: 1.2985 - acc: 0.3784
Epoch 3/10000
6120/6120 [==============================] - 1s - loss: 1.2259 - acc: 0.4891
Epoch 4/10000
6120/6120 [==============================] - 1s - loss: 1.1867 - acc: 0.5208
Epoch 5/10000
6120/6120 [==============================] - 1s - loss: 1.1494 - acc: 0.5199
Epoch 6/10000
6120/6120 [==============================] - 1s - loss: 1.1042 - acc: 0.4953
Epoch 7/10000
6120/6120 [==============================] - 1s - loss: 1.0491 - acc: 0.4982
Epoch 8/10000
6120/6120 [==============================] - 1s - loss: 1.0066 - acc: 0.5065
Epoch 9/10000
6120/6120 [==============================] - 1s - loss: 0.9749 - acc: 0.5338
Epoch 10/10000
6120/6120 [==============================] - 1s - loss: 0.9456 - acc: 0.5696
Epoch 11/10000
6120/6120 [==============================] - 1s - loss: 0.9252 - acc: 0.5995
Epoch 12/10000
6120/6120 [==============================] - 1s - loss: 0.9111 - acc: 0.6106
Epoch 13/10000
6120/6120 [==============================] - 1s - loss: 0.8772 - acc: 0.6160
Epoch 14/10000
6120/6120 [==============================] - 1s - loss: 0.8517 - acc: 0.6245
Epoch 15/10000
6120/6120 [==============================] - 1s - loss: 0.8170 - acc: 0.6345
Epoch 16/10000
6120/6120 [==============================] - 1s - loss: 0.7850 - acc: 0.6428
Epoch 17/10000
6120/6120 [==============================] - 1s - loss: 0.7633 - acc: 0.6580
Epoch 18/10000
6120/6120 [==============================] - 4s - loss: 0.7375 - acc: 0.6717
Epoch 19/10000
6120/6120 [==============================] - 1s - loss: 0.7058 - acc: 0.6850
Epoch 20/10000
6120/6120 [==============================] - 1s - loss: 0.6787 - acc: 0.7018
Epoch 21/10000
6120/6120 [==============================] - 1s - loss: 0.6557 - acc: 0.7093
Epoch 22/10000
6120/6120 [==============================] - 1s - loss: 0.6304 - acc: 0.7208
Epoch 23/10000
6120/6120 [==============================] - 1s - loss: 0.6052 - acc: 0.7270
Epoch 24/10000
6120/6120 [==============================] - 1s - loss: 0.5848 - acc: 0.7371
Epoch 25/10000
6120/6120 [==============================] - 1s - loss: 0.5564 - acc: 0.7536
Epoch 26/10000
6120/6120 [==============================] - 1s - loss: 0.1787 - acc: 0.4163
Epoch 27/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
Epoch 28/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
Epoch 29/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
Epoch 30/10000
6120/6120 [==============================] - 2s - loss: 1.1921e-07 - acc: 0.2500
Epoch 31/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
Epoch 32/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500 ...
I guess this happens because the optimizer gets stuck in a local minimum where it assigns all of the data to a single category. How can I stop it from doing this?
Things I have tried (none of which seem to prevent this from happening):
Using a different optimizer (Adam); a sketch of this swap appears after the list
Making sure the training data contains an equal number of examples from each category
Increasing the amount of training data (currently 6000 samples)
Varying the number of categories between 2 and 5
Increasing the number of hidden layers in the network from 1 to 5
Changing the width of the layers (from 50 to 500)
None of these helped. Any other ideas as to why this happens and/or how to suppress it? Could it be a bug in Keras? Many thanks in advance for your suggestions.
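For reference, swapping in Adam (the first item in the list above) would look roughly like the sketch below; the learning rate shown is simply Keras 1.x's default for Adam, and the exact settings actually used are an assumption:

from keras.optimizers import Adam

# Hypothetical optimizer swap; the exact Adam settings used are not shown in the question.
adam = Adam(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])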
Edit:
The problem seems to have been solved by changing the final activation to softmax (from sigmoid) and by adding maxnorm(3) regularization to the last two hidden layers:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD
from keras.constraints import maxnorm

# npoints = number of input features (283 above); ncat = number of categories (4 above)
model = Sequential()
model.add(Dense(100, input_dim=npoints, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(150, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu', W_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu', W_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(ncat, init='normal', activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])
Thank you again for your suggestions.
Best answer
The problem lies with sigmoid as the activation of the final layer. In that case, the output of the last layer cannot be interpreted as the probability distribution of a given example over the classes, because the layer's outputs generally do not sum to 1. With such an output, the optimization can behave in unexpected ways. In my opinion adding a maxnorm constraint is not necessary, but I strongly recommend that you use categorical_crossentropy instead of the mse loss, since that loss function has proven to work better for this kind of optimization problem.
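Put together, the recommendation amounts to something like the sketch below; the variable names (ncat, sgd) are taken from the question's edited code, and only the loss differs from it:

# Final layer and compile step as suggested: softmax output combined with
# categorical_crossentropy instead of mean_squared_error.
model.add(Dense(ncat, init='normal', activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])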