I have the following neural network, written in Keras with the TensorFlow backend, running under Python 3.5 (Anaconda) on Windows 10:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.optimizers import SGD

    model = Sequential()
    model.add(Dense(100, input_dim=283, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(150, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(4, init='normal', activation='sigmoid'))
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

I am training on my GPU. During training (10000 epochs), the network's accuracy steadily climbs from 0.25 to somewhere between 0.7 and 0.9, and then suddenly drops and stays at 0.25:
    Epoch 1/10000
    6120/6120 [==============================] - 1s - loss: 1.5329 - acc: 0.2665
    Epoch 2/10000
    6120/6120 [==============================] - 1s - loss: 1.2985 - acc: 0.3784
    Epoch 3/10000
    6120/6120 [==============================] - 1s - loss: 1.2259 - acc: 0.4891
    Epoch 4/10000
    6120/6120 [==============================] - 1s - loss: 1.1867 - acc: 0.5208
    Epoch 5/10000
    6120/6120 [==============================] - 1s - loss: 1.1494 - acc: 0.5199
    Epoch 6/10000
    6120/6120 [==============================] - 1s - loss: 1.1042 - acc: 0.4953
    Epoch 7/10000
    6120/6120 [==============================] - 1s - loss: 1.0491 - acc: 0.4982
    Epoch 8/10000
    6120/6120 [==============================] - 1s - loss: 1.0066 - acc: 0.5065
    Epoch 9/10000
    6120/6120 [==============================] - 1s - loss: 0.9749 - acc: 0.5338
    Epoch 10/10000
    6120/6120 [==============================] - 1s - loss: 0.9456 - acc: 0.5696
    Epoch 11/10000
    6120/6120 [==============================] - 1s - loss: 0.9252 - acc: 0.5995
    Epoch 12/10000
    6120/6120 [==============================] - 1s - loss: 0.9111 - acc: 0.6106
    Epoch 13/10000
    6120/6120 [==============================] - 1s - loss: 0.8772 - acc: 0.6160
    Epoch 14/10000
    6120/6120 [==============================] - 1s - loss: 0.8517 - acc: 0.6245
    Epoch 15/10000
    6120/6120 [==============================] - 1s - loss: 0.8170 - acc: 0.6345
    Epoch 16/10000
    6120/6120 [==============================] - 1s - loss: 0.7850 - acc: 0.6428
    Epoch 17/10000
    6120/6120 [==============================] - 1s - loss: 0.7633 - acc: 0.6580
    Epoch 18/10000
    6120/6120 [==============================] - 4s - loss: 0.7375 - acc: 0.6717
    Epoch 19/10000
    6120/6120 [==============================] - 1s - loss: 0.7058 - acc: 0.6850
    Epoch 20/10000
    6120/6120 [==============================] - 1s - loss: 0.6787 - acc: 0.7018
    Epoch 21/10000
    6120/6120 [==============================] - 1s - loss: 0.6557 - acc: 0.7093
    Epoch 22/10000
    6120/6120 [==============================] - 1s - loss: 0.6304 - acc: 0.7208
    Epoch 23/10000
    6120/6120 [==============================] - 1s - loss: 0.6052 - acc: 0.7270
    Epoch 24/10000
    6120/6120 [==============================] - 1s - loss: 0.5848 - acc: 0.7371
    Epoch 25/10000
    6120/6120 [==============================] - 1s - loss: 0.5564 - acc: 0.7536
    Epoch 26/10000
    6120/6120 [==============================] - 1s - loss: 0.1787 - acc: 0.4163
    Epoch 27/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 28/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 29/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 30/10000
    6120/6120 [==============================] - 2s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 31/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 32/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500 ...

My guess is that the optimizer is getting stuck in a local minimum where it assigns all of the data to a single class. How can I stop it from doing that?
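One quick way to test that hypothesis is to look at the distribution of predicted classes once the accuracy has collapsed. A minimal sketch, assuming `X_train` is the 6120 x 283 training matrix (that name does not appear in the original code and is used here only for illustration):

    import numpy as np

    # After the collapse: which classes does the network actually predict?
    # X_train is an assumed name for the 6120 x 283 training matrix.
    probs = model.predict(X_train)            # shape (6120, 4)
    pred_classes = np.argmax(probs, axis=1)   # most likely class per example
    print(np.bincount(pred_classes, minlength=4))
    # If the network has collapsed to one class, a single bin holds (almost) all 6120 examples.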
Things I have tried (none of which seemed to stop this from happening):
Using a different optimizer (Adam); a minimal sketch of that swap is shown after this list
Making sure the training data contains the same number of examples for each class
Increasing the amount of training data (currently 6000 examples)
Varying the number of classes between 2 and 5
Increasing the number of hidden layers in the network from 1 to 5
Changing the width of the layers (from 50 to 500)
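For reference, a minimal sketch of how the optimizer swap mentioned above might look, assuming the same model object and loss as in the original code and Keras' default Adam settings:

    from keras.optimizers import Adam

    # Same architecture as above; only the optimizer changes (Keras default Adam settings).
    adam = Adam(lr=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])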
None of this helped. Any other ideas as to why this happens and/or how to prevent it? Could it be a bug in Keras? Many thanks in advance for any suggestions.
Edit:
The problem seems to have been solved by changing the final activation to softmax (from sigmoid) and adding a maxnorm(3) constraint to the last two hidden layers:
    from keras.constraints import maxnorm

    model = Sequential()
    model.add(Dense(100, input_dim=npoints, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(150, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu', W_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu', W_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(ncat, init='normal', activation='softmax'))
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])

Thanks a lot for all the suggestions.

Best answer

The problem lies in using sigmoid as the activation of the last layer. In that case the output of the final layer cannot be interpreted as a probability distribution over the classes for a given example, because the layer's outputs generally do not sum to 1 (for instance, sigmoid outputs of 0.9, 0.8, 0.7 and 0.6 sum to 3.0, whereas softmax outputs always sum to exactly 1). Optimizing such an output can lead to unexpected behaviour. In my opinion the maxnorm constraint is not necessary, but I strongly advise you to use categorical_crossentropy instead of the mse loss, since it has proven to work better for this kind of optimization problem.
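To make that concrete, here is a minimal sketch of the suggested output layer and loss, reusing the dimensions from the question (283 inputs, 4 classes) and the same Keras 1.x API as the original code; the middle hidden layers are elided:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.optimizers import SGD

    model = Sequential()
    model.add(Dense(100, input_dim=283, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    # ... remaining hidden layers as in the question ...
    # softmax turns the outputs into a proper probability distribution (they sum to 1)
    model.add(Dense(4, init='normal', activation='softmax'))
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    # categorical_crossentropy matches a softmax output better than mean squared error
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])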
