Problem description
I am working on an implementation of the backpropagation algorithm. What I have implemented so far seems to work, but I can't be sure that the algorithm is implemented well. Here is what I have noticed during the training tests of my network:
Implementation notes:
- A dataset of almost 100,000 raw entries (3 variables as input; the sine of the sum of those three variables as the expected output).
- The network has 7 layers, and all layers use the sigmoid activation function.
When I run the backpropagation training process:
- The minimum cost of the error is found at the fourth iteration (the minimum cost is 140; is that normal? I was expecting much less than that).
- After the fourth iteration, the cost of the error starts increasing (I don't know whether that is normal or not).
Recommended answer
The short answer would be "no, very likely your implementation is incorrect". Your network is not training, as can be observed from the very high cost of error. As discussed in the comments, your network suffers very heavily from the vanishing gradient problem, which is inevitable in deep networks. In essence, the first layers of your network learn much more slowly than the later ones. All neurons get some random weights at the beginning, right? Since the first layer learns almost nothing, the large initial error propagates through the whole network!
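To make the effect concrete: the derivative of the standard sigmoid never exceeds 0.25, so the activation-derivative factors alone shrink the backpropagated gradient geometrically with depth. A minimal sketch (Python with NumPy; not from the original post) illustrating the bound:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# The sigmoid derivative peaks at 0.25 (at x = 0), so each extra
# sigmoid layer multiplies the backpropagated gradient by at most
# ~0.25 (times the weight magnitudes).
print(sigmoid_prime(np.array([0.0])))  # [0.25]
print(0.25 ** 7)                       # ~6.1e-05: upper bound on the product
                                       # of activation derivatives across
                                       # 7 sigmoid layers
```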
How to fix it? From the description of your problem, it seems that a feedforward network with just a single hidden layer should be able to do the trick (as proven by the universal approximation theorem).
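As a rough illustration of that suggestion (not the asker's code; a hedged sketch assuming NumPy, a tanh hidden layer, a linear output unit for the regression target sin(x1 + x2 + x3), and illustrative layer sizes and hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data matching the question: 3 inputs, target = sin(x1 + x2 + x3).
X = rng.uniform(-1.0, 1.0, size=(10_000, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

n_hidden = 50
# Init with std = m^(-1/2), where m is the fan-in (see the list below).
W1 = rng.normal(0.0, 3 ** -0.5, size=(3, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, n_hidden ** -0.5, size=(n_hidden, 1))
b2 = np.zeros(1)

def tanh_prime(a):
    return 1.0 - a ** 2  # derivative of tanh, expressed via its output

lr = 0.01
for epoch in range(200):
    # Forward pass: one tanh hidden layer, linear output for regression.
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y                        # dMSE/dpred (up to a constant)
    # Backward pass.
    grad_W2 = h.T @ err / len(X)
    grad_b2 = err.mean(axis=0)
    dh = (err @ W2.T) * tanh_prime(h)
    grad_W1 = X.T @ dh / len(X)
    grad_b1 = dh.mean(axis=0)
    # Gradient step.
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

print("final MSE:", float(np.mean((pred - y) ** 2)))
```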
See, e.g., the free online book by Michael Nielsen if you'd like to learn more.
It can, but it is by no means a trivial challenge. Deep neural networks have been in use since the 60s, but only in the 90s did researchers come up with methods to deal with them efficiently. I recommend reading the "Efficient BackProp" chapter (by Y.A. LeCun et al.) of "Neural Networks: Tricks of the Trade".
Here is a summary (a rough code sketch follows the list):
- Shuffle the examples.
- Center the input variables by subtracting the mean.
- Normalize the input variables to a standard deviation of 1.
- If possible, decorrelate the input variables.
- Pick a network with the sigmoid function f(x) = 1.7159 * tanh(2x/3): it won't saturate at +1/-1, but instead will have the highest gain at these points (the second derivative is at its maximum).
- Set the target values within the range of the sigmoid, typically +1 and -1.
- The weights should be randomly drawn from a distribution with mean zero and a standard deviation given by m^(-1/2), where m is the number of inputs to the unit.
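A short sketch of those preprocessing and initialization steps (Python with NumPy; the array shapes and the 10-unit layer are illustrative assumptions, and decorrelation, e.g. PCA whitening, is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(100_000, 3))  # stand-in for the dataset

# Shuffle the examples.
X = X[rng.permutation(len(X))]

# Center each input variable and scale it to unit standard deviation.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# LeCun's recommended sigmoid and its derivative.
def lecun_sigmoid(x):
    return 1.7159 * np.tanh(2.0 / 3.0 * x)

def lecun_sigmoid_prime(x):
    return 1.7159 * (2.0 / 3.0) / np.cosh(2.0 / 3.0 * x) ** 2

print(lecun_sigmoid_prime(0.0))  # ~1.1439, the maximum slope, at x = 0

# Weights drawn with mean 0 and std m^(-1/2), m = inputs to the unit.
m = X.shape[1]
W = rng.normal(0.0, m ** -0.5, size=(m, 10))  # e.g. 10 hidden units
```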
The preferred method for training the network should be picked as follows:
- If the training set is large (more than a few hundred samples) and redundant, and the task is classification, use a carefully tuned stochastic gradient, or the stochastic diagonal Levenberg-Marquardt method (a rough SGD sketch follows this list).
- If the training set is not too large, or the task is regression, use conjugate gradient.
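As a rough illustration of "carefully tuned stochastic gradient" only (Levenberg-Marquardt and conjugate gradient are beyond a short sketch), a hedged minibatch-SGD skeleton; the function names and the 1/t decay schedule are my own assumptions:

```python
import numpy as np

def sgd_train(params, grad_fn, data, lr0=0.1, decay=1e-3,
              epochs=10, batch=32, seed=0):
    """Minibatch SGD with a simple 1/t learning-rate decay.

    params:  list of NumPy arrays, updated in place.
    grad_fn: callable(params, X_batch, y_batch) -> list of gradients,
             one per parameter array (supplied by the caller).
    data:    tuple (X, y).
    """
    X, y = data
    rng = np.random.default_rng(seed)
    step = 0
    for _ in range(epochs):
        order = rng.permutation(len(X))        # reshuffle every epoch
        for start in range(0, len(X), batch):
            idx = order[start:start + batch]
            grads = grad_fn(params, X[idx], y[idx])
            lr = lr0 / (1.0 + decay * step)    # decaying learning rate
            for p, g in zip(params, grads):
                p -= lr * g                    # in-place parameter update
            step += 1
    return params
```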
Also, some of my general remarks:
- Watch for numerical stability if you implement it yourself; it is easy to get into trouble (see the sketch after these remarks).
- Think about the architecture. Fully-connected multi-layer networks are rarely a smart idea. Unfortunately, ANNs are poorly understood from a theoretical point of view, and one of the best things you can do is to check what has worked for others and learn the useful patterns (regularization, pooling and dropout layers, and such).
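On the numerical-stability point, one common pitfall is overflow in exp() for large-magnitude inputs to the sigmoid. A minimal sketch of a stable formulation (assuming NumPy):

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that avoids overflow in exp() for large |x|."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Standard form is safe for x >= 0 (exp(-x) <= 1).
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the identity sigmoid(x) = exp(x) / (1 + exp(x)),
    # so exp() is only ever called on non-positive values.
    expx = np.exp(x[~pos])
    out[~pos] = expx / (1.0 + expx)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # no overflow warnings
```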