Question
I tried to train a FeedForward Neural Network on the MNIST Handwritten Digits dataset (which includes 60K training samples).
On every epoch, I iterated over all the training samples and performed Backpropagation for each one. The runtime is, of course, far too long.
- Is the algorithm I am running called Gradient Descent?
I read that for large datasets, using Stochastic Gradient Descent can improve the runtime dramatically.
- What should I do in order to use Stochastic Gradient Descent? Should I just pick training samples at random and perform Backpropagation on each randomly picked sample, instead of the full epochs I currently use?
Recommended answer
The new scenario you describe (performing Backpropagation on each randomly picked sample) is one common "flavor" of Stochastic Gradient Descent, as described here: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
The three most common flavors described there are (your flavor is C):
A)
    randomly shuffle samples in the training set
    for one or more epochs, or until approx. cost minimum is reached:
        for training sample i:
            compute gradients and perform weight updates

B)
    for one or more epochs, or until approx. cost minimum is reached:
        randomly shuffle samples in the training set
        for training sample i:
            compute gradients and perform weight updates

C)
    for iterations t, or until approx. cost minimum is reached:
        draw random sample from the training set
        compute gradients and perform weight updates
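
To make flavor C concrete, below is a minimal NumPy sketch of per-sample Stochastic Gradient Descent for a one-hidden-layer network. The synthetic stand-in data, layer sizes, learning rate, and iteration count are illustrative assumptions rather than anything from the question; with real MNIST data you would load the 60K training samples instead.

    # Minimal sketch of flavor C: draw one random sample per iteration and
    # update the weights immediately after backpropagating through it.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for MNIST (assumption): 1000 samples, 784 features, 10 classes.
    X = rng.standard_normal((1000, 784)).astype(np.float32)
    y = rng.integers(0, 10, size=1000)
    Y = np.eye(10, dtype=np.float32)[y]               # one-hot targets

    # One hidden layer of 64 units (assumed size).
    W1 = 0.01 * rng.standard_normal((784, 64)).astype(np.float32)
    b1 = np.zeros(64, dtype=np.float32)
    W2 = 0.01 * rng.standard_normal((64, 10)).astype(np.float32)
    b2 = np.zeros(10, dtype=np.float32)

    lr = 0.01            # learning rate (assumed)
    n_iterations = 5000  # stopping criterion (assumed); could also monitor the cost

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    for t in range(n_iterations):
        # Flavor C: draw a random sample from the training set.
        i = rng.integers(0, X.shape[0])
        x, target = X[i], Y[i]

        # Forward pass.
        h = np.maximum(0.0, x @ W1 + b1)    # ReLU hidden layer
        p = softmax(h @ W2 + b2)            # output class probabilities

        # Backpropagation for this single sample (cross-entropy loss).
        dlogits = p - target                # gradient w.r.t. output logits
        dW2 = np.outer(h, dlogits)
        db2 = dlogits
        dh = (W2 @ dlogits) * (h > 0)       # backprop through the ReLU
        dW1 = np.outer(x, dh)
        db1 = dh

        # Immediate per-sample weight update (this is what makes it stochastic).
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

Flavor B would differ only in the sampling: shuffle the index order once per epoch (e.g. with rng.permutation(X.shape[0])) and loop over it, instead of drawing an independent random index on every iteration.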