I am trying to train a feedforward neural network on the MNIST handwritten digits dataset (60K training samples).
On every epoch, I iterate over all the training samples and perform backpropagation for each one. Naturally, the runtime is far too long.
- Is the algorithm I am running called Gradient Descent?
I read that for large datasets, using Stochastic Gradient Descent can improve the runtime dramatically.
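For contrast with SGD, here is a minimal sketch of "plain" (full-batch) gradient descent. It computes the gradient over the whole training set and only then makes a single weight update, so with 60K samples every update costs a full pass over the data. The sketch assumes a toy one-dimensional linear model y = w*x + b trained with mean squared error in place of a neural network; the function name and the data are hypothetical.

```python
def batch_gradient_descent(data, lr=0.1, epochs=100):
    """Full-batch gradient descent for a toy linear model y = w*x + b.

    Each epoch averages the gradient of the mean squared error over ALL
    samples and then applies exactly one weight update.
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y        # prediction error for this sample
            grad_w += 2 * err * x / n    # contribution to d(MSE)/dw
            grad_b += 2 * err / n        # contribution to d(MSE)/db
        w -= lr * grad_w                 # one update per full pass over the data
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical noise-free data generated from y = 3x + 1
    data = [(x / 10, 3 * (x / 10) + 1) for x in range(10)]
    w, b = batch_gradient_descent(data, lr=0.5, epochs=2000)
    print(round(w, 3), round(b, 3))
```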
- What should I do to use Stochastic Gradient Descent? Should I simply pick training samples at random and perform backpropagation on each randomly picked sample, instead of the epochs I currently use?
The new scenario you describe (performing backpropagation on each randomly picked sample) is one common "flavor" of Stochastic Gradient Descent, as described here: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
According to that answer, the three most common flavors are (yours is C):
A.
    randomly shuffle samples in the training set
    for one or more epochs, or until approx. cost minimum is reached:
        for training sample i:
            compute gradients and perform weight updates

B.
    for one or more epochs, or until approx. cost minimum is reached:
        randomly shuffle samples in the training set
        for training sample i:
            compute gradients and perform weight updates

C.
    for iterations t, or until approx. cost minimum is reached:
        draw random sample from the training set
        compute gradients and perform weight updates
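The three flavors (shuffle once, reshuffle every epoch, draw one random sample per iteration) can be sketched in Python. This is a minimal illustration, assuming a toy one-dimensional linear model y = w*x + b with squared error instead of a neural network, so the per-sample gradient (the part backpropagation would compute in a real network) fits in two lines. The names `sgd_step`, `flavor_a`, `flavor_b`, `flavor_c` and the data are hypothetical.

```python
import random


def sgd_step(w, b, x, y, lr):
    """One stochastic update from a single sample's squared error.

    In a real network this is where backpropagation would compute the gradients.
    """
    err = (w * x + b) - y
    return w - lr * 2 * err * x, b - lr * 2 * err


def flavor_a(data, lr, epochs, seed=0):
    """A: shuffle once, then sweep the samples in that same order every epoch."""
    rng = random.Random(seed)
    samples = list(data)
    rng.shuffle(samples)                 # shuffled only once, before training
    w = b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            w, b = sgd_step(w, b, x, y, lr)
    return w, b


def flavor_b(data, lr, epochs, seed=0):
    """B: reshuffle at the start of every epoch."""
    rng = random.Random(seed)
    samples = list(data)
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)             # new order on every epoch
        for x, y in samples:
            w, b = sgd_step(w, b, x, y, lr)
    return w, b


def flavor_c(data, lr, iterations, seed=0):
    """C: draw one random sample (with replacement) per iteration."""
    rng = random.Random(seed)
    w = b = 0.0
    for _ in range(iterations):
        x, y = rng.choice(data)          # sampling with replacement
        w, b = sgd_step(w, b, x, y, lr)
    return w, b


if __name__ == "__main__":
    # Hypothetical noise-free data generated from y = 3x + 1
    data = [(x / 10, 3 * (x / 10) + 1) for x in range(10)]
    for name, (w, b) in [
        ("A", flavor_a(data, lr=0.1, epochs=500)),
        ("B", flavor_b(data, lr=0.1, epochs=500)),
        ("C", flavor_c(data, lr=0.1, iterations=5000)),
    ]:
        print(name, round(w, 3), round(b, 3))
```

The only difference between A and B is where the shuffle happens. C samples with replacement, so within a fixed number of iterations some samples may be drawn several times and others not at all.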