Question
I tried to train a FeedForward Neural Network on the MNIST Handwritten Digits dataset (which includes 60K training samples).
On every epoch, I iterated over all the training samples and performed Backpropagation for each one. The runtime is, of course, far too long.
- Is the algorithm I am running called Gradient Descent?
I read that for large datasets, using Stochastic Gradient Descent can improve the runtime dramatically.
- What should I do in order to use Stochastic Gradient Descent? Should I just pick training samples at random and perform Backpropagation on each randomly picked sample, instead of the full epochs I currently use?
Recommended answer
The new scenario you describe (performing Backpropagation on each randomly picked sample) is one common "flavor" of Stochastic Gradient Descent, as described here: https://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent
The three most common flavors described there are (your flavor is C):
A)
    randomly shuffle samples in the training set
    for one or more epochs, or until approx. cost minimum is reached:
        for training sample i:
            compute gradients and perform weight updates

B)
    for one or more epochs, or until approx. cost minimum is reached:
        randomly shuffle samples in the training set
        for training sample i:
            compute gradients and perform weight updates

C)
    for iterations t, or until approx. cost minimum is reached:
        draw random sample from the training set
        compute gradients and perform weight updates
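
To make flavor C concrete, below is a minimal NumPy sketch of per-sample Stochastic Gradient Descent for a one-hidden-layer network. The synthetic stand-in data, layer sizes, learning rate, and iteration count are illustrative assumptions rather than anything from the question; with real MNIST data you would load the 60K training samples instead.

    # Minimal sketch of flavor C: draw one random sample per iteration and
    # update the weights immediately after backpropagating through it.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for MNIST (assumption): 1000 samples, 784 features, 10 classes.
    X = rng.standard_normal((1000, 784)).astype(np.float32)
    y = rng.integers(0, 10, size=1000)
    Y = np.eye(10, dtype=np.float32)[y]               # one-hot targets

    # One hidden layer of 64 units (assumed size).
    W1 = 0.01 * rng.standard_normal((784, 64)).astype(np.float32)
    b1 = np.zeros(64, dtype=np.float32)
    W2 = 0.01 * rng.standard_normal((64, 10)).astype(np.float32)
    b2 = np.zeros(10, dtype=np.float32)

    lr = 0.01            # learning rate (assumed)
    n_iterations = 5000  # stopping criterion (assumed); could also monitor the cost

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    for t in range(n_iterations):
        # Flavor C: draw a random sample from the training set.
        i = rng.integers(0, X.shape[0])
        x, target = X[i], Y[i]

        # Forward pass.
        h = np.maximum(0.0, x @ W1 + b1)    # ReLU hidden layer
        p = softmax(h @ W2 + b2)            # output class probabilities

        # Backpropagation for this single sample (cross-entropy loss).
        dlogits = p - target                # gradient w.r.t. output logits
        dW2 = np.outer(h, dlogits)
        db2 = dlogits
        dh = (W2 @ dlogits) * (h > 0)       # backprop through the ReLU
        dW1 = np.outer(x, dh)
        db1 = dh

        # Immediate per-sample weight update (this is what makes it stochastic).
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

Flavor B would differ only in the sampling: shuffle the index order once per epoch (e.g. with rng.permutation(X.shape[0])) and loop over it, instead of drawing an independent random index on every iteration.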