This article covers the question "TensorFlow or Theano: how do they know the loss function derivative based on the neural network graph?" together with a recommended answer.

Problem description

In TensorFlow or Theano, you only tell the library how your neural network is structured and how feed-forward should operate.

For instance, in TensorFlow, you would write:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph-mode API

# X and y are the poster's training data (not shown in the question);
# random placeholder arrays are used here purely so the snippet runs.
X = np.random.rand(100, 10).astype(np.float32)
y = np.random.rand(100, 1).astype(np.float32)

graph = tf.Graph()
with graph.as_default():
    _X = tf.constant(X)
    _y = tf.constant(y)

    # Hidden layer with 20 units
    hidden = 20
    w0 = tf.Variable(tf.truncated_normal([X.shape[1], hidden]))
    b0 = tf.Variable(tf.truncated_normal([hidden]))

    h = tf.nn.softmax(tf.matmul(_X, w0) + b0)

    # Output layer with a single unit
    w1 = tf.Variable(tf.truncated_normal([hidden, 1]))
    b1 = tf.Variable(tf.truncated_normal([1]))

    yp = tf.nn.softmax(tf.matmul(h, w1) + b1)

    # L2 loss and a gradient-descent training op that minimizes it
    loss = tf.reduce_mean(0.5*tf.square(yp - _y))
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
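
(Editorial aside, not part of the original question: to actually train, the minimize op built on the last line above would be run repeatedly in a TF 1.x session, roughly as follows.)

with graph.as_default():
    init = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    sess.run(init)                              # initialize w0, b0, w1, b1
    for step in range(100):
        # each run of `optimizer` performs one gradient-descent update
        _, loss_value = sess.run([optimizer, loss])
    print(loss_value)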

I am using the L2-norm loss function, C = 0.5*sum((y - yp)^2), and in the backpropagation step the derivative presumably has to be computed, which per component is dC/dyp = yp - y. See (30) in this book.
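
(A quick editorial check, not part of the original question: for a small standalone example the analytic per-component gradient yp - y agrees with a finite-difference approximation of this loss.)

import numpy as np

y = np.array([1.0, 0.0, 2.0])
yp = np.array([0.8, 0.3, 1.5])
C = lambda yp_: 0.5 * np.sum((y - yp_) ** 2)

eps = 1e-6
# central finite differences, one component at a time
numeric = np.array([(C(yp + eps * np.eye(3)[i]) - C(yp - eps * np.eye(3)[i])) / (2 * eps)
                    for i in range(3)])
analytic = yp - y
print(np.allclose(numeric, analytic, atol=1e-6))   # True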

My question is: how does TensorFlow (or Theano) know the analytical derivative to use for backpropagation? Do they compute an approximation, or do they somehow avoid using the derivative altogether?

I have done the Udacity deep learning course on TensorFlow, but I am still at odds about how these libraries actually work.

Recommended answer

The differentiation happens in the last line:

    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

When you execute the minimize() method, TensorFlow identifies the set of variables on which loss depends, and computes gradients for each of these. The differentiation is implemented in ops/gradients.py, and it uses "reverse accumulation". Essentially it searches backwards from the loss tensor to the variables, applying the chain rule at each operator in the dataflow graph. TensorFlow includes "gradient functions" for most (differentiable) operators, and you can see an example of how these are implemented in ops/math_grad.py. A gradient function can use the original op (including its inputs, outputs, and attributes) and the gradients computed for each of its outputs to produce gradients for each of its inputs.
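
In TF 1.x terms, minimize() is shorthand for compute_gradients() followed by apply_gradients(). The sketch below (an editorial illustration, assumed to sit inside the same with graph.as_default(): block as the question's code, not the library's actual source) makes that symbolic gradient step explicit, and also shows the general shape of a registered gradient function like the ones in ops/math_grad.py:

# Roughly what minimize(loss) builds: symbolic gradients of `loss` with
# respect to each variable, plus an op applying w := w - 0.5 * grad.
sgd = tf.train.GradientDescentOptimizer(0.5)
grads = tf.gradients(loss, [w0, b0, w1, b1])      # reverse accumulation through the graph
train_op = sgd.apply_gradients(list(zip(grads, [w0, b0, w1, b1])))

# Per-op "gradient functions" follow this pattern (a paraphrased sketch
# registered under a hypothetical op name, not the exact TensorFlow source):
# given the original op and the gradient w.r.t. its output, return the
# gradient w.r.t. each input via the chain rule.
@tf.RegisterGradient("ExampleSquare")
def _example_square_grad(op, grad):
    x = op.inputs[0]
    return grad * 2.0 * x    # d(x^2)/dx = 2x, chained with the upstream gradient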

Page 7 of Ilya Sutskever's PhD thesis has a nice explanation of how this process works in general.
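
To make "reverse accumulation" concrete, here is a toy pure-Python sketch (an editorial illustration, not TensorFlow's or Theano's implementation): each node records how to push a gradient back to its inputs, and backpropagation walks from the loss toward the inputs, applying the chain rule at every node.

class Node:
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value        # forward-pass value
        self.parents = parents    # upstream nodes this node was computed from
        self.grad_fns = grad_fns  # one function per parent: upstream grad -> parent grad
        self.grad = 0.0           # accumulated dLoss/d(this node)

def mul(a, b):
    return Node(a.value * b.value, (a, b),
                (lambda g: g * b.value, lambda g: g * a.value))

def sub(a, b):
    return Node(a.value - b.value, (a, b), (lambda g: g, lambda g: -g))

def half_square(a):  # 0.5 * a**2, the per-example L2 loss term
    return Node(0.5 * a.value ** 2, (a,), (lambda g: g * a.value,))

def backprop(loss):
    # This toy graph is a tree, so a plain stack walk suffices; a real
    # implementation visits nodes in reverse topological order.
    loss.grad = 1.0
    stack = [loss]
    while stack:
        node = stack.pop()
        for parent, grad_fn in zip(node.parents, node.grad_fns):
            parent.grad += grad_fn(node.grad)   # chain rule
            stack.append(parent)

# Tiny "network": yp = w * x, loss = 0.5 * (yp - y)^2
w, x, y = Node(2.0), Node(3.0), Node(10.0)
yp = mul(w, x)
loss = half_square(sub(yp, y))
backprop(loss)
print(w.grad)   # dLoss/dw = (yp - y) * x = (6 - 10) * 3 = -12.0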
