Problem Description
It seems that tf.gradients also allows computing Jacobians, i.e. the partial derivatives of each entry of one tensor with respect to each entry of another tensor, while tf.train.Optimizer.compute_gradients only computes actual gradients, e.g. the partial derivatives of a scalar value with respect to each entry of a particular tensor, or with respect to one particular scalar. Why is there a separate function if tf.gradients already implements that functionality?
Recommended Answer
tf.gradients does not let you compute Jacobians: it aggregates, for each input, the gradients over every output (roughly the sum of each column of the actual Jacobian matrix). In fact, there is no "good" way to compute a Jacobian in TensorFlow (basically you have to call tf.gradients once per output, see this question).
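To make the aggregation concrete, here is a minimal TF 1.x graph-mode sketch; the placeholder x and the two-entry output y are made-up examples, and the Jacobian is assembled by calling tf.gradients once per output entry, as described above.

import tensorflow as tf

# tf.gradients sums over the output: for a vector-valued y it returns
# d(sum(y))/dx, i.e. the column sums of the Jacobian of y w.r.t. x.
x = tf.placeholder(tf.float32, shape=[3])
y = tf.stack([x[0] * x[1], x[1] * x[2]])  # hypothetical 2-entry output

summed = tf.gradients(y, x)[0]  # shape [3]: column sums of the Jacobian

# Full Jacobian: one tf.gradients call per output entry, stacked into rows.
jacobian = tf.stack([tf.gradients(y[i], x)[0] for i in range(2)])  # shape [2, 3]

with tf.Session() as sess:
    print(sess.run([summed, jacobian], feed_dict={x: [1.0, 2.0, 3.0]}))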
With respect to tf.train.Optimizer.compute_gradients, yes, its result is basically the same, but it takes care of some details automatically and returns a slightly more convenient output format. If you look at the implementation, you will see that at its core it is a call to tf.gradients (in this case aliased to gradients.gradients), but it is useful for optimizer implementations to have the surrounding logic already in place. Also, exposing it as a method allows for extensible behaviour in subclasses, either to implement some kind of optimization strategy (not very likely at the compute_gradients step, really) or for auxiliary purposes such as tracing or debugging.
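For illustration, a minimal TF 1.x sketch (the variable w and the squared-norm loss are made up): both calls produce the same gradient values, but compute_gradients returns (gradient, variable) pairs in exactly the format that apply_gradients consumes.

import tensorflow as tf

# Both calls differentiate the same loss; compute_gradients additionally pairs
# each gradient with its variable, which is the format apply_gradients expects.
w = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(w))

plain_grads = tf.gradients(loss, [w])               # [grad_w]
optimizer = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = optimizer.compute_gradients(loss)  # [(grad_w, w)]
train_op = optimizer.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(plain_grads))     # gradient only: [2., 4.]
    print(sess.run(grads_and_vars))  # (gradient, variable) pairs
    sess.run(train_op)               # w <- w - 0.1 * grad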