Problem description
I have some questions about PyTorch's backward function; I don't think I'm getting the right output:
import numpy as np
import torch
from torch.autograd import Variable
a = Variable(torch.FloatTensor([[1,2,3],[4,5,6]]), requires_grad=True)
out = a * a
out.backward(a)
print(a.grad)
The output is
tensor([[ 2., 8., 18.],
[32., 50., 72.]])
which is maybe 2*a*a,
but I think the output should be
tensor([[ 2., 4., 6.],
[8., 10., 12.]])
i.e. 2*a, since d(x^2)/dx = 2x.
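For reference, a minimal sketch of the same snippet in current PyTorch, where the deprecated Variable wrapper is no longer needed and requires_grad is set directly on the tensor; it reproduces the same output:
import torch

# same computation without the (now deprecated) Variable wrapper
a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
out = a * a
out.backward(a)   # a is passed as the gradient argument
print(a.grad)     # tensor([[ 2.,  8., 18.], [32., 50., 72.]])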
Recommended answer
Please read the documentation on backward() carefully to better understand it.
By default, PyTorch expects backward() to be called on the last output of the network - the loss function. The loss function always outputs a scalar, and therefore the gradients of the scalar loss w.r.t. all other variables/parameters are well defined (using the chain rule).
Thus, by default, backward() is called on a scalar tensor and expects no arguments.
For example:
a = torch.tensor([[1,2,3],[4,5,6]], dtype=torch.float, requires_grad=True)
for i in range(2):
    for j in range(3):
        out = a[i,j] * a[i,j]
        out.backward()
print(a.grad)
yields
tensor([[ 2., 4., 6.],
[ 8., 10., 12.]])
As expected: d(a^2)/da = 2a.
However, when you call backward() on the 2-by-3 out tensor (no longer a scalar function), what do you expect a.grad to be? You would actually need a 2-by-3-by-2-by-3 output: d out[i,j] / d a[k,l] (!)
PyTorch does not support derivatives of non-scalar functions like this.
Instead, PyTorch assumes out is only an intermediate tensor and that somewhere "upstream" there is a scalar loss function which, through the chain rule, provides d loss / d out[i,j]. This "upstream" gradient has size 2-by-3, and in this case it is exactly the argument you pass to backward: out.backward(g), where g_ij = d loss / d out_ij.
The gradients are then calculated by the chain rule: d loss / d a[i,j] = (d loss / d out[i,j]) * (d out[i,j] / d a[i,j]).
Since you provided a as the "upstream" gradient, you got
a.grad[i,j] = 2 * a[i,j] * a[i,j]
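To make the chain rule concrete, here is a minimal sketch (assuming the same 2-by-3 tensor a as above) that applies the chain rule by hand and compares the result with what autograd produces:
import torch

a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
out = a * a
g = a.detach().clone()          # the "upstream" gradient passed to backward()
out.backward(g)
# chain rule by hand: d loss / d a[i,j] = g[i,j] * d out[i,j] / d a[i,j] = g[i,j] * 2 * a[i,j]
manual = g * 2 * a.detach()
print(torch.allclose(a.grad, manual))   # True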
If you were instead to provide all ones as the "upstream" gradient:
out.backward(torch.ones(2,3))
print(a.grad)
this yields
tensor([[ 2., 4., 6.],
[ 8., 10., 12.]])
as expected.
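Passing an all-ones "upstream" gradient is equivalent to first summing out into a scalar loss and calling backward() on that, since d sum(out) / d out[i,j] = 1; a minimal sketch:
import torch

a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
out = a * a
loss = out.sum()    # scalar loss; d loss / d out[i,j] = 1
loss.backward()
print(a.grad)       # tensor([[ 2.,  4.,  6.], [ 8., 10., 12.]])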
It is all about the chain rule.