Trying to wrap my head around how gradients are represented and how autograd works:

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y

z.backward()

print(x.grad)
#Variable containing:
#32
#[torch.FloatTensor of size 1]

print(y.grad)
#None


Why is no gradient produced for y? If y.grad = dz/dy, then shouldn't it at least produce a Variable like y.grad = 2*y?

Best Answer

By default, gradients are only retained for leaf variables. Gradients of non-leaf variables are not retained to be inspected later. This was done by design, to save memory.


- soumith

See: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94
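
Here is a minimal sketch of the leaf vs. non-leaf distinction (assuming PyTorch 0.4 or later, where Variable is merged into Tensor; the numbers follow from z = x**4):

import torch

x = torch.tensor([2.0], requires_grad=True)  # leaf: created directly by the user
y = x * x                                    # non-leaf: produced by an operation
z = y * y                                    # non-leaf

print(x.is_leaf, y.is_leaf, z.is_leaf)       # True False False

z.backward()

print(x.grad)  # tensor([32.]) since dz/dx = 4*x**3 = 32 at x = 2
print(y.grad)  # None: dz/dy = 2*y = 8 was computed but not retained
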

Option 1:

Call y.retain_grad():

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y

y.retain_grad()

z.backward()

print(y.grad)
#Variable containing:
# 8
#[torch.FloatTensor of size 1]


Source: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/16
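
The same fix works with the current tensor API; a minimal sketch, assuming PyTorch 0.4 or later:

import torch

x = torch.tensor([2.0], requires_grad=True)
y = x * x
z = y * y

y.retain_grad()  # ask autograd to keep the gradient of this non-leaf tensor

z.backward()

print(y.grad)    # tensor([8.]) since dz/dy = 2*y = 8 at y = 4
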

Option 2:

Register a hook, which is basically a function that gets called when that gradient is computed. You can then save it, assign it, print it, whatever you need...

from __future__ import print_function
import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2]), requires_grad=True)
y = x * x
z = y * y

y.register_hook(print) ## this can be anything you need it to be

z.backward()


Output:

Variable containing:
 8
[torch.FloatTensor of size 1]


Source: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/2

See also: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94/7
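
If you want to keep the intermediate gradient around rather than just print it, the hook can stash it in a container. A minimal sketch using the current tensor API; the grads dict and save_grad helper are just illustrative names:

import torch

grads = {}  # illustrative container for captured gradients

def save_grad(name):
    def hook(grad):
        grads[name] = grad  # store the gradient when autograd computes it
    return hook

x = torch.tensor([2.0], requires_grad=True)
y = x * x
z = y * y

y.register_hook(save_grad('y'))

z.backward()

print(grads['y'])  # tensor([8.])
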

Regarding "pytorch - Why does autograd not produce gradients for intermediate variables?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45988168/
