Problem description
Why do we need to explicitly zero the gradients in PyTorch? Why can't the gradients be zeroed automatically when loss.backward() is called? What scenario is served by keeping the gradients on the graph and asking the user to zero them explicitly?
Recommended answer
We need to call zero_grad() explicitly because, after loss.backward() computes the gradients, we still need optimizer.step() to perform the gradient-descent update. More specifically, the gradients are not zeroed automatically because these two operations, loss.backward() and optimizer.step(), are separate, and optimizer.step() requires the gradients that were just computed.
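A minimal sketch of one training step illustrating this separation; the model, optimizer, loss function, and dummy data below are illustrative assumptions, not taken from the original question:

```python
import torch
import torch.nn as nn

# Illustrative setup: a tiny model, an optimizer, and a loss function.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.randn(32, 10)  # dummy batch of inputs
y = torch.randn(32, 1)   # dummy targets

optimizer.zero_grad()          # clear gradients left over from the previous step
loss = criterion(model(x), y)  # forward pass
loss.backward()                # compute gradients (they accumulate into .grad)
optimizer.step()               # update parameters using the just-computed gradients
```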
In addition, we sometimes need to accumulate gradients across several batches; to do that, we can simply call backward() multiple times and run the optimizer once, as in the sketch below.
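A sketch of that gradient-accumulation pattern, again with an illustrative model and synthetic mini-batches; the name accum_steps and the loss scaling are assumptions for the example:

```python
import torch
import torch.nn as nn

# Illustrative setup, same as above.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

accum_steps = 4  # number of mini-batches to accumulate before one update
batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]  # dummy mini-batches

optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = criterion(model(x), y) / accum_steps  # scale so the accumulated gradient matches one large batch
    loss.backward()                              # gradients add up in .grad across calls
    if (i + 1) % accum_steps == 0:
        optimizer.step()       # update once using the accumulated gradients
        optimizer.zero_grad()  # clear them before the next accumulation window
```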