Q1.
I'm trying to make my custom autograd function with PyTorch.
But I have a problem with the analytical backpropagation of y = x / sum(x, dim=0),
where the tensor x has size (Height, Width) (x is 2-dimensional).
Here's my code
class MyFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        input = input / torch.sum(input, dim=0)
        return input

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        H, W = input.size()
        sum = torch.sum(input, dim=0)
        grad_input = grad_output * (1/sum - input*1/sum**2)
        return grad_input
I used gradcheck (from torch.autograd) to compare the Jacobian matrices,
from torch.autograd import gradcheck
func = MyFunc.apply
input = (torch.randn(3,3,dtype=torch.double,requires_grad=True))
test = gradcheck(func, input)
and the result was that gradcheck failed.
Could someone please help me get the correct backpropagation result?
Thanks!
Q2.
Thanks for the answers!
Because of your help, I could implement backpropagation for the (H,W) tensor case.
However, when I implemented backpropagation for the (N,H,W) tensor case, I ran into a problem. I think the problem is with initializing the new tensor.
Here's my new code
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        N = input.size(0)
        for n in range(N):
            input[n] /= torch.sum(input[n], dim=0)
        return input

    @staticmethod
    def backward(ctx, grad_output):
        input = ctx.saved_tensors[0]
        N, H, W = input.size()
        I = torch.eye(H).unsqueeze(-1)
        sum = input.sum(1)
        grad_input = torch.zeros((N, H, W), dtype=torch.double, requires_grad=True)
        for n in range(N):
            grad_input[n] = ((sum[n] * I - input[n]) * grad_output[n] / sum[n]**2).sum(1)
        return grad_input
Gradcheck code is
from torch.autograd import gradcheck
func = MyFunc.apply
input = (torch.rand(2,2,2,dtype=torch.double,requires_grad=True))
test = gradcheck(func, input)
print(test)
and the result is an error.
I don't know why the error occurs...
Your help will be very valuable for implementing my own convolutional network.
Thanks! Have a nice day.
Let's look at an example with a single column, for instance: [[x1], [x2], [x3]].
Let sum be x1 + x2 + x3, then normalizing x will give y = [[y1], [y2], [y3]] = [[x1/sum], [x2/sum], [x3/sum]]. You're looking for dL/dx1, dL/dx2, and dL/dx3 - we'll just write them as dx1, dx2, and dx3. Same for all the dL/dyi.
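For concreteness (my addition, using the same values as the worked example further down), the forward normalization of one column looks like this:
>>> x = torch.tensor([[2.], [3.], [5.]])
>>> x / x.sum(0)
tensor([[0.2000],
        [0.3000],
        [0.5000]])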
So dx1 is equal to dL/dy1*dy1/dx1 + dL/dy2*dy2/dx1 + dL/dy3*dy3/dx1. That's because x1 contributes to all output elements on the corresponding column: y1, y2, and y3.
We have:
dy1/dx1 = d(x1/sum)/dx1 = (sum - x1)/sum²
dy2/dx1 = d(x2/sum)/dx1 = -x2/sum²
and similarly,
dy3/dx1 = d(x3/sum)/dx1 = -x3/sum²
Therefore dx1 = (sum - x1)/sum²*dy1 - x2/sum²*dy2 - x3/sum²*dy3. The same goes for dx2 and dx3. As a result, the Jacobian has [dxi]_i = (sum - xi)/sum² on the diagonal and [dxi]_j = -xj/sum² off the diagonal (for all j different from i).
In your implementation, you seem to be missing all non-diagonal components.
Keeping the same one-column example, with x1=2, x2=3, and x3=5:
>>> x = torch.tensor([[2.], [3.], [5.]])
>>> sum = x.sum(0)
tensor([10.])
The Jacobian will be:
>>> J = (sum*torch.eye(x.size(0)) - x)/sum**2
tensor([[ 0.0800, -0.0200, -0.0200],
        [-0.0300,  0.0700, -0.0300],
        [-0.0500, -0.0500,  0.0500]])
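As an extra sanity check (my addition, not part of the original answer), the same matrix can be reproduced with torch.autograd.functional.jacobian; s is used instead of sum to avoid shadowing the Python built-in:
>>> x = torch.tensor([[2.], [3.], [5.]])
>>> s = x.sum(0)
>>> J = (s*torch.eye(x.size(0)) - x)/s**2
>>> J_auto = torch.autograd.functional.jacobian(lambda t: t / t.sum(0), x)
>>> torch.allclose(J, J_auto.squeeze())  # should be True if the derivation above is right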
For an implementation with multiple columns, it's a bit trickier, more specifically for the shape of the diagonal matrix. It's easier to keep the column axis last so we don't have to bother with broadcasting:
>>> x = torch.tensor([[2., 1], [3., 3], [5., 5]])
>>> sum = x.sum(0)
tensor([10., 9.])
>>> diag = sum*torch.eye(3).unsqueeze(-1).repeat(1, 1, len(sum))
tensor([[[10.,  9.],
         [ 0.,  0.],
         [ 0.,  0.]],
        [[ 0.,  0.],
         [10.,  9.],
         [ 0.,  0.]],
        [[ 0.,  0.],
         [ 0.,  0.],
         [10.,  9.]]])
Above, diag has a shape of (3, 3, 2), where the two columns are on the last axis. Notice how we didn't need to broadcast sum.
What I wouldn't have done is torch.eye(3).unsqueeze(0).repeat(len(sum), 1, 1). With this kind of shape - (2, 3, 3) - you would have to use sum[:, None, None], and would need further broadcasting down the road...
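To illustrate the point (my addition, reusing x and sum from above), the batch-first layout forces extra indexing on sum, and the later subtraction and division need similar reshaping:
>>> diag_alt = sum[:, None, None] * torch.eye(3).unsqueeze(0).repeat(len(sum), 1, 1)
>>> diag_alt.shape
torch.Size([2, 3, 3])
>>> # the Jacobian in this layout would then need e.g. (diag_alt - x.t()[:, None, :]) / sum[:, None, None]**2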
The Jacobian is simply:
>>> J = (diag - x)/sum**2
tensor([[[ 0.0800,  0.0988],
         [-0.0300, -0.0370],
         [-0.0500, -0.0617]],
        [[-0.0200, -0.0123],
         [ 0.0700,  0.0741],
         [-0.0500, -0.0617]],
        [[-0.0200, -0.0123],
         [-0.0300, -0.0370],
         [ 0.0500,  0.0494]]])
You can check the results by backpropagating through the operation using an arbitrary dy vector (not with torch.ones though, or you'll get 0s because of J!). After backpropagating, x.grad should be equal to torch.einsum('abc,bc->ac', J, dy).
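Here is a minimal sketch of that check (my own code, not the answer's), rebuilding the column-last Jacobian from the multi-column example above; in a custom backward, the same einsum applied to grad_output would give grad_input:

import torch

# Rebuild the multi-column example from above.
x = torch.tensor([[2., 1], [3., 3], [5., 5]], requires_grad=True)
s = x.sum(0)                       # column sums, shape (W,)
y = x / s                          # normalized output, shape (H, W)

# Column-last Jacobian, built exactly as in the answer.
with torch.no_grad():
    H = x.size(0)
    diag = s * torch.eye(H).unsqueeze(-1).repeat(1, 1, len(s))
    J = (diag - x) / s**2          # shape (H, H, W)

# Backpropagate an arbitrary upstream gradient (not torch.ones).
dy = torch.randn_like(y)
y.backward(dy)

# x.grad should match the Jacobian-vector product.
print(torch.allclose(x.grad, torch.einsum('abc,bc->ac', J, dy)))  # expected: True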