Problem description
I have what is conceptually a simple question about Theano but I haven't been able to find the answer (I'll confess upfront to not really understanding how shared variables work in Theano, despite many hours with the tutorials).
I'm trying to implement a "deconvolutional network"; specifically I have a 3-tensor of inputs (each input is a 2D image) and a 4-tensor of codes; for the ith input codes[i] represents a set of codewords which together code for input i.
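(For reference, codes, dicts, and inputs in the snippets below are shared variables; a minimal, purely illustrative setup with made-up shapes and a made-up learning rate might look like this.)

import numpy as np
import theano
import theano.tensor as T
from theano import function
from theano.tensor.nnet import conv2d

floatX = theano.config.floatX
learning_rate = 0.01                      # made-up value

# Made-up sizes: 10 images of 28x28, 4 codewords per input, 5x5 dictionary
# filters, so each code map is (28 + 5 - 1) = 32 pixels on a side ('valid' conv).
num_inputs, num_codewords = 10, 4
inputs = theano.shared(np.zeros((num_inputs, 28, 28), dtype=floatX), name='inputs')
codes = theano.shared(np.zeros((num_inputs, num_codewords, 32, 32), dtype=floatX), name='codes')
dicts = theano.shared(np.zeros((num_codewords, 5, 5), dtype=floatX), name='dicts')
input_index = T.lscalar('input_index')    # index symbol used in the snippets below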
I've been having a lot of trouble figuring out how to do gradient descent on the codewords. Here are the relevant parts of my code:
idx = T.lscalar()
pre_loss_conv = conv2d(input = codes[idx].dimshuffle('x', 0, 1, 2),
                       filters = dicts.dimshuffle('x', 0, 1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[idx]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)
del_codes = T.grad(loss, codes[idx])
delc_fn = function([idx], del_codes)
train_codes = function([input_index], loss, updates = [
    [codes, T.set_subtensor(codes[input_index], codes[input_index] -
                            learning_rate*del_codes[input_index]) ]])
(here codes and dicts are shared tensor variables). Theano is unhappy with this, specifically with defining
del_codes = T.grad(loss, codes[idx])
The error message I'm getting is: theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0
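(A minimal, self-contained illustration of the same error, using made-up names: indexing the shared variable again inside T.grad builds a brand-new Subtensor node that never appears in the graph that computes the cost.)

import numpy as np
import theano
import theano.tensor as T

W = theano.shared(np.zeros((3, 4), dtype=theano.config.floatX), name='W')
i = T.lscalar('i')
cost = T.sum(W[i] ** 2)

# Raises DisconnectedInputError: this W[i] is a *new* Subtensor node,
# not the one used when `cost` was built.
g = T.grad(cost, W[i])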
I'm guessing that it wants a symbolic variable instead of codes[idx]; but then I'm not sure how to get everything connected to get the intended effect. I'm guessing I'll need to change the final line to something like
train_codes = function([input_index], loss, updates = [
    [codes, T.set_subtensor(codes[input_index], codes[input_index] -
                            learning_rate*del_codes) ]])
Can someone give me some pointers as to how to define this function properly? I think I'm probably missing something basic about working with Theano but I'm not sure what.
Thanks in advance!
-Justin
Update: Kyle's suggestion worked very nicely. Here's the specific code I used
current_codes = T.tensor3('current_codes')
current_codes = codes[input_index]
pre_loss_conv = conv2d(input = current_codes.dimshuffle('x', 0, 1, 2),
                       filters = dicts.dimshuffle('x', 0, 1, 2),
                       border_mode = 'valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[input_index]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)
del_codes = T.grad(loss, current_codes)
train_codes = function([input_index], loss)
train_dicts = theano.function([input_index], loss, updates = [[dicts, dicts - learning_rate*del_dicts]])
codes_update = ( codes, T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes) )
codes_update_fn = function([input_index], updates = [codes_update])

for i in xrange(num_inputs):
    current_loss = train_codes(i)
    codes_update_fn(i)
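(del_dicts isn't defined in the snippet above; presumably it is just the gradient of the loss taken directly with respect to the dicts shared variable, which needs no subtensor trick because dicts itself appears in the graph, something like:)

del_dicts = T.grad(loss, dicts)   # assumed; not shown in the original snippet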
To summarize the findings:
Assigning grad_var = codes[idx], then making a new variable such as subgrad = T.set_subtensor(codes[input_index], codes[input_index] - learning_rate*del_codes[input_index]), and then calling train_codes = function([input_index], loss, updates = [[codes, subgrad]])
seemed to do the trick. In general, I try to make variables for as many things as possible. Sometimes tricky problems can arise from trying to do too much in a single statement, plus it is hard to debug and understand later! Also, in this case I think theano needs a shared variable, but has issues if the shared variable is created inside the function that requires it.
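For reference, here is a condensed, self-contained version of the pattern, with a dummy loss and made-up shapes, purely for illustration:

import numpy as np
import theano
import theano.tensor as T
from theano import function

codes = theano.shared(np.random.randn(5, 3, 8, 8).astype(theano.config.floatX), name='codes')
learning_rate = 0.1

idx = T.lscalar('idx')
current_codes = codes[idx]               # the Subtensor node that will appear in the graph
loss = T.sum(1./2. * current_codes**2)   # dummy loss; stands in for the reconstruction error

# Differentiate with respect to the node that is actually in the graph...
del_codes = T.grad(loss, current_codes)

# ...and write the update back into the full shared tensor with set_subtensor.
subgrad = T.set_subtensor(codes[idx], codes[idx] - learning_rate * del_codes)
train_codes = function([idx], loss, updates=[(codes, subgrad)])

for i in range(codes.get_value().shape[0]):
    train_codes(i)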
Glad this worked for you!