

I was trying to change how the tf.stack op computes its backward gradient, using tf.RegisterGradient and tf.gradient_override_map. Here is my code:

# gradient_synthesizer.py
import tensorflow as tf

class SynthGradBuilder(object):
    def __init__(self):
        self.num_calls = 0

    def __call__(self, x, l=1.0):
        # A fresh name on every call, because a gradient name can be registered only once
        op_name = "SynthGrad%d" % self.num_calls

        @tf.RegisterGradient(op_name)
        def _grad_synth(op, grad):
            return grad[0]

        g = tf.get_default_graph()
        with g.gradient_override_map({"stack": op_name}):
            y = tf.stack([x,x])

        self.num_calls += 1
        return y

GradSys = SynthGradBuilder()


In another script, I wrote:

import tensorflow as tf
from gradient_synthesizer import GradSys

x = tf.Variable([1,2])
y = GradSys(x, l=1)
z = tf.stack([x,x])

grad = tf.gradients(y, x, grad_ys=[[tf.convert_to_tensor([3, 4]), 
                              tf.convert_to_tensor([6, 8])]])
grad_stack = tf.gradients(z, x, grad_ys=[[tf.convert_to_tensor([3, 4]), 
                              tf.convert_to_tensor([6, 8])]])

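with tf.Session() as sess:
    # x is a Variable, so it must be initialized before y can be evaluated
    sess.run(tf.global_variables_initializer())
    print("grad bp: ", sess.run(grad))
    print("grad_stack: ", sess.run(grad_stack))
    print("y: ", sess.run(y))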


What I expected was:

grad bp: [3, 4]
grad_stack: [3+6, 4+8] = [9, 12]
y: [[1, 2], [1, 2]]
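The expected values follow from how reverse-mode differentiation treats a tensor that is used twice: both slices of y are x, so the gradient with respect to x is the sum of the upstream gradients of the two slices. Writing g_0 = [3, 4] and g_1 = [6, 8] for the two rows of grad_ys:

```latex
% Default Pack gradient: both slices flow back into x and are summed.
\frac{\partial L}{\partial x}
  = \frac{\partial L}{\partial y_0} + \frac{\partial L}{\partial y_1}
  = g_0 + g_1 = [3, 4] + [6, 8] = [9, 12]

% Intended custom gradient: only the first slice flows back, the second is zeroed.
\frac{\partial L}{\partial x} = g_0 + \mathbf{0} = [3, 4]
```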


What I actually got from the code indicated that tf.stack's backward gradient had not been replaced at all, which was the opposite of what I expected.


I wasn't sure whether the discrepancy was caused by wrongly using "stack" as the type string of tf.stack, so I carried out the following experiment:


In the printed description of tensor y, the first item, "stack:0", suggests that the registered name of tf.stack is "stack", which I took to be its type string as well. So it seems "stack" is not at fault.
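A tensor's name and its op's type are separate properties, and the experiment above only looks at the name. Below is a minimal sketch, assuming TensorFlow 1.15 or 2.x used through the tf.compat.v1 graph-mode API (an assumption, not part of the original code), that prints both for the same tf.stack call:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # build graphs the TF 1.x way

with tf.Graph().as_default() as graph:
    x = tf.constant([1.0, 2.0])
    y = tf.stack([x, x])

    # The tensor name is derived from the op *name*, which defaults to "stack".
    tensor_name = y.name
    # The op *type* is what gradient_override_map matches against.
    op_type = y.op.type
    # Every op type present in this small graph.
    all_op_types = [op.type for op in graph.get_operations()]

print("tensor name:", tensor_name)  # stack:0
print("op type:", op_type)          # Pack
print("all op types:", all_op_types)
```

The name "stack" comes from tf.stack's default name argument, while the type of the underlying op is a different string, which is why a name alone cannot tell you which key to use in gradient_override_map.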


I'm at a loss as to what is causing the problem in my code. Can anyone help me with it?


Tl;dr: The correct code should be:

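# Inside SynthGradBuilder.__call__, where op_name and x are already defined
@tf.RegisterGradient(op_name)
def _grad_synth(op, grad):
  x, y = tf.unstack(grad)
  return [x, tf.zeros_like(y)]

g = tf.get_default_graph()
with g.gradient_override_map({"Pack": op_name}):
  y = tf.stack([x, x])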


Because this is quite a common question, I want to explain it in a bit more detail:


There are two main issues in your original code:

  1. Wrong usage of gradient_override_map:


The actual op type of tf.stack is Pack (not stack), so you need to override Pack instead of stack:

`g.gradient_override_map({"Pack": op_name})`.


You may wonder how I know the actual op type. A simple way is to probe the GraphDef by running the following code:

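import tensorflow as tf

with tf.Graph().as_default():
  x = tf.constant(0)
  y = tf.stack([x, x])
  # In the printed GraphDef, the node named "stack" has op: "Pack"
  print(tf.get_default_graph().as_graph_def())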

  2. Wrong gradient function:


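The original gradient for Pack is a simple Unpack (official code). Your `_grad_synth` returned only `grad[0]`, a single tensor, but a gradient function must return one gradient per input of the op, and this Pack op has two inputs. So you still need to unpack the gradient first, and then propagate only the FIRST part: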

def _grad_synth(op, grad):
  x, y = tf.unstack(grad)
  return [x, tf.zeros_like(y)]


Note that this code works for your case, where exactly two tensors are stacked. If you want to support a stack of any length, you can use a slightly more general version:

def _grad_synth(op, grad):
  x_list = tf.unstack(grad)
  for i in range(1, len(x_list)):
    x_list[i] = tf.zeros_like(x_list[i])
  return x_list
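Putting both fixes together, here is a minimal end-to-end sketch that can be checked directly. It assumes TensorFlow 1.15 or 2.x through the tf.compat.v1 graph-mode API, uses a float constant instead of the original int Variable, and the gradient name "FirstSliceOnlyGrad" is hypothetical. The gradient function splits the incoming gradient using the op's own N and axis attributes, mirroring TensorFlow's built-in Pack gradient, so it handles any stack length:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # TF 1.x-style graph mode


# Hypothetical gradient name; a given name can be registered only once per process.
@tf.RegisterGradient("FirstSliceOnlyGrad")
def _first_slice_only_grad(op, grad):
    # Split the upstream gradient into one slice per stacked input.
    slices = tf.unstack(grad, num=op.get_attr("N"), axis=op.get_attr("axis"))
    # One gradient per input: keep the first slice, zero out the rest.
    return [slices[0]] + [tf.zeros_like(s) for s in slices[1:]]


g = tf.get_default_graph()
x = tf.constant([1.0, 2.0])

# The override is keyed on the op *type* "Pack", which is what tf.stack creates.
with g.gradient_override_map({"Pack": "FirstSliceOnlyGrad"}):
    y = tf.stack([x, x])

# Created outside the block, so it keeps the default (Unpack) gradient.
z = tf.stack([x, x])

upstream = tf.constant([[3.0, 4.0], [6.0, 8.0]])
grad_y = tf.gradients(y, x, grad_ys=upstream)[0]
grad_z = tf.gradients(z, x, grad_ys=upstream)[0]

with tf.Session() as sess:
    grad_y_val, grad_z_val = sess.run([grad_y, grad_z])
    print("overridden:", grad_y_val)  # expected values [3, 4]
    print("default:", grad_z_val)     # expected values [9, 12]
```

Only y, built inside the gradient_override_map block, picks up the custom gradient; z keeps the default one, so the two results match the expectations stated in the question.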
