Problem description
I was trying to change how the backward gradient of the tf.stack op is computed, using tf.RegisterGradient and tf.gradient_override_map. Here is my code:
import tensorflow as tf

class SynthGradBuilder(object):
    def __init__(self):
        self.num_calls = 0

    def __call__(self, x, l=1.0):
        op_name = "SynthGrad%d" % self.num_calls

        @tf.RegisterGradient(op_name)
        def _grad_synth(op, grad):
            return grad[0]

        g = tf.get_default_graph()
        with g.gradient_override_map({"stack": op_name}):
            y = tf.stack([x,x])

        self.num_calls += 1
        return y

GradSys = SynthGradBuilder()
In another script, I wrote:
import tensorflow as tf
from gradient_synthesizer import GradSys

x = tf.Variable([1,2])
y = GradSys(x, l=1)
z = tf.stack([x,x])

grad = tf.gradients(y, x, grad_ys=[[tf.convert_to_tensor([3, 4]),
                                    tf.convert_to_tensor([6, 8])]])
grad_stack = tf.gradients(z, x, grad_ys=[[tf.convert_to_tensor([3, 4]),
                                          tf.convert_to_tensor([6, 8])]])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print "grad bp: ", sess.run(grad)
    print "grad_stack: ", sess.run(grad_stack)
    print "y: ", sess.run(y)
The expected output should be:
grad bp: [3,4];
grad_stack: [3+6, 4+8] = [9, 12];
y: [[1,2], [1,2]];
What I actually got from the code was:
indicating that the backward gradient of tf.stack was not replaced at all, contrary to my expectation.
I was not sure whether this discrepancy came from wrongly using "stack" as the type string of the tf.stack op, so I carried out the following experiment:
The first item describing tensor y, "stack:0", suggests that the registered name of the tf.stack op is "stack", which is also its type string. So it seems that "stack" is not at fault.
I am at a loss to figure out what is wrong with my code. I wonder if anyone can help me with that.
Recommended answer
Tl;dr: The correct code should be:
@tf.RegisterGradient(op_name)
def _grad_synth(op, grad):
    x, y = tf.unstack(grad)
    return [x, tf.zeros_like(y)]

g = tf.get_default_graph()
with g.gradient_override_map({"Pack": op_name}):
    y = tf.stack([x, x])
Because this is quite a common question, I want to explain it in a bit more detail.
There are two main issues in your original code:
- Wrong usage of gradient_override_map:
The actual OP name for tf.stack is Pack (not Stack), so you need to override Pack instead of Stack: `g.gradient_override_map({"Pack": op_name})`.
You may wonder how I know the actual OP name. A simple way is to probe the GraphDef by running the following code:
with tf.Graph().as_default():
    x = tf.constant(0)
    y = tf.stack([x, x])
    print(tf.get_default_graph().as_graph_def())
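As a shorter alternative to reading the whole GraphDef, the sketch below compares the node name of the stacked tensor's op with its op type. It assumes TensorFlow is installed and imports the graph-mode API through tensorflow.compat.v1 (an assumption made so the example also runs on newer releases; the original thread uses plain TF 1.x). The variable names node_name and op_type are illustrative only.

```python
# Assumption: TensorFlow is installed; tensorflow.compat.v1 exposes the
# graph-mode API that this thread relies on.
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    x = tf.constant(0)
    y = tf.stack([x, x])

    # The node name is what shows up in tensor names such as "stack:0".
    node_name = y.op.name
    # The op type is the string that gradient_override_map matches against.
    op_type = y.op.type

    print("node name:", node_name)
    print("op type:", op_type)
```

The node name is only a label for one node in the graph, while gradient_override_map is keyed by op type, which is why the key has to be "Pack".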
- Wrong gradient function:
The original gradient for Pack is a simple Unpack (official code). In your case, you still need to unpack the gradient first, but propagate only the FIRST part:
@tf.RegisterGradient(op_name)
def _grad_synth(op, grad):
    x, y = tf.unstack(grad)
    return [x, tf.zeros_like(y)]
Note that this code works for your case, where exactly two tensors are stacked. If you want to support a stack of any length, you can use a slightly more complicated version:
@tf.RegisterGradient(op_name)
def _grad_synth(op, grad):
    x_list = tf.unstack(grad)
    for i in range(1, len(x_list)):
        x_list[i] = tf.zeros_like(x_list[i])
    return x_list
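To check the arithmetic behind these gradient functions without TensorFlow, here is a plain-Python sketch using the grad_ys values from the question. The helper names (stack_default_grad, stack_first_only_grad, accumulate) are hypothetical and only model the behaviour described above: a gradient function returns one gradient per stacked input, and because x feeds both inputs, its total gradient is the sum of those per-input gradients.

```python
def stack_default_grad(upstream):
    """Model of the default Pack gradient: unstack, one slice per input."""
    return [list(row) for row in upstream]


def stack_first_only_grad(upstream):
    """Model of the overridden gradient: keep the first slice, zero the rest."""
    return [
        list(row) if i == 0 else [0] * len(row)
        for i, row in enumerate(upstream)
    ]


def accumulate(per_input_grads):
    """x is used for every input of the stack, so its gradients are summed."""
    return [sum(column) for column in zip(*per_input_grads)]


upstream = [[3, 4], [6, 8]]  # grad_ys from the question

default_total = accumulate(stack_default_grad(upstream))
override_total = accumulate(stack_first_only_grad(upstream))

print(default_total)   # [9, 12]
print(override_total)  # [3, 4]
```

These two results match the grad_stack and grad bp values the question expected, which is why the override has to return a zero tensor for every input after the first rather than a single tensor.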