tf.where causes the optimizer to fail in TensorFlow

Problem description

I want to check whether I can solve this problem with TensorFlow instead of PyMC3. The idea of the experiment is that I define a probabilistic system that contains a switchpoint. I can use sampling as a method of inference, but I started wondering why I couldn't just do this with gradient descent instead.

I decided to do the gradient search in TensorFlow, but it seems that TensorFlow has a hard time performing a gradient search when tf.where is involved.

You can find the code below.

import tensorflow as tf
import numpy as np

x1 = np.random.randn(50)+1
x2 = np.random.randn(50)*2 + 5
x_all = np.hstack([x1, x2])
len_x = len(x_all)
time_all = np.arange(1, len_x + 1)

mu1 = tf.Variable(0, name="mu1", dtype=tf.float32)
mu2 = tf.Variable(5, name = "mu2", dtype=tf.float32)
sigma1 = tf.Variable(2, name = "sigma1", dtype=tf.float32)
sigma2 = tf.Variable(2, name = "sigma2", dtype=tf.float32)
tau = tf.Variable(10, name = "tau", dtype=tf.float32)

mu = tf.where(time_all < tau,
              tf.ones(shape=(len_x,), dtype=tf.float32) * mu1,
              tf.ones(shape=(len_x,), dtype=tf.float32) * mu2)
sigma = tf.where(time_all < tau,
              tf.ones(shape=(len_x,), dtype=tf.float32) * sigma1,
              tf.ones(shape=(len_x,), dtype=tf.float32) * sigma2)

likelihood_arr = tf.log(tf.sqrt(1/(2*np.pi*tf.pow(sigma, 2)))) - tf.pow(x_all - mu, 2)/(2*tf.pow(sigma, 2))
total_likelihood = tf.reduce_sum(likelihood_arr, name="total_likelihood")

optimizer = tf.train.RMSPropOptimizer(0.01)
opt_task = optimizer.minimize(-total_likelihood)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print("these variables should be trainable: {}".format([_.name for _ in tf.trainable_variables()]))
    for step in range(10000):
        _lik, _ = sess.run([total_likelihood, opt_task])
        if step % 1000 == 0:
            variables = {_.name:_.eval() for _ in [mu1, mu2, sigma1, sigma2, tau]}
            print("step: {}, values: {}".format(str(step).zfill(4), variables))

You'll notice that the tau parameter does not change, even though TensorFlow seems to be aware of the variable and its gradient. Any clue as to what is going wrong? Is this something that can be computed in TensorFlow, or do I need a different pattern?

Answer

tau is only used in the condition argument to tf.where (tf.where(time_all < tau, ...)), which is a boolean tensor. Since calculating gradients only makes sense for continuous values, the gradient of the output with respect to tau will be zero.
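
A quick way to confirm this (my own check, not part of the original answer; it assumes the graph from the question above has already been built) is to ask TensorFlow for the gradients explicitly. The entry for tau comes back as None, meaning no gradient flows to it at all:

# Assumes mu1, mu2, sigma1, sigma2, tau and total_likelihood are defined as in the question.
grads = tf.gradients(total_likelihood, [mu1, mu2, sigma1, sigma2, tau])
print(grads)  # last entry (d total_likelihood / d tau) is None: no differentiable path through the boolean condition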

Even ignoring tf.where, you use tau in the expression time_all < tau, which is constant almost everywhere and therefore has a gradient of zero.

Since the gradient is zero, there is no way to learn tau with gradient descent methods.

Depending on your problem, instead of a hard switch between two values you could use a weighted sum p*val1 + (1-p)*val2, where p depends on tau in a continuous manner.
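
For example, a minimal sketch of that idea (my own illustration, not code from the answer), using a sigmoid as the weighting function so that mu and sigma become smooth functions of tau; the implicit steepness of 1.0 is an arbitrary choice:

time_f = tf.constant(time_all, dtype=tf.float32)
p = tf.sigmoid(time_f - tau)              # ~0 well before tau, ~1 well after, differentiable in tau
mu = (1.0 - p) * mu1 + p * mu2            # weighted sum instead of a hard switch
sigma = (1.0 - p) * sigma1 + p * sigma2
# The likelihood and optimizer code from the question can then stay the same.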
