Question
I am playing with a mixed multinomial discrete choice model in TensorFlow Probability. The function should take an input of a choice among 3 alternatives. The chosen alternative is specified by CHOICE (a # observations x 3 tensor). Below is an update to the code to reflect my progress on the problem (but the problem remains).
I am currently getting the error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [6768,3] vs. [1,3,6768] [Op:Mul]
with the traceback suggesting the issue is in the call to log_prob() for the final component of the joint distribution (i.e., tfd.Independent(tfd.Multinomial(...))).
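A quick way to localize a shape error like this (a debugging sketch, not part of the original code; it assumes mmnl_func() and CHOICE are defined as below) is to draw once from the joint and print each component's log_prob shape separately:

joint = mmnl_func()
*prior_draws, _ = joint.sample()                  # drop the sampled choices
parts = joint.log_prob_parts(prior_draws + [CHOICE])
for i, part in enumerate(parts):
    print(i, part.shape)                          # the mismatched component stands out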
The main components are (thank you to Padarn Wilson for helping to fix the joint distribution definition):
@tf.function
def affine(x, kernel_diag, bias=tf.zeros([])):
    """`kernel_diag * x + bias` with broadcasting."""
    kernel_diag = tf.ones_like(x) * kernel_diag
    bias = tf.ones_like(x) * bias
    return x * kernel_diag + bias

def mmnl_func():
    adj_AV_train = (tf.ones(num_idx) - AV[:, 0]) * tf.constant(-9999.)
    adj_AV_SM = (tf.ones(num_idx) - AV[:, 1]) * tf.constant(-9999.)
    adj_AV_car = (tf.ones(num_idx) - AV[:, 2]) * tf.constant(-9999.)

    return tfd.JointDistributionSequential([
        tfd.Normal(loc=0., scale=1e5),    # mu_b_time
        tfd.HalfCauchy(loc=0., scale=5),  # sigma_b_time
        lambda sigma_b_time, mu_b_time: tfd.MultivariateNormalDiag(  # b_time
            loc=affine(tf.ones([num_idx]), mu_b_time[..., tf.newaxis]),
            scale_diag=sigma_b_time * tf.ones(num_idx)),
        tfd.Normal(loc=0., scale=1e5),  # a_train
        tfd.Normal(loc=0., scale=1e5),  # a_car
        tfd.Normal(loc=0., scale=1e5),  # b_cost
        lambda b_cost, a_car, a_train, b_time: tfd.Independent(tfd.Multinomial(
            total_count=1,
            logits=tf.stack([
                affine(DATA[:, 0], tf.gather(b_time, IDX[:, 0], axis=-1),
                       (a_train + b_cost * DATA[:, 1] + adj_AV_train)),
                affine(DATA[:, 2], tf.gather(b_time, IDX[:, 0], axis=-1),
                       (b_cost * DATA[:, 3] + adj_AV_SM)),
                affine(DATA[:, 4], tf.gather(b_time, IDX[:, 0], axis=-1),
                       (a_car + b_cost * DATA[:, 5] + adj_AV_car))
            ], axis=1)
        ), reinterpreted_batch_ndims=1)
    ])

@tf.function
def mmnl_log_prob(mu_b_time, sigma_b_time, b_time, a_train, a_car, b_cost):
    return mmnl_func().log_prob(
        [mu_b_time, sigma_b_time, b_time, a_train, a_car, b_cost, CHOICE])
# Based on https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args
# change constant values to tf.constant()
nuts_samples = tf.constant(1000)
nuts_burnin = tf.constant(500)
num_chains = tf.constant(1)
## Initial step size
init_step_size = tf.constant(0.3)
# Set the chain's start state.
initial_state = [
    tf.zeros([num_chains], dtype=tf.float32, name="init_mu_b_time"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_sigma_b_time"),
    tf.zeros([num_chains, num_idx], dtype=tf.float32, name="init_b_time"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_a_train"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_a_car"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_b_cost")
]
## NUTS (using inner step size averaging step)
@tf.function
def nuts_sampler(init):
    nuts_kernel = tfp.mcmc.NoUTurnSampler(
        target_log_prob_fn=mmnl_log_prob,
        step_size=init_step_size,
    )
    adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
        inner_kernel=nuts_kernel,
        num_adaptation_steps=nuts_burnin,
        step_size_getter_fn=lambda pkr: pkr.step_size,
        log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
        step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
    )
    samples_nuts_, stats_nuts_ = tfp.mcmc.sample_chain(
        num_results=nuts_samples,
        current_state=init,  # use the argument rather than the global initial_state
        kernel=adapt_nuts_kernel,
        num_burnin_steps=tf.constant(100),
        parallel_iterations=tf.constant(5))
    return samples_nuts_, stats_nuts_
samples_nuts, stats_nuts = nuts_sampler(initial_state)
Accepted Answer
I was able to get reasonable results from my model. Thank you to everyone for the help! The following points helped solve the various issues.
Use of JointDistributionSequentialAutoBatched() to produce consistent batch shapes. You need tf-nightly installed for access.
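As a minimal toy sketch (not the full model; num_obs and the unit-scale prior are placeholders), the auto-batched variant lets log_prob broadcast over a chain dimension without manual tfd.Independent bookkeeping:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

num_obs = 5                          # hypothetical stand-in for the real data size
base_logits = tf.zeros([num_obs, 3])

joint = tfd.JointDistributionSequentialAutoBatched([
    tfd.Normal(loc=0., scale=1.),           # scalar hyperparameter
    lambda mu: tfd.Multinomial(             # one one-hot choice per observation
        total_count=1.,
        logits=base_logits + mu),
])

draws = joint.sample(10)                    # e.g., 10 chains
print(joint.log_prob(draws).shape)          # (10,) -- one consistent batch shape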
More informative priors for the hyperparameters. The exponential transformation in the Multinomial() distribution means that uninformative hyperpriors (i.e., sigma = 1e5) quickly put very large positive numbers into exp(), leading to infinities.
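To see why (illustrative magnitudes, not numbers from the original post): a draw from a prior with scale 1e5 routinely lands in the tens of thousands, and float32 exp() overflows to inf well before that, while unit-scale draws stay finite:

import tensorflow as tf
print(tf.exp(tf.constant(50000.)))   # inf -- a typical draw under scale=1e5
print(tf.exp(tf.constant(2.5)))      # ~12.18 -- a typical draw under scale=1.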
Setting the step size and related sampler tuning parameters is also important.
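For example (an assumption about one reasonable setup, not the configuration from the post), NoUTurnSampler accepts one step size per state part, so the b_time block can adapt separately from the scalar parameters:

# One step size per element of initial_state; shapes must match the state parts.
init_step_size = [0.1 * tf.ones_like(part) for part in initial_state]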
I found an answer by Christopher Suter to a recent question on the TensorFlow Probability forum, which specified a model from Stan, useful. In particular, I took from it the idea of drawing a sample from my prior as the starting point for the initial likelihood parameters.
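Concretely, that initialization looks something like this (a sketch: the final sampled component, the choices, is dropped and a chain dimension is added):

prior_draw = mmnl_func().sample()               # one draw of every model variable
initial_state = [tf.expand_dims(part, 0)        # add the num_chains dimension
                 for part in prior_draw[:-1]]   # drop the sampled choice tensor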
Despite JointDistributionSequentialAutoBatched() correcting the batch shapes, I went back and corrected my joint distribution shapes so that printing log_prob_parts() gives consistent shapes (i.e., [10, 1] for 10 chains). I still get a shape error without JointDistributionSequentialAutoBatched(), but the combination seems to work.
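The shape check looks roughly like this (a sketch, assuming the chain states and CHOICE are in scope):

values = list(initial_state) + [CHOICE]
for i, part in enumerate(mmnl_func().log_prob_parts(values)):
    print(i, part.shape)   # every component should report the same batch shape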
I separated my affine() into two functions. They do the same thing, but the split removes retracing warnings. Basically, affine() could broadcast its inputs, but the input shapes differed across call sites, and it was easier to write two functions that set up the inputs with consistent shapes. Differently shaped inputs cause TensorFlow to trace the function multiple times.