Problem description
By default, TensorFlow variables are float32. To save memory, I'm trying to run in float16. Everywhere in my graph where I could define the datatype as float16, I did. However, I get an error when I run the code.
Below is my code.
import math
import numpy as np
import tensorflow as tf

vocabulary_size = 10
batch_size = 64
embedding_size = 100
num_inputs = 4
num_sampled = 128

graph = tf.Graph()

with graph.as_default():  # took out ", tf.device('/cpu:0')"
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size, num_inputs])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

    embeddings = tf.get_variable(
        'embeddings', dtype=tf.float16,
        initializer=tf.random_uniform([vocabulary_size, embedding_size],
                                      -1.0, 1.0, dtype=tf.float16))
    softmax_weights = tf.get_variable(
        'softmax_weights', dtype=tf.float16,
        initializer=tf.truncated_normal([vocabulary_size, embedding_size],
                                        stddev=1.0 / math.sqrt(embedding_size),
                                        dtype=tf.float16))
    softmax_biases = tf.get_variable(
        'softmax_biases', dtype=tf.float16,
        initializer=tf.zeros([vocabulary_size], dtype=tf.float16),
        trainable=False)

    embed = tf.nn.embedding_lookup(embeddings, train_dataset)  # train data set is
    embed_reshaped = tf.reshape(embed, [batch_size * num_inputs, embedding_size])

    segments = np.arange(batch_size).repeat(num_inputs)
    averaged_embeds = tf.segment_mean(embed_reshaped, segments, name=None)

    sam_sof_los = tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                             biases=softmax_biases,
                                             inputs=averaged_embeds,
                                             labels=train_labels,
                                             num_sampled=num_sampled,
                                             num_classes=vocabulary_size)

    loss = tf.reduce_mean(sam_sof_los)

    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)

    saver = tf.train.Saver()
Here is the error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
509 as_ref=input_arg.is_ref,
--> 510 preferred_dtype=default_dtype)
511 except TypeError as err:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx)
1143 if ret is None:
-> 1144 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1145
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in _TensorTensorConversionFunction(t, dtype, name, as_ref)
980 "Tensor conversion requested dtype %s for Tensor with dtype %s: %r" %
--> 981 (dtype.name, t.dtype.name, str(t)))
982 return t
ValueError: Tensor conversion requested dtype float16 for Tensor with dtype float32: 'Tensor("sampled_softmax_loss/Log:0", shape=(64, 1), dtype=float32)'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-2-12d508b9e5d7> in <module>()
46
47 sam_sof_los = tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=averaged_embeds,
---> 48 labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size)
49
50 loss = tf.reduce_mean( sam_sof_los )
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_impl.py in sampled_softmax_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true, sampled_values, remove_accidental_hits, partition_strategy, name, seed)
1347 partition_strategy=partition_strategy,
1348 name=name,
-> 1349 seed=seed)
1350 labels = array_ops.stop_gradient(labels, name="labels_stop_gradient")
1351 sampled_losses = nn_ops.softmax_cross_entropy_with_logits_v2(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_impl.py in _compute_sampled_logits(weights, biases, labels, inputs, num_sampled, num_classes, num_true, sampled_values, subtract_log_q, remove_accidental_hits, partition_strategy, name, seed)
1126 if subtract_log_q:
1127 # Subtract log of Q(l), prior probability that l appears in sampled.
-> 1128 true_logits -= math_ops.log(true_expected_count)
1129 sampled_logits -= math_ops.log(sampled_expected_count)
1130
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y)
860 with ops.name_scope(None, op_name, [x, y]) as name:
861 if isinstance(x, ops.Tensor) and isinstance(y, ops.Tensor):
--> 862 return func(x, y, name=name)
863 elif not isinstance(y, sparse_tensor.SparseTensor):
864 try:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py in sub(x, y, name)
8316 if _ctx is None or not _ctx._eager_context.is_eager:
8317 _, _, _op = _op_def_lib._apply_op_helper(
-> 8318 "Sub", x=x, y=y, name=name)
8319 _result = _op.outputs[:]
8320 _inputs_flat = _op.inputs
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
544 "%s type %s of argument '%s'." %
545 (prefix, dtypes.as_dtype(attrs[input_arg.type_attr]).name,
--> 546 inferred_from[input_arg.type_attr]))
547
548 types = [values.dtype]
TypeError: Input 'y' of 'Sub' Op has type float32 that does not match type float16 of argument 'x'.
The error comes from the tf.nn.sampled_softmax_loss line.
At first I thought tf.segment_mean might be casting the output to float32, so I tried casting averaged_embeds to float16, but I still get the same error.
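For reference, a cast like the one described (my reconstruction, since the attempted code isn't shown in the post) cannot help, because the offending float32 tensor is created later, inside sampled_softmax_loss itself:

# Hypothetical version of the attempted workaround: a no-op here, since
# averaged_embeds is already float16 and the float32 tensor comes from
# the candidate sampler inside sampled_softmax_loss.
averaged_embeds = tf.cast(averaged_embeds, tf.float16)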
From the documentation, there doesn't seem to be a way to specify a datatype in sampled_softmax_loss:
https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss
Recommended answer
As far as I can tell, you can only do it using a hack.
The problem comes from this call:
if sampled_values is None:
    sampled_values = candidate_sampling_ops.log_uniform_candidate_sampler(
        true_classes=labels,
        num_true=num_true,
        num_sampled=num_sampled,
        unique=True,
        range_max=num_classes,
        seed=seed)
which outputs an object of this type:
LogUniformCandidateSampler(
    sampled_candidates=<tf.Tensor 'LogUniformCandidateSampler:0' shape=(128,) dtype=int64>,
    true_expected_count=<tf.Tensor 'LogUniformCandidateSampler:1' shape=(64, 1) dtype=float32>,
    sampled_expected_count=<tf.Tensor 'LogUniformCandidateSampler:2' shape=(128,) dtype=float32>)
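The expected-count tensors are always float32, regardless of the dtype of your variables. Inside _compute_sampled_logits, the line true_logits -= math_ops.log(true_expected_count) then subtracts a float32 tensor from float16 logits, and TensorFlow's Sub op does not cast implicitly. A standalone two-liner (an illustration, not code from the post) reproduces the failure:

# Illustration only (TF 1.x): binary ops refuse mixed float dtypes.
a = tf.zeros([2], dtype=tf.float16)
b = tf.zeros([2], dtype=tf.float32)
c = a - b  # TypeError: Input 'y' of 'Sub' Op has type float32 that does not
           # match type float16 of argument 'x'.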
The hack is to generate the LogUniformCandidateSampler yourself, cast its results to tf.float16, and pass it to tf.nn.sampled_softmax_loss through its sampled_values argument.
from collections import namedtuple

# Redefine the namedtuple, as the one TensorFlow uses internally is not exposed.
LogUniformCandidateSampler = namedtuple(
    "LogUniformCandidateSampler",
    ["sampled_candidates", "true_expected_count", "sampled_expected_count"])

# Run the candidate sampler ourselves ...
sampled_values = tf.nn.log_uniform_candidate_sampler(
    true_classes=tf.cast(train_labels, tf.int64),
    num_sampled=num_sampled,
    num_true=1,
    unique=True,
    range_max=vocabulary_size,
    seed=None)

# ... and cast its float32 expected counts down to float16.
sampled_value_16 = LogUniformCandidateSampler(
    sampled_values.sampled_candidates,
    tf.cast(sampled_values.true_expected_count, tf.float16),
    tf.cast(sampled_values.sampled_expected_count, tf.float16))

# With sampled_values supplied, sampled_softmax_loss skips its own internal
# (float32) sampler, so the dtypes now match.
sam_sof_los = tf.nn.sampled_softmax_loss(
    weights=softmax_weights,
    biases=softmax_biases,
    inputs=averaged_embeds,
    labels=train_labels,
    num_sampled=num_sampled,
    num_classes=vocabulary_size,
    sampled_values=sampled_value_16)
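Wiring this into the graph above and running it would look roughly like the following (a sketch, not from the original answer). Note that with unique=True the sampler requires num_sampled <= range_max, so the toy sizes above (num_sampled = 128 with vocabulary_size = 10) would need adjusting, e.g. num_sampled = 5:

# Sketch only: assumes loss and optimizer were rebuilt from the patched
# sam_sof_los inside `with graph.as_default():`, and num_sampled <= vocabulary_size.
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    feed_dict = {
        train_dataset: np.random.randint(0, vocabulary_size,
                                         size=(batch_size, num_inputs)),
        train_labels: np.random.randint(0, vocabulary_size,
                                        size=(batch_size, 1)),
    }
    _, l = session.run([optimizer, loss], feed_dict=feed_dict)
    print('loss:', l)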
But this is really a hack, and it might have unexpected consequences (an expected one would be that the tf.cast operation is not differentiable).
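If that concern matters for your model, a quick sanity check (my addition, not part of the original answer) is to ask TensorFlow directly whether gradients still reach the variables:

# tf.gradients returns None for any variable the loss cannot backpropagate to.
grads = tf.gradients(loss, [embeddings, softmax_weights])
print([g is not None for g in grads])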