Question
My model is too big to get a batch size larger than 64 on the normal v2 TPU devices. The troubleshooting site mentions that upcoming TensorFlow versions will include bfloat16 support. Are the newly supported TF versions 1.9-1.12 capable of using bfloat16 now, and if so, is there a limited set of optimizers I can use? I did not find any further documentation on this, but I saw bfloat16 being used in the tensor2tensor model, so I assume there must be a way.
Furthermore, I read that TPU v3 supports bigger models as well, but that the model would need minimal changes; I can't find any documentation on what needs to be changed.
I'm already using Adafactor and have tried reducing my layers; if you have any further reduction tips, that would be great too. I'm using picture matrices and word vectors (float32 as of now) as input.
Answer
You can use bfloat16 with TPUs. There are two main things to do:
- Cast the inputs to bfloat16 in the input pipeline.
- Wrap the network in a bfloat16 scope and cast the outputs to float32 for further computation.
Here is a code snippet that illustrates the necessary changes:
# Condensed from the official Cloud TPU ResNet model (TF 1.x).
import tensorflow as tf
from tensorflow.contrib.tpu.python.tpu import bfloat16


def input_fn():
  # `dataset_parser` is a method of the ImageNet input class in the original
  # code; `self` refers to that class.
  def dataset_parser(self, value):
    """Parse an ImageNet record from a serialized string Tensor."""
    # ... parsing of `value` into `image_bytes` and `label` omitted ...
    image = self.image_preprocessing_fn(
        image_bytes=image_bytes,
        is_training=self.is_training,
    )
    # Cast the decoded image to bfloat16 inside the input pipeline.
    if self.use_bfloat16:
      image = tf.cast(image, tf.bfloat16)
    return image, label


def resnet_model_fn(features, labels, mode, params):
  """The model_fn for ResNet to be used with TPUEstimator."""

  # This nested function allows us to avoid duplicating the logic which
  # builds the network, for different values of --precision.
  def build_network():
    network = resnet_model.resnet_v1(
        resnet_depth=FLAGS.resnet_depth,
        num_classes=LABEL_CLASSES,
        data_format=FLAGS.data_format)
    return network(
        inputs=features, is_training=(mode == tf.estimator.ModeKeys.TRAIN))

  if FLAGS.precision == 'bfloat16':
    # Build the network under a bfloat16 scope, then cast the logits back
    # to float32 so the loss and metrics are computed in full precision.
    with bfloat16.bfloat16_scope():
      logits = build_network()
    logits = tf.cast(logits, tf.float32)
  elif FLAGS.precision == 'float32':
    logits = build_network()
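The float32 logits can then feed the usual loss and optimizer code unchanged, which is what the second bullet above means by "further computation". Below is a minimal sketch of how resnet_model_fn might continue for the training case; it assumes integer labels and borrows the Momentum/CrossShardOptimizer combination from the official ResNet TPU example, and the params['learning_rate'] key is a hypothetical hyperparameter, so treat the details as illustrative rather than required:

  # The loss is computed on float32 logits, so standard optimizers work as-is.
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

  optimizer = tf.train.MomentumOptimizer(
      learning_rate=params['learning_rate'], momentum=0.9)
  # CrossShardOptimizer aggregates gradients across the TPU cores.
  optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
  train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())

  return tf.contrib.tpu.TPUEstimatorSpec(
      mode=mode, loss=loss, train_op=train_op)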
You can also take a look at this TPU model for reference.