Problem Description
I fine-tuned a BERT model from TensorFlow Hub to build a simple sentiment analyzer. The model trains and runs fine. On export, I simply used:
tf.saved_model.save(model, export_dir='models')
And this works just fine... until I reboot.
After a reboot, the model no longer loads. I've tried using the Keras loader as well as TensorFlow Serving, and I get the same error.
I get the following error message:
Not found: /tmp/tfhub_modules/09bd4e665682e6f03bc72fbcff7a68bf879910e/assets/vocab.txt; No such file or directory
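For reference, a minimal sketch of the loading attempts that fail this way (the export directory is the 'models' from above; importing tensorflow_text is my assumption, since the BERT preprocessing graph uses its ops):
import tensorflow as tf
import tensorflow_text  # assumption: registers the custom ops the preprocessing graph needs
# Either call raises the "Not found: ... vocab.txt" error after a reboot:
model = tf.keras.models.load_model('models')
# model = tf.saved_model.load('models')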
The model is trying to load assets from the TF Hub module cache, which is wiped on reboot. I know I could persist the cache, but I don't want to, because I want to be able to generate models and then copy them over to a separate application without worrying about the cache.
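For reference, persisting the cache would be a one-liner before any hub call; TFHUB_CACHE_DIR is the environment variable TF Hub reads when resolving module downloads (the path below is just an example):
import os
# Must be set before the first hub.KerasLayer / hub.load call; the path is arbitrary.
os.environ['TFHUB_CACHE_DIR'] = '/var/cache/tfhub_modules'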
The crux of it is that I don't think it should need to look in the cache for the assets at all. The model was saved with an assets folder in which vocab.txt was generated, so to find the assets it only needs to look in its own assets folder (I think). However, it doesn't seem to be doing that.
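A quick way to check that the export really carries the asset (a sketch, assuming the same 'models' export directory):
import pathlib
# The exported SavedModel keeps its own copy of the assets it was saved with.
print(list(pathlib.Path('models/assets').iterdir()))
# expected to include models/assets/vocab.txt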
Is there any way to change this behaviour?
Here is the code for building and exporting the model (it's not a clever model, just a prototype of my workflow):
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the custom ops used by the BERT preprocessing model

bert_model_name = "bert_en_uncased_L-12_H-768_A-12"
BATCH_SIZE = 64
EPOCHS = 1  # Initial

def build_bert_model(bert_model_name):
    # Raw strings go in; the hub preprocessing layer tokenizes them for BERT.
    # map_model_to_preprocess / map_name_to_handle map model names to TF Hub
    # handles and are defined elsewhere.
    input_layer = tf.keras.layers.Input(shape=(), dtype=tf.string, name="inputs")
    preprocessing_layer = hub.KerasLayer(
        map_model_to_preprocess[bert_model_name], name="preprocessing"
    )
    encoder_inputs = preprocessing_layer(input_layer)
    bert_model = hub.KerasLayer(
        map_name_to_handle[bert_model_name], name="BERT_encoder"
    )
    outputs = bert_model(encoder_inputs)
    net = outputs["pooled_output"]
    net = tf.keras.layers.Dropout(0.1)(net)
    net = tf.keras.layers.Dense(1, activation=None, name="classifier")(net)
    return tf.keras.Model(input_layer, net)

def main():
    train_ds, val_ds = load_sentiment140(batch_size=BATCH_SIZE, epochs=EPOCHS)
    steps_per_epoch = tf.data.experimental.cardinality(train_ds).numpy()

    init_lr = 3e-5
    optimizer = tf.keras.optimizers.Adam(learning_rate=init_lr)

    model = build_bert_model(bert_model_name)
    model.compile(optimizer=optimizer, loss='mse', metrics=['mse'])
    model.fit(train_ds, validation_data=val_ds, steps_per_epoch=steps_per_epoch)
    tf.saved_model.save(model, export_dir='models')

if __name__ == "__main__":
    main()
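For completeness, a sketch of how the export would be reloaded for a prediction (the sample sentence and the 'inputs' keyword are assumptions; the keyword follows the name of the Input layer above):
import tensorflow as tf
import tensorflow_text  # assumption: the reloaded graph needs these ops registered

reloaded = tf.saved_model.load('models')
serving_fn = reloaded.signatures['serving_default']
# The serving signature's argument is named after the Input layer ("inputs").
print(serving_fn(inputs=tf.constant(["an example review, purely for illustration"])))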
Recommended Answer
This problem comes from a TensorFlow bug triggered by versions /1 and /2 of https://tfhub.dev/tensorflow/bert_en_uncased_preprocess. The updated models tensorflow/bert_*_preprocess/3 (released last Friday) avoid this bug. Please update to the newest version.
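In the build code above, that amounts to pointing the preprocessing layer at the /3 handle, e.g. (hard-coding the handle here instead of the lookup dict, as an illustration):
preprocessing_layer = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3",
    name="preprocessing",
)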
The Classify text with BERT tutorial has been updated accordingly.
Thanks for raising this issue!