Problem Description
I know how to load a model into a container, and I know that we can create a static config file, pass it to the TensorFlow Serving container when we run it, and later use one of the models listed in that config file. But I want to know whether there is any way to hot-load a completely new model (not a newer version of an existing model) into a running TensorFlow Serving container. What I mean is: we run the container with model-A, and later we load model-B into the container and use it. Can we do this? If yes, how?
Recommended Answer
You can.
First you need to copy the new model files into the model_base_path you specified when launching TensorFlow Serving, so that the server can see the new model. The directory layout is usually this: $MODEL_BASE_PATH/$model_a/$version_a/* and $MODEL_BASE_PATH/$model_b/$version_b/*.
Then you need to refresh TensorFlow Serving with a new model_config_file that includes an entry for the new model. See here for how to add entries to the model config file. To make the server take in the new config, there are two ways to do it:
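For illustration, a model_config_file that lists both models could look like the snippet below (the model names and base paths are placeholders; the syntax is TensorFlow Serving's text-protobuf config format):

```
model_config_list {
  config {
    name: "model_a"
    base_path: "/models/model_a"
    model_platform: "tensorflow"
  }
  config {
    name: "model_b"
    base_path: "/models/model_b"
    model_platform: "tensorflow"
  }
}
```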
- Save the new config file and restart TensorFlow Serving.
- Reload the new model config on the fly, without restarting TensorFlow Serving. This service is defined in model_service.proto as HandleReloadConfigRequest, but the service's REST API does not seem to support it, so you need to rely on the gRPC API. Sadly, the Python client for gRPC seems unimplemented. I managed to generate Java client code from the protobuf files, but it is quite complex. An example here explains how to generate Java client code for doing gRPC inference, and calling handleReloadConfigRequest() is very similar. A hedged Python sketch of the reload call is shown after this list.
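For what it's worth, more recent releases of the tensorflow-serving-api pip package do ship generated Python stubs for model_service.proto, so the reload can also be scripted from Python. The sketch below is an assumption-laden illustration (the server address, model names, and base paths are placeholders, and it presumes a tensorflow-serving-api version that includes ModelServiceStub), not something stated in the original answer:

```python
# Hypothetical example: hot-reload the model config over gRPC.
# Assumes a recent tensorflow-serving-api that includes the generated
# ModelService stubs; adjust host, port, names, and paths for your setup.
import grpc
from tensorflow_serving.apis import model_service_pb2_grpc
from tensorflow_serving.apis import model_management_pb2

channel = grpc.insecure_channel("localhost:8500")  # TF Serving's default gRPC port
stub = model_service_pb2_grpc.ModelServiceStub(channel)

# Build a ReloadConfigRequest listing every model the server should serve,
# including the brand-new one; models omitted from this list will be unloaded.
request = model_management_pb2.ReloadConfigRequest()
config_list = request.config.model_config_list

model_a = config_list.config.add()
model_a.name = "model_a"
model_a.base_path = "/models/model_a"
model_a.model_platform = "tensorflow"

model_b = config_list.config.add()  # the newly added model
model_b.name = "model_b"
model_b.base_path = "/models/model_b"
model_b.model_platform = "tensorflow"

response = stub.HandleReloadConfigRequest(request, 30.0)  # 30-second timeout
if response.status.error_code == 0:
    print("Model config reloaded successfully")
else:
    print("Reload failed:", response.status.error_message)
```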