How do I add a health check to a SageMaker endpoint?

Question

My SageMaker endpoint has a /ping route, and according to AWS CloudWatch it gets pinged about every 5 seconds:

10.32.0.1 - - [01/Feb/2018:08:08:35 +0000] "GET /ping HTTP/1.1" 200 1 "-" "AHC/2.0"

However, I can't see what happens if this ping fails. Where can I configure the health check?

Recommended answer

If the pings fail consistently during Endpoint creation, we treat the container as unhealthy and fail the Endpoint with an error message:

"ClientError: The primary container for production variant [xxx] did not pass the ping health check. Please check CloudWatch logs for this endpoint."

If the pings fail consistently after Endpoint creation (when the Endpoint is up and running), we will try our best to replace the instance while keeping your Endpoint in service.

Here is the documentation page: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests

You can implement a more sophisticated health check inside your container's /ping handler. However, the ping response must return within the 2-second timeout.
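As a rough illustration of what a more sophisticated check might look like, here is a minimal sketch of a /ping handler using only the Python standard library. The names `model_is_ready`, `ping_status`, `PingHandler`, and `serve` are hypothetical, and the readiness probe is a placeholder you would replace with your own logic. It assumes the container serves HTTP on port 8080 and that a 200 response signals a healthy container; returning 503 for an unhealthy one is a choice made for this sketch, not something the answer above specifies.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# From the answer above: the ping response must return within 2 seconds.
PING_BUDGET_SECONDS = 2.0


def model_is_ready():
    """Hypothetical readiness probe; replace with a real, cheap check
    (for example, confirming that the model object has been loaded)."""
    return True


def ping_status(check=model_is_ready, budget=PING_BUDGET_SECONDS):
    """Return the HTTP status code for /ping.

    200 if the check passes and finished within the budget, otherwise 503.
    The elapsed time is measured after the fact: a slow check is not
    interrupted, so the check itself should be kept fast.
    """
    start = time.monotonic()
    try:
        healthy = bool(check())
    except Exception:
        # Any exception raised by the probe is treated as "unhealthy".
        healthy = False
    elapsed = time.monotonic() - start
    return 200 if healthy and elapsed < budget else 503


class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /ping is handled in this sketch; other paths get 404.
        status = ping_status() if self.path == "/ping" else 404
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port=8080):
    """Start a blocking HTTP server (port 8080 is an assumption here)."""
    HTTPServer(("0.0.0.0", port), PingHandler).serve_forever()
```

Calling `serve()` starts the server. A real inference container would also need to handle POST /invocations, which this sketch leaves out.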

Hope this helps!

-Han
