Problem description
I am calling a SageMaker endpoint using the Java SageMaker SDK. The data I am sending needs a little cleaning before the model can use it for prediction. How can I do that in SageMaker?
I have a preprocessing function in a Jupyter notebook instance that cleans the training data before it is passed on to train the model. Can I use that function when calling the endpoint, or is it already being applied? I can show my code if that helps.
EDIT 1: Basically, the preprocessing performs label encoding. Here is my preprocessing function:
from sklearn import preprocessing

def preprocess_data(data):
    print("entering preprocess fn")
    # convert document id & type to labels
    le1 = preprocessing.LabelEncoder()
    le1.fit(data["documentId"])
    data["documentId"] = le1.transform(data["documentId"])
    le2 = preprocessing.LabelEncoder()
    le2.fit(data["documentType"])
    data["documentType"] = le2.transform(data["documentType"])
    print("exiting preprocess fn")
    # return the fitted encoders too, so they can be reused later
    return data, le1, le2
Here, 'data' is a pandas DataFrame.
Now I want to use these encoders, le1 and le2, when calling the endpoint. I want to do this preprocessing in SageMaker itself, not in my Java code.
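To reuse le1 and le2 at prediction time, the fitted encoders need to be saved after training and loaded again later, where only transform (not fit) is called, so the label-to-integer mapping stays the same as in training. The following is a minimal sketch of that idea, not the asker's actual setup: the file name encoders.pkl, the helper names save_encoders and load_encoders, and the sample labels are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn import preprocessing


def save_encoders(le1, le2, path):
    """Persist both fitted encoders in one file, keyed by column name."""
    with open(path, "wb") as f:
        pickle.dump({"documentId": le1, "documentType": le2}, f)


def load_encoders(path):
    """Load the encoders saved by save_encoders (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Training time: fit on the training labels (made-up sample values here).
le1 = preprocessing.LabelEncoder().fit(["doc-a", "doc-b", "doc-c"])
le2 = preprocessing.LabelEncoder().fit(["invoice", "receipt"])

encoder_path = Path(tempfile.mkdtemp()) / "encoders.pkl"  # assumed location
save_encoders(le1, le2, encoder_path)

# Prediction time: load and call transform only, never fit again.
encoders = load_encoders(encoder_path)
encoded_id = encoders["documentId"].transform(["doc-b"])
encoded_type = encoders["documentType"].transform(["receipt"])
```

Note that LabelEncoder.transform raises a ValueError for a label it did not see during fit, so incoming requests with new document IDs would need separate handling.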
Recommended answer
SageMaker now has a feature called inference pipelines. It lets you build a linear sequence of containers (two to five at the time of writing; check the documentation linked below for the current limit) that pre-process and post-process requests. Each container's output becomes the next container's input, and the whole pipeline is deployed on a single endpoint, so the Java client can keep sending the raw data.
https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html
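As a rough illustration, the first container in such a pipeline could run a script along these lines, which applies the saved label encoders to each incoming request. This is only a sketch under several assumptions: it uses the model_fn / input_fn / predict_fn / output_fn handler convention of the SageMaker scikit-learn serving container, it assumes the encoders were pickled as a dict in a file named encoders.pkl inside the model artifact, and it assumes the request is CSV with exactly two columns in the order documentId, documentType. None of these details come from the question, and the exact return type expected from output_fn should be checked against the container documentation.

```python
import csv
import io
import os
import pickle


def model_fn(model_dir):
    """Load the fitted encoders; 'encoders.pkl' is an assumed file name."""
    with open(os.path.join(model_dir, "encoders.pkl"), "rb") as f:
        return pickle.load(f)


def input_fn(request_body, request_content_type):
    """Parse a CSV request into a list of rows (lists of strings)."""
    if request_content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")
    return [row for row in csv.reader(io.StringIO(request_body)) if row]


def predict_fn(rows, encoders):
    """Apply the training-time encoders; transform only, no refitting."""
    # Assumed column order: documentId first, documentType second.
    ids = encoders["documentId"].transform([row[0] for row in rows])
    types = encoders["documentType"].transform([row[1] for row in rows])
    return [[int(i), int(t)] for i, t in zip(ids, types)]


def output_fn(prediction, accept):
    """Serialize the encoded rows as CSV for the next container."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(prediction)
    return out.getvalue()
```

With this layout, the second container would hold the trained model and receive the already-encoded rows, so no preprocessing logic has to live in the Java client.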