Problem description
How can we best use the AWS SageMaker ML model endpoint described below from Glue- or EMR-based Spark jobs?
As shown in the AWS documentation, an endpoint named 'linear-learner-2019-11-04-01-57-20-572' is created. It can be invoked as:
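import boto3
client = boto3.client('sagemaker-runtime')  # runtime client used to call deployed endpoints
response = client.invoke_endpoint(EndpointName='linear-learner-2019-11-04-01-57-20-572',
                                  ContentType='text/csv', Body=values)  # values: CSV-formatted feature rows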
But suppose we have a batch job like this:
- A scheduled batch job on big data that reads its input from S3
- It applies a transformation that adds a new column holding the prediction
- The result is written back to S3 as output
- It could be triggered daily, or on arrival of a new file in the source folder
How can we best configure the above endpoint from Glue- or EMR-based Spark jobs?
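For reference, the naive approach the question hints at (calling the endpoint from inside a Spark job) might look roughly like the sketch below. It is only an illustration under stated assumptions: `make_invoker` and `score_partition` are hypothetical helper names, every column of each row is assumed to be a numeric feature, and the endpoint is assumed to return Linear Learner style JSON of the form `{"predictions": [{"score": ...}]}`. Rows are sent in small batches so that each request stays within the endpoint's payload limit.

```python
import json
from itertools import islice

ENDPOINT = "linear-learner-2019-11-04-01-57-20-572"  # endpoint name from the question


def make_invoker(endpoint_name):
    """Return a function that sends a CSV payload and returns one score per row."""
    import boto3  # imported lazily so each Spark executor builds its own client

    client = boto3.client("sagemaker-runtime")

    def invoke(csv_payload):
        response = client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Accept="application/json",
            Body=csv_payload,
        )
        body = json.loads(response["Body"].read())
        # Assumption: response shaped like {"predictions": [{"score": ...}, ...]}
        return [p["score"] for p in body["predictions"]]

    return invoke


def score_partition(rows, invoke, batch_size=100):
    """Yield each input row as a tuple with its prediction appended."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        # One CSV line per row, one request per batch
        payload = "\n".join(",".join(str(v) for v in row) for row in batch)
        for row, pred in zip(batch, invoke(payload)):
            yield tuple(row) + (pred,)


# Hypothetical use inside a Glue or EMR PySpark job:
# scored = df.rdd.mapPartitions(lambda part: score_partition(part, make_invoker(ENDPOINT)))
```

Passing the `invoke` function in as a parameter keeps the batching logic testable without AWS access. For large datasets this pattern puts sustained load on a real-time endpoint, which is one reason to consider an offline batch job.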
Recommended answer
You can use AWS Step Functions to create a workflow of actions and trigger each task one after the other (EMR, Glue, Athena, SageMaker, etc.). For batch tasks, I recommend you consider launching a SageMaker Processing job or a SageMaker Batch Transform (batch inference) job.
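As a rough sketch of the batch inference option, the request for a Batch Transform job could be assembled as below and submitted with boto3's `create_transform_job`. The job name, model name, S3 URIs, and instance type are all hypothetical placeholders. Note that Batch Transform runs against a SageMaker model rather than a deployed endpoint, so this assumes the model behind the endpoint is available by name. Setting `JoinSource` to `Input` asks SageMaker to append each prediction to its input record, which corresponds to the "new column" requirement in the question.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m5.large", instance_count=1):
    """Build keyword arguments for the SageMaker create_transform_job call."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line of the input files as one record
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_uri,
            "Accept": "text/csv",
            "AssembleWith": "Line",  # write one output record per line
        },
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
        # Append the prediction to each input record in the output
        "DataProcessing": {"JoinSource": "Input"},
    }


def submit_transform_job(request):
    """Submit the job; requires AWS credentials and an existing SageMaker model."""
    import boto3  # imported here so the request builder works without boto3 installed

    return boto3.client("sagemaker").create_transform_job(**request)


# Hypothetical names and locations, for illustration only
request = build_transform_job_request(
    job_name="daily-scoring-001",
    model_name="my-linear-learner-model",
    input_s3_uri="s3://my-bucket/input/",
    output_s3_uri="s3://my-bucket/output/",
)
```

A Step Functions workflow, a daily schedule, or an S3 event could then call `submit_transform_job(request)` after the upstream Glue or EMR step has written its output.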