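I am relatively new to AWS, and this may not be a very technical question, but AWS Glue currently notes a maximum of 25 jobs that can be created. We are loading a series of tables, each with its own job that subsequently appends audit columns. The jobs are all very similar; only the source and target connection strings change.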

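Is there a way to parameterize these jobs so they can be reused, simply passing the proper connection strings to them? Or possibly even loop through a set of connection strings in a master job that calls a child job, passing the varying connection strings through?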

Any examples or documentation would be most appreciated.

Best answer

The example below shows how to use Glue job input parameters in code. The script reads the input parameters and writes them to a flat file.

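1) Set the input parameters in the job configuration. Job parameter keys are entered with a leading "--", for example --VAL1.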


2) Glue job code

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Resolve every expected job parameter once. Each name listed here must be
# supplied to the job (as --VAL1, --VAL2, ...), otherwise the call fails.
## @params: [JOB_NAME, VAL1, VAL2, VAL3, DEST_FOLDER]
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'VAL1', 'VAL2', 'VAL3', 'DEST_FOLDER'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# A single row holding the three parameter values
rows = [(args['VAL1'], args['VAL2'], args['VAL3'])]
df = spark.createDataFrame(rows, ['VAL1', 'VAL2', 'VAL3'])

# Write one semicolon-delimited CSV file with a header row to the destination
df.repartition(1).write.mode('overwrite').format('csv').options(header=True, delimiter=';').save("s3://" + args['DEST_FOLDER'] + "/")

job.commit()
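
To see what the script receives, it can help to mimic the argument parsing locally. The sketch below is a hypothetical stand-in written with argparse, not the awsglue implementation. It only assumes that each job parameter reaches the script as a `--NAME value` pair on the command line; the helper name `resolve_options` and the sample values are made up for illustration.

```python
import argparse

def resolve_options(argv, names):
    """Hypothetical local stand-in for getResolvedOptions; returns {name: value}."""
    parser = argparse.ArgumentParser()
    for name in names:
        # Every requested name must be present, mirroring a required job parameter
        parser.add_argument('--' + name, required=True)
    # parse_known_args tolerates extra arguments that were not requested
    parsed, _unknown = parser.parse_known_args(argv[1:])
    return vars(parsed)

# Simulated command line (all values are placeholders)
argv = ['script.py', '--JOB_NAME', 'example_job2',
        '--VAL1', 'value1', '--DEST_FOLDER', 'my-output-bucket']
args = resolve_options(argv, ['JOB_NAME', 'VAL1', 'DEST_FOLDER'])
print(args['VAL1'])
```

Inside a real Glue job, keep using `getResolvedOptions` from `awsglue.utils`; a stand-in like this is only useful for exercising the rest of the script on a local machine.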


3) Input parameters can also be supplied when using boto3, CloudFormation, or Step Functions. This example shows how to do it with boto3.

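import boto3

def lambda_handler(event, context):
    glue = boto3.client('glue')

    # Argument keys need the leading "--" so that getResolvedOptions in the
    # job script can resolve them as VAL1, VAL2, ...
    # create_job fails if a job with this name already exists, so create it only once.
    myJob = glue.create_job(Name='example_job2', Role='AWSGlueServiceDefaultRole',
                            Command={'Name': 'glueetl', 'ScriptLocation': 's3://aws-glue-scripts/example_job'},
                            DefaultArguments={"--VAL1": "value1", "--VAL2": "value2", "--VAL3": "value3"})

    # Arguments given here override DefaultArguments for this run only.
    # The job script also requires DEST_FOLDER; 'my-output-bucket' is a placeholder.
    glue.start_job_run(JobName=myJob['Name'],
                       Arguments={"--VAL1": "value11", "--VAL2": "value22", "--VAL3": "value33",
                                  "--DEST_FOLDER": "my-output-bucket"})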
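
To connect this back to the question: one parameterized job can be started once per table, with a different set of arguments each time, instead of creating one job per table. The following is a minimal sketch of that loop. The parameter names `SOURCE_CONN` and `TARGET_CONN`, the helper names, and the shape of `table_configs` are all assumptions for illustration; the only Glue call used is `start_job_run`.

```python
def build_run_arguments(source_conn, target_conn):
    """Build the Arguments dict for one run; keys need the leading '--'."""
    # SOURCE_CONN / TARGET_CONN are hypothetical parameter names that the
    # job script would read back via getResolvedOptions.
    return {'--SOURCE_CONN': source_conn, '--TARGET_CONN': target_conn}

def start_runs(glue_client, job_name, table_configs):
    """Start one run of the same job per config and return the run ids."""
    run_ids = []
    for cfg in table_configs:
        response = glue_client.start_job_run(
            JobName=job_name,
            Arguments=build_run_arguments(cfg['source'], cfg['target']),
        )
        run_ids.append(response['JobRunId'])
    return run_ids

# Example usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   configs = [{'source': 'conn_a', 'target': 'conn_x'},
#              {'source': 'conn_b', 'target': 'conn_y'}]
#   start_runs(boto3.client('glue'), 'example_job2', configs)
```

Passing the client in as an argument keeps the loop easy to test with a fake client. Note that the runs are started without waiting for each other, so the job's maximum concurrency setting may need to be raised for them to run in parallel.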


Useful links:


https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job
https://docs.aws.amazon.com/step-functions/latest/dg/connectors-glue.html

This Q&A on AWS Glue job input parameters comes from a Stack Overflow question: https://stackoverflow.com/questions/52316668/
