Question
I haven't been able to figure this out, but I'm trying to use a direct output committer with AWS Glue:
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
Is it possible to use this configuration with AWS Glue?
Answer
Option 1:
Glue uses a Spark context, so you can set Hadoop configuration in AWS Glue as well, since internally a DynamicFrame is a kind of DataFrame.
sc._jsc.hadoopConfiguration().set("mykey","myvalue")
I think you also need to set the corresponding committer class, like this:
sc._jsc.hadoopConfiguration().set("mapred.output.committer.class", "org.apache.hadoop.mapred.FileOutputCommitter")
Sample code snippet:
sc = SparkContext()
sc._jsc.hadoopConfiguration().set("mapreduce.fileoutputcommitter.algorithm.version","2")
glueContext = GlueContext(sc)
spark = glueContext.spark_session
To prove that the configuration exists:
To debug in Python:
sc._conf.getAll()  # print this
To debug in Scala:
sc.getConf.getAll.foreach(println)
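As a plain-Python illustration (no Spark required), `sc._conf.getAll()` returns a list of key/value pairs, so you can scan that output for the committer setting; the sample pairs below are hypothetical:

```python
# Hypothetical sample of what sc._conf.getAll() might return in a Glue job;
# only the shape (a list of key/value pairs) matters here.
conf_pairs = [
    ("spark.app.name", "my-glue-job"),  # placeholder value
    ("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2"),
]

# Keep only the entries whose key mentions the committer algorithm version
matches = [(k, v) for k, v in conf_pairs if "fileoutputcommitter.algorithm" in k]
print(matches)
```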
Option 2:
Alternatively, you can try using Glue's job parameters:
https://docs.aws.amazon.com/glue/latest/dg/add-job.html which has key/value properties, as mentioned in the docs:
'--myKey' : 'value-for-myKey'
You can edit the job in the console and specify the parameters with --conf.
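As a sketch (plain Python, names hypothetical), the job-parameter pair you would enter uses `--conf` as the key and the Spark property assignment as the value:

```python
# Hypothetical Glue job parameters as a plain dict, mirroring the
# key/value fields in the console's "Job parameters" section.
job_parameters = {
    "--conf": "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2",
}

# The value is a single "<spark property>=<setting>" assignment
prop, _, value = job_parameters["--conf"].partition("=")
print(prop, "=", value)
```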
Option 3:
If you are using the AWS CLI, you can try the following: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
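For example, with the AWS CLI the same setting can be supplied through the job's default arguments; this sketch only builds and prints the JSON payload you would pass (it does not call AWS, and the surrounding CLI invocation is an assumption):

```python
import json

# Sketch of the --default-arguments JSON payload for a CLI call such as
# `aws glue create-job ... --default-arguments '<payload>'`.
default_arguments = {
    "--conf": "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2",
}

payload = json.dumps(default_arguments)
print(payload)
```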
The funny thing is that the docs themselves say not to set a parameter like this, yet I don't know why it is exposed.