Question
When using spark-submit in cluster mode (yarn-cluster), the jars and packages configuration confused me: for jars, I can put them in HDFS instead of in a local directory. But for packages, which are resolved via Maven, pointing at HDFS does not work. My invocation looks like this:
spark-submit --jars hdfs:///mysql-connector-java-5.1.39-bin.jar --driver-class-path /home/liac/test/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar --conf "spark.mongodb.input.uri=mongodb://192.168.27.234/test.myCollection2?readPreference=primaryPreferred" --conf "spark.mongodb.output.uri=mongodb://192.168.27.234/test.myCollection2" --packages com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0 --py-files /home/liac/code/diagnose_disease/tool.zip main_disease_tag_spark.py --master yarn-client
The error that occurs:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Provided Maven Coordinates must be in the form 'groupId:artifactId:version'. The coordinate provided is: com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0
Can anyone tell me how to use jars and packages in cluster mode? And what's wrong with my approach?
Answer
Your use of the --packages argument is wrong:
--packages com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0
As the error output suggests, it needs to be in the form groupId:artifactId:version. You cannot use a URL with it.
An example of using MongoDB with Spark, with the built-in repository support:
$SPARK_HOME/bin/spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0
If you insist on using your own jar, you can provide it via --repositories. The value of that argument is a comma-separated list of remote repositories to search for the Maven coordinates given with --packages.
For example, in your case, it could be:
--repositories hdfs:///user/liac/package/jars/ --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0
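One caveat worth checking: a repository passed to --repositories is resolved using the standard Maven directory layout, so the jar must sit at the path that layout implies under the repository root, not directly in it. The hypothetical helper below (my own illustration, not a Spark API) builds that expected path from a repository root and a coordinate, assuming the standard groupId/artifactId/version layout:

```python
def maven_artifact_path(repo: str, coordinate: str) -> str:
    """Build the path a Maven-layout repository resolver would look up
    for a groupId:artifactId:version coordinate under the given root."""
    group_id, artifact_id, version = coordinate.split(":")
    return "{}/{}/{}/{}/{}-{}.jar".format(
        repo.rstrip("/"),
        group_id.replace(".", "/"),  # dots in groupId become directories
        artifact_id,
        version,
        artifact_id,
        version,
    )

print(maven_artifact_path(
    "hdfs:///user/liac/package/jars",
    "org.mongodb.spark:mongo-spark-connector_2.11:1.0.0"))
# -> hdfs:///user/liac/package/jars/org/mongodb/spark/mongo-spark-connector_2.11/1.0.0/mongo-spark-connector_2.11-1.0.0.jar
```

So for the --repositories example above to resolve, the connector jar would need to be laid out under that nested path in HDFS rather than placed flat in the jars directory.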