Problem Description
I am writing a program to upload data to an s3a:// link. The program is compiled through mvn install. Running the program locally (as in using java -jar jarfile.jar) returns no error. However, when I use spark-submit (as in using spark-submit jarfile.jar), it returns an error like this:
The error log traces back to this portion of my source code:
sparkDataset
.write()
.format("parquet")
.mode(SaveMode.Overwrite)
.save("some s3a:// link");
where sparkDataset is an instance of org.apache.spark.sql.Dataset.
Trying How to access s3a:// files from Apache Spark? was unsuccessful and returned another error, as follows:
The problem from java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V is also unlikely to apply, because I can run the program locally, where compatibility is not a problem.
In addition, these are the versions of the related libraries that I used:
- aws-java-sdk-bundle:1.11.199
- hadoop-aws:3.0.0
I am expecting files to be written through the s3a:// links. I think the dependencies are not the issue, because I can run the program locally; I only face this problem when using spark-submit to run it. Does anyone have any ideas on how to resolve this?
In addition, I have checked that the Spark version used by spark-submit is said to be built for Hadoop 2.7 and above, while I am strictly using Hadoop 3.0.0. Could this be a clue as to why such an error happened in my program?
Answer
The answer from Run spark-submit with my own build of hadoop seems to have guided me to my own solution.
Based on my understanding, for some unknown reason*, the spark-submit provided by the 'spark-2.4.0-bin-hadoop2.7.tgz' distribution will exclude any Hadoop packages that are compiled together into your application.
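For example, listing the Hadoop jars that ship with the distribution should show which version spark-submit actually puts on the classpath (a sketch, assuming SPARK_HOME points at the extracted spark-2.4.0-bin-hadoop2.7.tgz):

# List the Hadoop jars shipped with the pre-built Spark distribution.
# If these are 2.7.x, they take precedence over the 3.0.0 classes compiled into your application.
ls "$SPARK_HOME/jars" | grep hadoop-common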
The reason the NoSuchMethodError was raised is that the method reloadExistingConfigurations does not exist until Hadoop version 2.8.x. It seems that writing a parquet file somehow invokes this particular method along the way.
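One way to verify this (assuming the distribution keeps its jars under $SPARK_HOME/jars and bundles hadoop-common 2.7.3) is to ask javap whether the method exists in the hadoop-common jar that spark-submit uses; on a 2.7.x jar the grep prints nothing, matching the NoSuchMethodError:

# Check whether reloadExistingConfigurations exists in the bundled hadoop-common jar.
javap -classpath "$SPARK_HOME/jars/hadoop-common-2.7.3.jar" \
  org.apache.hadoop.conf.Configuration | grep reloadExistingConfigurations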
My solution is to use the separate 'spark-2.4.0-without-hadoop.tgz' distribution and connect it to Hadoop 3.0.0, so that the correct version of Hadoop is used even though spark-submit excludes the packages bundled in your application during execution.
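For this 'Hadoop free' build, Spark has to be told where to find the Hadoop 3.0.0 jars. A minimal sketch, assuming a local Hadoop 3.0.0 installation with hadoop on the PATH, is to set SPARK_DIST_CLASSPATH in conf/spark-env.sh as described in Spark's documentation for Hadoop-free builds:

# conf/spark-env.sh in the spark-2.4.0-without-hadoop distribution:
# make Spark pick up the jars from your own Hadoop 3.0.0 installation.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)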
In addition, since the packages would be excluded by spark-submit anyway, I do not create a fat jar during compilation through Maven. Instead, I use the --packages flag during execution to specify the dependencies required to run my application.
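For example, a submit command along these lines (the Maven coordinates match the versions listed above; jarfile.jar is just the placeholder name from the question):

# Resolve the S3A connector and AWS SDK at submit time instead of shading them into a fat jar.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:3.0.0,com.amazonaws:aws-java-sdk-bundle:1.11.199 \
  jarfile.jar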