NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities while reading s3 data with spark

I would like to run a simple Spark job on my local dev machine (through IntelliJ) reading data from Amazon S3.

My build.sbt file:

    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.3.1",
      "org.apache.spark" %% "spark-sql" % "2.3.1",
      "com.amazonaws" % "aws-java-sdk" % "1.11.407",
      "org.apache.hadoop" % "hadoop-aws" % "3.1.1"
    )

My code snippet:

    val spark = SparkSession
      .builder
      .appName("test")
      .master("local[2]")
      .getOrCreate()

    spark
      .sparkContext
      .hadoopConfiguration
      .set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")

    val schema_p = ...

    val df = spark
      .read
      .schema(schema_p)
      .parquet("s3a:///...")

And I get the following exception:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2093)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2058)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2152)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2580)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:45)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
        at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:622)
        at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:606)
        at Test$.delayedEndpoint$Test$1(Test.scala:27)
        at Test$delayedInit$body.apply(Test.scala:4)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)
        at Test$.main(Test.scala:4)
        at Test.main(Test.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.StreamCapabilities
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 41 more

When replacing s3a:/// with s3:/// I get another error: No FileSystem for scheme: s3

As I am new to AWS, I do not know whether I should use s3:///, s3a:/// or s3n:///. I have already set up my AWS credentials with aws-cli.

I do not have Spark installed on my machine.

Thanks in advance for your help.

Solution

I would start by looking at the S3A troubleshooting docs.

Do not attempt to "drop in" a newer version of the AWS SDK than the one your Hadoop version was built with. Whatever problem you have, changing the AWS SDK version will not fix things; it will only change the stack traces you see.

Whatever version of the hadoop-* JARs you have in your local Spark installation, you need exactly the same version of hadoop-aws, and exactly the same version of the AWS SDK that hadoop-aws was built with. Try mvnrepository for the details.
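
As a rough illustration of the advice above, a build.sbt along the following lines keeps hadoop-aws on the same version as the other hadoop-* JARs and lets hadoop-aws bring in the AWS SDK it was built against. This is only a sketch: the `hadoopVersion` value is an assumption, not something stated in the question. The Spark 2.3.1 artifacts normally resolve a Hadoop 2.x hadoop-client, so check which hadoop-common version actually ends up on your classpath (for example by printing `org.apache.hadoop.util.VersionInfo.getVersion`) and substitute that number.

```scala
// build.sbt (sketch; the Hadoop version below is an assumption, verify it against your own build)
scalaVersion := "2.11.12"

// Assumed value: replace with the hadoop-common version that is really on your classpath.
val hadoopVersion = "2.7.3"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.1",
  "org.apache.spark" %% "spark-sql"  % "2.3.1",
  // Pin hadoop-client so that all hadoop-* JARs resolve to one version.
  "org.apache.hadoop" % "hadoop-client" % hadoopVersion,
  // Exactly the same version as the other hadoop-* JARs. It pulls in the AWS SDK
  // version it was built with, so no explicit aws-java-sdk dependency is declared.
  "org.apache.hadoop" % "hadoop-aws" % hadoopVersion
)
```

Compared with the build.sbt in the question, the explicit aws-java-sdk 1.11.407 entry is gone and hadoop-aws is no longer 3.1.1. Mixing a 3.x hadoop-aws with the 2.x hadoop-common that Spark 2.3.1 resolves is the likely reason the class org.apache.hadoop.fs.StreamCapabilities cannot be found.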