Problem Description
I'm trying to make my Spark Streaming application read its input from an S3 directory, but I keep getting this exception after launching it with the spark-submit script:
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.fs.s3native.$Proxy6.initialize(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:216)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.spark.streaming.StreamingContext.checkpoint(StreamingContext.scala:195)
at MainClass$.main(MainClass.scala:1190)
at MainClass.main(MainClass.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I'm setting those properties through this block of code, as suggested at http://spark.apache.org/docs/latest/ec2-scripts.html (bottom of the page):
val ssc = new org.apache.spark.streaming.StreamingContext(
  conf,
  Seconds(60))
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", args(2))
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", args(3))
args(2) and args(3) are, of course, my AWS Access Key ID and Secret Access Key.
Why does it keep saying they are not set?
EDIT: I also tried this way, but I get the same exception:
val lines = ssc.textFileStream("s3n://"+ args(2) +":"+ args(3) + "@<mybucket>/path/")
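Side note on the embedded-credentials form: if the secret key contains characters such as "/", the s3n URL tends to break, so the key usually has to be URL-encoded first. A minimal sketch, assuming a small hypothetical helper (s3nUrl is not from the original code):

import java.net.URLEncoder

// Hypothetical helper: URL-encode the AWS keys before embedding them in the
// s3n URI, since characters like '/' in a secret key break the URI form.
def s3nUrl(accessKey: String, secretKey: String, bucketAndPath: String): String = {
  val ak = URLEncoder.encode(accessKey, "UTF-8")
  val sk = URLEncoder.encode(secretKey, "UTF-8")
  s"s3n://$ak:$sk@$bucketAndPath"
}

// Usage with the same arguments as above:
// val lines = ssc.textFileStream(s3nUrl(args(2), args(3), "<mybucket>/path/"))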
Odd. Try also doing a .set on the sparkContext. Also try exporting the env variables before you start the application:
export AWS_ACCESS_KEY_ID=<your access>
export AWS_SECRET_ACCESS_KEY=<your secret>
^^this is how we do it.
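For completeness, a minimal sketch of the two suggestions above, assuming the credentials are passed either through the environment or as the same command-line arguments used in the question; the "spark.hadoop." prefix (which Spark copies into the Hadoop configuration) and the app name are illustrative assumptions, not code from the original post:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3nCredentialsExample {
  def main(args: Array[String]): Unit = {
    // Read the keys from the environment if exported, otherwise fall back to
    // the command-line arguments (args(2)/args(3), as in the question).
    val accessKey = sys.env.getOrElse("AWS_ACCESS_KEY_ID", args(2))
    val secretKey = sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", args(3))

    // Put the credentials on the SparkConf *before* the StreamingContext is
    // built; "spark.hadoop.*" entries are copied into the Hadoop Configuration
    // that the s3n:// filesystem (and the checkpoint directory) will use.
    val conf = new SparkConf()
      .setAppName("S3nCredentialsExample") // illustrative app name
      .set("spark.hadoop.fs.s3n.awsAccessKeyId", accessKey)
      .set("spark.hadoop.fs.s3n.awsSecretAccessKey", secretKey)

    val ssc = new StreamingContext(conf, Seconds(60))
    ssc.checkpoint("s3n://<mybucket>/checkpoints/") // placeholder bucket, as in the question
    val lines = ssc.textFileStream("s3n://<mybucket>/path/")
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

The point of this ordering is that the keys are already in the configuration by the time the checkpoint filesystem is initialized, which is where the stack trace above fails.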