I wrote a Spark job that copies a folder into a standalone Amazon S3 bucket. The process works fine, but now I am trying to run the same process against an Amazon S3 bucket hosted on Scality. This is my configuration:

spark-submit --name "Backup S3 Test" \
  --master yarn-cluster \
  --executor-memory 2048m \
  --num-executors 6 \
  --executor-cores 2 \
  --driver-memory 1024m \
  --keytab /home/bigdata/userbcks3.keytab \
  --principal XXXXXXX@XXXXXXXX \
  --deploy-mode cluster \
  --conf spark.file.replicate.exclusion.regexps="" \
  --conf spark.hadoop.fs.s3a.access.key=XXXXXXXXXX \
  --conf spark.hadoop.fs.s3a.secret.key=XXXXXXXXXX \
  --class com.keedio.hadoop.FileReplicator \
  hdfs-file-processors-1.1.6-SNAPSHOT.jar /pre/mydata/ s3a://mybucket/

And now the exception:
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4221)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4168)
        at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1306)
        at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1263)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:323)
        ... 20 more
Caused by: com.amazonaws.SdkClientException: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/
        at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:115)
        at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:77)
        at com.amazonaws.auth.InstanceProfileCredentialsProvider$InstanceMetadataCredentialsEndpointProvider.getCredentialsEndpoint(InstanceProfileCredentialsProvider.java:156)
        at com.amazonaws.auth.EC2CredentialsFetcher.fetchCredentials(EC2CredentialsFetcher.java:121)
        at com.amazonaws.auth.EC2CredentialsFetcher.getCredentials(EC2CredentialsFetcher.java:82)
        at com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:141)
        at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:129)

To perform the copy I simply use Apache FileUtils, which lets me move files between a DistributedFileSystem and an S3AFileSystem.
Is there any way to make this work against Scality in the same process? Maybe I am missing a configuration parameter?
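
For concreteness, the copy described above boils down to something like the following minimal sketch, assuming the "Apache FileUtils" mentioned refers to Hadoop's org.apache.hadoop.fs.FileUtil; the paths mirror the job arguments in the command above:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileUtil, Path}

object CopyToS3 {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // Source on HDFS, destination on S3A (hdfs:/// prefix assumed here).
    val src = new Path("hdfs:///pre/mydata/")
    val dst = new Path("s3a://mybucket/")

    // Path.getFileSystem resolves the scheme to DistributedFileSystem
    // and S3AFileSystem respectively.
    val srcFs = src.getFileSystem(conf)
    val dstFs = dst.getFileSystem(conf)

    // copy(srcFS, src, dstFS, dst, deleteSource, overwrite, conf);
    // deleteSource = true would turn the copy into a move.
    FileUtil.copy(srcFs, src, dstFs, dst, false, true, conf)
  }
}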

Best answer

Whatever is running is not picking up your fs.s3a.access.key / secret.key values; it is falling through the other authentication options (environment variables, the EC2 metadata server) and failing there. You have not even started talking to the far end yet.

If this is code you wrote before and it used to run in EC2, then it was probably authenticating via the instance metadata server all along...
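
In that case, a common approach is to point the S3A connector explicitly at the Scality endpoint and pin the credential chain so the AWS SDK never falls through to the metadata server. Below is a sketch of what that might look like inside the job; the endpoint host and the S3_ACCESS_KEY / S3_SECRET_KEY environment variable names are placeholders, and the same keys can equally be passed as spark.hadoop.* --conf options as in the command above:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Backup S3 Test").getOrCreate()
val hconf = spark.sparkContext.hadoopConfiguration

// Point S3A at the Scality server instead of the default AWS endpoint
// ("scality.example.com" is a placeholder for the real RING endpoint).
hconf.set("fs.s3a.endpoint", "https://scality.example.com")

// Many on-premises S3 implementations only support path-style addressing.
hconf.set("fs.s3a.path.style.access", "true")

// Restrict the provider chain to the simple key/secret provider so a
// missing key fails fast instead of querying 169.254.169.254.
hconf.set("fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
hconf.set("fs.s3a.access.key", sys.env("S3_ACCESS_KEY"))
hconf.set("fs.s3a.secret.key", sys.env("S3_SECRET_KEY"))

Pinning fs.s3a.aws.credentials.provider this way also means a misconfigured key produces a clear authentication error rather than the metadata-server lookup seen in the stack trace above.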

Regarding "apache-spark - Does Amazon S3 on Scality support S3AFileSystem to interact with Hadoop?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53298455/
