Problem description
When I connected my Hadoop cluster to Amazon storage and downloaded files to HDFS, I found that s3:// did not work. While looking for help on the Internet, I found that I could use S3n instead, and when I used S3n it worked. I do not understand the difference between using S3 and S3n with my Hadoop cluster. Can someone explain?
Recommended answer
I think your main problem is that Hadoop treats S3 and S3n as two separate file systems. s3n:// means "a regular file, readable from the outside world, at this S3 URL". s3:// refers to a block-based file system, laid out like HDFS, whose blocks are stored as objects in an S3 bucket on AWS; files written this way cannot be read by other S3 tools, and files uploaded by other tools cannot be read through it. So when you read a file that was placed in an Amazon S3 bucket by something other than Hadoop, you have to use s3n://, and that is why switching to it solved your problem. The information added by @Steffen is also great!
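To make the distinction concrete, here is a minimal Python sketch, assuming the classic Apache Hadoop 1.x/2.x S3 connectors described above. The table `STORES_PLAIN_OBJECTS` and the helper `can_read_external_object` are hypothetical names for illustration only, not part of any Hadoop API. The sketch simply dispatches on the URI scheme and reports whether that connector stores each file as one ordinary S3 object (so files uploaded by other tools are readable) or as opaque blocks.

```python
from urllib.parse import urlparse

# Hypothetical lookup table (an assumption for illustration), reflecting the
# classic Hadoop 1.x/2.x connectors. True means the connector stores each file
# as one ordinary S3 object, so objects uploaded by other tools are readable.
STORES_PLAIN_OBJECTS = {
    "s3n": True,   # S3 native file system: regular files
    "s3": False,   # S3 block file system: HDFS-like blocks stored as objects
}


def can_read_external_object(uri: str) -> bool:
    """Hypothetical helper: could the connector for this URI read a file that
    a non-Hadoop tool (e.g. the AWS console) uploaded to the bucket?"""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in STORES_PLAIN_OBJECTS:
        raise ValueError(f"expected an s3:// or s3n:// URI, got {uri!r}")
    return STORES_PLAIN_OBJECTS[scheme]


if __name__ == "__main__":
    # Same bucket and key, different scheme: only s3n:// sees a plain object.
    for uri in ("s3://my-bucket/data/input.csv", "s3n://my-bucket/data/input.csv"):
        print(uri, "->", can_read_external_object(uri))
```

Running it prints `False` for the s3:// URI and `True` for the s3n:// URI, which matches the behaviour described in the question: a file put into the bucket by another tool is only visible through s3n://.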