This article describes how to handle using S3 as fs.default.name instead of HDFS, which may be a useful reference for anyone facing the same problem. Interested readers can follow along below!

Problem description

I'm setting up a Hadoop cluster on EC2 and I'm wondering how to handle the DFS. All my data is currently in S3, and all map/reduce applications use S3 file paths to access it. Now I've been looking at how Amazon's EMR is set up, and it appears that a namenode and datanodes are created for each jobflow. I'm wondering whether I really need to do it that way, or whether I could just use s3(n) as the DFS. If I do, are there any drawbacks?

Thanks!

Recommended answer

In order to use S3 instead of HDFS, fs.default.name in core-site.xml needs to point to your bucket:

<property>
        <name>fs.default.name</name>
        <value>s3n://your-bucket-name</value>
</property>
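
With fs.default.name pointing at the bucket, paths used by your jobs resolve against S3. As a rough usage sketch (the bucket name and directory here are placeholders, and this assumes the credential properties described below are also configured), you could verify the setup with the standard Hadoop CLI:

hadoop fs -ls s3n://your-bucket-name/input/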

It's recommended that you use S3N rather than the plain S3 (block-based) implementation, because files written through S3N are stored as regular S3 objects and remain readable by any other application, and by yourself, outside of Hadoop :)

Also, in the same core-site.xml file you need to specify the following properties (a sample snippet follows the list):

  • fs.s3n.awsAccessKeyId
  • fs.s3n.awsSecretAccessKey
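
A minimal core-site.xml sketch for these two properties might look like the following (the key values are placeholders, not real credentials):

<property>
        <name>fs.s3n.awsAccessKeyId</name>
        <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
        <name>fs.s3n.awsSecretAccessKey</name>
        <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>

Alternatively, the credentials can be embedded directly in the s3n URI, but keeping them in core-site.xml is generally cleaner and avoids leaking keys into job logs and path strings.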

That concludes this article on using S3 as fs.default.name instead of HDFS. We hope the recommended answer is helpful, and thanks for your continued support!
