Problem description
I'm trying to create an EMR 5.3.0 cluster with EMRFS (an S3 bucket) as storage. Please provide general guidance on this.
Currently I'm using the command below to create an EMR 5.3.0 cluster with InstanceType=m4.2xlarge. It works fine, but I'm not able to do the same with EMRFS as storage.
aws emr create-cluster --name "DEMAPAUR001" --release-label emr-5.3.0 \
  --service-role EMR_DefaultRole_Private --enable-debug --log-uri 's3n://xyz/trn' \
  --ec2-attributes SubnetId=subnet-545e8823, KeyName=XXX \
  --applications Name=Hbase Name=Hive Name=Pig Name=Ganglia \
  --configurations '[{"Classification":"hdfs-site","Properties": {"dfs.replication":"2"},"Configurations":[]}]' \
  --instance-groups \
  'InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.2xlarge, EbsConfiguration={EbsOptimized=true, EbsBlockDeviceConfigs=[{VolumeSpecification= {VolumeType=io1,SizeInGB=500,Iops=200},VolumesPerInstance=1}]}' \
  'InstanceGroupType=CORE, InstanceCount=1,InstanceType=m4.2xlarge,EbsConfiguration={EbsOptimized=true, EbsBlockDeviceConfigs=[{VolumeSpecification={VolumeType=io1,SizeInGB=500,Iops=200},VolumesPerInstance=1}]}' \
  --tags Name=DEMAPAUR001 Owner="XXX" Division=Corporate Application=DEM-EMR Environment=TRN CostCenter=XXX123 CreatedBy=XXX ManagedBy=XXX Availability=24x7_Mon-Fri Backup=NA
Please help me.
Recommended answer
You can use the following classifications in the configuration when launching the cluster.
To enable consistent view:
{ "Classification": "emrfs-site", "Properties": { "fs.s3.consistent": "true" } }
Also, if you want Hive to point to S3 and store all new files there, add the hive-site classification below, which sets the property in hive-site.xml. Here self.hive_warehouse_dir appears to be a variable from the answerer's own code; replace it with the S3 URI of your warehouse directory.
{ "Classification": "hive-site", "Properties": { "hive.metastore.warehouse.dir": self.hive_warehouse_dir } }
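As a rough sketch of how the pieces could fit together, the snippet below writes the hdfs-site setting from the question plus the two classifications above into one JSON file. The file name configurations.json and the bucket path s3://your-bucket/hive/warehouse/ are placeholders I made up for illustration, not values from the original post.

```shell
# Write all cluster configurations to a local JSON file.
# NOTE: configurations.json and s3://your-bucket/hive/warehouse/ are
# hypothetical placeholders; substitute your own file name and S3 location.
cat > configurations.json <<'EOF'
[
  {
    "Classification": "hdfs-site",
    "Properties": { "dfs.replication": "2" }
  },
  {
    "Classification": "emrfs-site",
    "Properties": { "fs.s3.consistent": "true" }
  },
  {
    "Classification": "hive-site",
    "Properties": { "hive.metastore.warehouse.dir": "s3://your-bucket/hive/warehouse/" }
  }
]
EOF

# Then, in the create-cluster command from the question, the inline
# --configurations '[...]' argument would be replaced with:
#   --configurations file://configurations.json
```

Keeping the configurations in a file avoids shell-quoting problems with inline JSON, and the rest of the create-cluster command can stay as it is.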