Question
I'm spinning up an EMR cluster and I've created the buckets specified in the EMR docs, but how should I upload data and read from it? In my spark-submit step I specify the script name as s3://myclusterbucket/scripts/script.py.
Is output not automatically uploaded to S3? How are dependencies handled? I've tried using --py-files pointing to a dependency zip inside the S3 bucket, but I keep getting back 'file not found'.
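For reference, a Spark step of the kind described above is typically added with a command along these lines (the script path is the one from the question; the cluster id, step name, and dependencies.zip are illustrative placeholders):

    aws emr add-steps --cluster-id j-XXXXXXXXXXXX \
      --steps Type=Spark,Name=MyPySparkStep,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--py-files,s3://myclusterbucket/scripts/dependencies.zip,s3://myclusterbucket/scripts/script.py]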
Recommended answer
MapReduce or Tez jobs in EMR can access S3 directly because of EMRFS (an AWS-proprietary Hadoop filesystem implementation based on S3); e.g., in Apache Pig you can do loaded_data = LOAD 's3://mybucket/myfile.txt' USING PigStorage();
I'm not sure about Python-based Spark jobs, but one solution is to first copy the objects from S3 to the EMR HDFS and then process them there.
There are several ways to copy:
- Use hadoop fs commands to copy objects from S3 to the EMR HDFS (and vice versa), e.g., hadoop fs -cp s3://mybucket/myobject hdfs://mypath_on_emr_hdfs
- Use s3-dist-cp to copy objects from S3 to the EMR HDFS (and vice versa): http://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html (a short example follows this list)
- You can also use awscli (or hadoop fs -copyToLocal) to copy objects from S3 to the EMR master instance's local disk (and vice versa), e.g., aws s3 cp s3://mybucket/myobject .
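For example, a copy-then-process run with s3-dist-cp might look roughly like this (the bucket and directory names here are illustrative placeholders, not paths from the question):

    # stage the input data from S3 into HDFS
    s3-dist-cp --src s3://mybucket/input --dest hdfs:///input
    # ...run the Spark step against hdfs:///input, writing results to hdfs:///output...
    # copy the results back to S3 afterwards
    s3-dist-cp --src hdfs:///output --dest s3://mybucket/output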