AWS Athena: does "msck repair table" incur costs?
Question
I have ORC data in S3 that looks like this:
s3://bucket/orc/clientId=client-1/year=2017/month=3/day=16/hour=20/
s3://bucket/orc/clientId=client-2/year=2017/month=3/day=16/hour=21/
s3://bucket/orc/clientId=client-3/year=2017/month=3/day=16/hour=22/
Every hour I run an EMR job that converts raw JSON in S3 to ORC and writes it out with the path partition convention (above) for Athena ingestion. After the EMR job completes, I run msck repair table so Athena can pick up the new partitions.
I have 3 related questions:
- Does running msck repair table in this scenario cost me money in AWS?
- AWS Docs say msck repair table can time out. Is there a way I can make a step in data pipeline to continue running this command until it completes successfully?
- I would prefer to add the partitions manually to Athena (since I know the year, month, day, hour I'm working on). However, I do not know the clientId because there could be 1-X of them, and I don't know which ones exist at time of running EMR. Is there a best practice way to solve this problem (using Hive or something else)? I could make an S3 API call to get a list of s3://bucket/orc/ and write code to iterate over the list and add them manually. I'm hoping there is an easier way...
Note: when I say "add partitions manually" I mean doing something like this:
ALTER TABLE <athena table>
ADD PARTITION (clientId='client-1',year=2017,month=3,day=16,hour=20)
location 's3://bucket/orc/clientId=client-1/year=2017/month=3/day=16/hour=20/';
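The "list the prefixes, then add partitions" idea from the third question could look roughly like the sketch below. It is only an illustration under assumptions: the function names are hypothetical, the input is assumed to be a list of key prefixes such as 'orc/clientId=client-1/' (the kind of result a delimited S3 listing would give), and the statement is built as a string rather than submitted to Athena. It relies on ALTER TABLE ADD accepting several PARTITION clauses and IF NOT EXISTS in one statement.

```python
import re


def client_ids_from_prefixes(prefixes):
    """Extract clientId values from prefixes like 'orc/clientId=client-1/'.

    Prefixes that do not contain a clientId=... segment are skipped.
    """
    pattern = re.compile(r"clientId=([^/]+)/")
    ids = []
    for prefix in prefixes:
        match = pattern.search(prefix)
        if match:
            ids.append(match.group(1))
    return ids


def add_partitions_ddl(table, base, client_ids, year, month, day, hour):
    """Build one ALTER TABLE statement that adds a partition per client ID.

    `table` and `base` (e.g. 's3://bucket/orc') are supplied by the caller;
    nothing here talks to S3 or Athena.
    """
    clauses = []
    for cid in client_ids:
        location = (
            f"{base}/clientId={cid}/year={year}/month={month}"
            f"/day={day}/hour={hour}/"
        )
        clauses.append(
            f"PARTITION (clientId='{cid}',year={year},month={month},"
            f"day={day},hour={hour}) LOCATION '{location}'"
        )
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(clauses)


if __name__ == "__main__":
    found = client_ids_from_prefixes(
        ["orc/clientId=client-1/", "orc/clientId=client-2/"]
    )
    print(add_partitions_ddl("my_table", "s3://bucket/orc", found, 2017, 3, 16, 20))
```

The IF NOT EXISTS clause is there so that re-running the same hour does not fail on partitions that were already registered.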
Recommended answer
I do not yet know how to automate msck repair table to make sure it completes.
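One possible direction, offered only as a sketch and not as a tested pipeline step, is a generic retry wrapper with exponential backoff. Here `run_query` is a hypothetical callable supplied by the caller: it is assumed to submit msck repair table (for example through Athena's StartQueryExecution and GetQueryExecution API operations) and return True only when the query finished successfully. The wrapper itself knows nothing about Athena.

```python
import time


def run_until_success(run_query, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call run_query() until it returns True or the attempts run out.

    Returns the number of the attempt that succeeded. Between failed
    attempts it waits base_delay * 2**(attempt - 1) seconds. `sleep` is
    injectable so the waiting can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        if run_query():
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"query did not succeed after {max_attempts} attempts")
```

Capping the number of attempts keeps a permanently failing command from looping forever; the caller can catch the RuntimeError and alert instead.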