Adding a new partition scheme to an existing table in Athena using SQL

Question

Is it possible to add a partition to an existing table in Athena that currently has no partitions? If so, please include the syntax for doing so in the answer.

For example:

ALTER TABLE table1 ADD PARTITION (ourDateStringCol = '2021-01-01')

The above command gives the following error:

FAILED: SemanticException table is not partitioned but partition spec exists

Note: I have searched the web and found variants of this question for SQL Server, and for adding a partition to a table that is already partitioned. However, I could not find a case where someone successfully added a partition to an existing non-partitioned table.

This is very similar to: SemanticException adding partition Hive table

However, the answer given there requires re-creating the table.

I would like to do this without re-creating the table.

Recommended answer

Partitions in Athena are based on the folder structure in S3. Unlike a standard RDBMS, which loads data onto its own disks or into memory, Athena scans the data where it sits in S3. This is what gives the service its scale and low cost.

This means your data has to be laid out in separate folders with a meaningful structure, such as year=2019 and year=2020, and each folder must contain all of the data for that year and nothing else.
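To make the folder requirement concrete, here is a minimal sketch of how a table definition maps onto such a layout. It assumes the files have already been arranged under year=... prefixes in S3. The bucket name, the table name table1_by_year, the column list, and the CSV format are placeholders for illustration and do not come from the question. An external table in Athena is only metadata over the files, so defining a second table over the same data does not copy or modify the data itself.

```sql
-- Hypothetical S3 layout (bucket and prefix are placeholders):
--   s3://example-bucket/table1/year=2019/...
--   s3://example-bucket/table1/year=2020/...

-- The partition column is declared in PARTITIONED BY,
-- not in the regular column list.
CREATE EXTERNAL TABLE table1_by_year (
    id string,                -- assumed column
    ourDateStringCol string
)
PARTITIONED BY (year string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://example-bucket/table1/';

-- Register each folder as a partition. This works here because
-- the table was declared with PARTITIONED BY.
ALTER TABLE table1_by_year ADD IF NOT EXISTS
    PARTITION (year = '2019') LOCATION 's3://example-bucket/table1/year=2019/'
    PARTITION (year = '2020') LOCATION 's3://example-bucket/table1/year=2020/';

-- Alternatively, with key=value folder names, let Athena discover them:
-- MSCK REPAIR TABLE table1_by_year;

-- A filter on the partition column limits the scan to one folder.
SELECT count(*) FROM table1_by_year WHERE year = '2020';
```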

The simple solution is to run a CREATE TABLE AS SELECT (CTAS) query, which copies the data and creates a new table that can be optimized for your analytical queries. You can choose the table format (Parquet, for example), the compression (SNAPPY, for example), and the partition scheme (per year, for example).
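As a rough sketch of such a CTAS query, the statement below writes a partitioned Parquet copy of table1. The new table name, the output location, and the idea of deriving a year partition from the first four characters of ourDateStringCol are assumptions made for illustration. Adjust them to the real schema.

```sql
-- Sketch only: table1_partitioned and the S3 location are placeholders.
CREATE TABLE table1_partitioned
WITH (
    format = 'PARQUET',
    write_compression = 'SNAPPY',
    partitioned_by = ARRAY['year'],
    external_location = 's3://example-bucket/table1_partitioned/'
) AS
SELECT
    t.*,
    -- Partition columns must come last in the SELECT list.
    -- Assumes ourDateStringCol looks like '2021-01-01'.
    substr(ourDateStringCol, 1, 4) AS year
FROM table1 AS t;
```

Note that a single CTAS query in Athena can write at most 100 partitions. That is enough for per-year partitioning, but a finer scheme such as per-day would need follow-up INSERT INTO statements to load the rest.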

07-24 13:38