Problem description
How can I merge the small files of several existing partitions into one large file in a single partition?
For example, I have a table user1 with the columns fname and lname, partitioned by the column day.
I created the table with the script below:
CREATE TABLE user1 (fname string, lname string) PARTITIONED BY (day int);
After inserting data into the partitioned table, it looks like this:
fname  lname  day
.....................
AA     AAA    20170201   ....> partition 20170201
BB     BBB    20170201
.....................
CC     CCC    20170202   ....> partition 20170202
DD     DDD    20170202
.....................
EE     EEE    20170203   ....> partition 20170203
FF     FFF    20170203
.....................
GG     GGG    20170204   ....> partition 20170204
HH     HHH    20170204
.....................
When I run a select query that filters on the partition column, e.g. day=20170201:
select * from user1 where day=20170201;
it returns the following result:
AA     AAA    20170201
BB     BBB    20170201
Based on the table above, I want to merge all the small files of the partitions day=20170201, day=20170202 and day=20170203 into the single partition day=20170203 of my partitioned table (user1). It should look like this:
fname  lname  day
.....................
AA     AAA    20170201
BB     BBB    20170201
CC     CCC    20170202
DD     DDD    20170202
EE     EEE    20170203   ....> partition 20170203
FF     FFF    20170203
.....................
GG     GGG    20170204   ....> partition 20170204
HH     HHH    20170204
.....................
Can you please suggest how I can achieve this?
Thanks in advance.
Solution
- Create a new table partitioned by a new field, partition_day.
- Load the data into the new table, defining the conditions for the new partitions in a case expression.
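The answer's original statements did not survive on this page, so the following is only a minimal HiveQL sketch of the two steps, not the answerer's exact code. The table name user2 is hypothetical, and the date range in the case expression is an assumption taken from the partitions named in the question. The two SET lines enable dynamic partitioning, which an insert of this shape generally needs.

```sql
-- Step 1: new table partitioned by the new field.
-- "user2" is a hypothetical name chosen for illustration.
CREATE TABLE user2 (fname string, lname string)
PARTITIONED BY (partition_day int);

-- Allow Hive to derive the target partition from the last SELECT column.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Step 2: load the data, mapping the three old partitions to 20170203
-- (assumed range from the question) and leaving all other days unchanged.
INSERT OVERWRITE TABLE user2 PARTITION (partition_day)
SELECT
  fname,
  lname,
  CASE
    WHEN day BETWEEN 20170201 AND 20170203 THEN 20170203
    ELSE day
  END AS partition_day
FROM user1;
```

In this sketch the original day value is kept only in the form of partition_day. If the per-row day shown in the desired output has to be preserved, the new table would need an additional regular column for it.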