How can I create an external table that uses only a few of the columns in a file?

For example: the file has six columns, A, B, C, D, E, F, but in my table I only want A, C, F.

Is this possible?

Best answer

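I'm not aware of a way to selectively include columns from an HDFS file in an external table. Depending on your use case, it may be enough to define a view on top of the external table that includes only the columns you need. For example, given the following contrived external table: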

hive> CREATE EXTERNAL TABLE ext_table (
    >   A STRING,
    >   B STRING,
    >   C STRING,
    >   D STRING,
    >   E STRING,
    >   F STRING
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE
    > LOCATION '/tmp/ext_table';
OK
Time taken: 0.401 seconds
hive> SELECT * FROM ext_table;
OK
row_1_col_A row_1_col_B     row_1_col_C     row_1_col_D     row_1_col_E     row_1_col_F
row_2_col_A row_2_col_B     row_2_col_C     row_2_col_D     row_2_col_E     row_2_col_F
row_3_col_A row_3_col_B     row_3_col_C     row_3_col_D     row_3_col_E     row_3_col_F
Time taken: 0.222 seconds, Fetched: 3 row(s)
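
The data file itself isn't shown. Judging from the SELECT output, it presumably holds comma-separated lines such as row_1_col_A,row_1_col_B,...,row_1_col_F. If you want to reproduce the example, here is a minimal Python sketch that writes such a file, assuming that reconstructed content; the local filename ext_table_data.csv is hypothetical.

```python
import os
import tempfile

# Assumed layout, reconstructed from the SELECT output above (not the original file).
COLUMNS = ["A", "B", "C", "D", "E", "F"]


def sample_lines(num_rows=3, columns=COLUMNS):
    """Build comma-delimited lines like 'row_1_col_A,row_1_col_B,...'."""
    return [
        ",".join(f"row_{r}_col_{c}" for c in columns)
        for r in range(1, num_rows + 1)
    ]


def write_sample_file(path, num_rows=3):
    """Write the sample lines, one per line, to a local file."""
    with open(path, "w", encoding="utf-8") as f:
        for line in sample_lines(num_rows):
            f.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical local path; copy the file into the table's LOCATION afterwards.
    out = os.path.join(tempfile.gettempdir(), "ext_table_data.csv")
    write_sample_file(out)
    with open(out, encoding="utf-8") as f:
        print(f.read(), end="")
```

The file could then be placed under the table's LOCATION with something like hdfs dfs -mkdir -p /tmp/ext_table followed by hdfs dfs -put ext_table_data.csv /tmp/ext_table/.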


Then create a view that includes only the columns you want:

hive> CREATE VIEW filtered_ext_table AS SELECT A, C, F FROM ext_table;
OK
Time taken: 0.749 seconds
hive> DESCRIBE filtered_ext_table;
OK
a                           string
c                           string
f                           string
Time taken: 0.266 seconds, Fetched: 3 row(s)
hive> SELECT * FROM filtered_ext_table;
OK
row_1_col_A row_1_col_C     row_1_col_F
row_2_col_A row_2_col_C     row_2_col_F
row_3_col_A row_3_col_C     row_3_col_F
Time taken: 0.301 seconds, Fetched: 3 row(s)
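
A view stores no data of its own: every query against filtered_ext_table still goes through ext_table and scans the whole file, and the view simply keeps A, C and F. As a rough illustration only (a Python model of the logic, not how Hive executes it), the view behaves like a name-based projection over rows parsed with the full six-column schema. The helper names parse_row and select_columns are hypothetical.

```python
# Column names as Hive reports them (DESCRIBE shows them lowercased).
SCHEMA = ["a", "b", "c", "d", "e", "f"]


def parse_row(line, schema=SCHEMA, delimiter=","):
    """Split one delimited line into {column: value}; missing fields become None."""
    fields = line.rstrip("\n").split(delimiter)
    fields += [None] * (len(schema) - len(fields))
    return dict(zip(schema, fields))


def select_columns(rows, wanted):
    """Project each parsed row onto the wanted columns (case-insensitive, like Hive)."""
    return [tuple(row[name.lower()] for name in wanted) for row in rows]


# Same assumed sample data as in the example above.
lines = [",".join(f"row_{r}_col_{c}" for c in "ABCDEF") for r in range(1, 4)]
rows = [parse_row(line) for line in lines]

# Equivalent of: SELECT A, C, F FROM ext_table
for values in select_columns(rows, ["A", "C", "F"]):
    print("\t".join(values))
```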


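Another way to get what you want requires that you be able to modify the HDFS files backing the external table. If the columns you're interested in all sit at the start of each line, you can define the external table to capture only the first three columns, regardless of how many columns the file actually contains. Because columns are matched by position, on the unmodified file this yields A, B and C; to get A, C and F you would first have to rewrite the file so those columns come first. For example, using the same data file as above: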

hive> DROP TABLE IF EXISTS ext_table;
OK
Time taken: 1.438 seconds
hive> CREATE EXTERNAL TABLE ext_table (
    >   A STRING,
    >   B STRING,
    >   C STRING
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE
    > LOCATION '/tmp/ext_table';
OK
Time taken: 0.734 seconds
hive> SELECT * FROM ext_table;
OK
row_1_col_A row_1_col_B     row_1_col_C
row_2_col_A row_2_col_B     row_2_col_C
row_3_col_A row_3_col_B     row_3_col_C
Time taken: 0.727 seconds, Fetched: 3 row(s)
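
This works because a delimited text table maps fields to columns by position: fields past the last declared column are ignored, and missing trailing fields read as NULL. The Python sketch below is a simplified model of that behaviour, not Hive's SerDe code (it ignores escaping and the \N NULL marker). It also shows the file rewrite the approach needs: the hypothetical helper reorder_line moves A, C and F to the front so that a three-column table picks them up.

```python
def reorder_line(line, front, delimiter=","):
    """Move the fields at the 0-based positions in `front` to the start of the
    line, keeping the remaining fields after them in their original order."""
    fields = line.rstrip("\n").split(delimiter)
    rest = [value for i, value in enumerate(fields) if i not in front]
    return delimiter.join([fields[i] for i in front] + rest)


def parse_with_schema(line, num_columns, delimiter=","):
    """Simplified positional mapping: extra trailing fields are dropped,
    missing ones become None (Hive would show NULL)."""
    fields = line.rstrip("\n").split(delimiter)[:num_columns]
    return tuple(fields + [None] * (num_columns - len(fields)))


original = "row_1_col_A,row_1_col_B,row_1_col_C,row_1_col_D,row_1_col_E,row_1_col_F"

# Unmodified file + 3-column table: A, B, C (matches the output above).
print(parse_with_schema(original, 3))

# Rewritten file (A, C, F moved to the front) + 3-column table: A, C, F.
rewritten = reorder_line(original, [0, 2, 5])
print(rewritten)
print(parse_with_schema(rewritten, 3))
```

Keep in mind that such a rewrite changes the column order for every other consumer of those files, which is why the view approach is usually the less intrusive option.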

Source: a similar question on Stack Overflow, "hive - External Table (HIVE) selecting only a few columns from a file": https://stackoverflow.com/questions/28235479/

10-13 04:32