Question
We would like to write the results of a Hive query to a CSV file. I thought the command should look like this:
insert overwrite directory '/home/output.csv' select books from table;
When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?
Thanks!
Recommended answer
Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.
According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.
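To answer the "how do I find this file" part: the path in the question names an HDFS directory, so it has to be inspected with the Hadoop file system commands rather than the local shell. The following is a minimal sketch, assuming a configured Hadoop client is on the PATH; the function name and the local destination file are hypothetical.

```shell
# Sketch: list and download the output of INSERT OVERWRITE DIRECTORY.
# Assumes the `hadoop` client is installed and configured.
# fetch_hive_output is a hypothetical helper name.
fetch_hive_output() {
  hdfs_dir=$1      # HDFS directory named in the query
  local_file=$2    # local file to write the merged result to

  # Show the part files (such as 000000_0) that Hive wrote.
  hadoop fs -ls "$hdfs_dir" || return 1

  # Concatenate all part files into one local file.
  hadoop fs -getmerge "$hdfs_dir" "$local_file"
}

# Example call (the local file name is an assumption):
# fetch_hive_output /home/output.csv ./output.txt
```

The merged file still uses Hive's own field delimiter, so it is not yet a CSV.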
A slight modification (adding the LOCAL keyword) will store the data in a local directory.
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;
When I run a similar query, here's what the output looks like.
[lvermeer@hadoop temp]$ ll
total 4
-rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
[lvermeer@hadoop temp]$ head 000000_0
"row1""col1"1234"col3"1234FALSE
"row2""col1"5678"col3"5678TRUE
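The columns in 000000_0 appear to run together because Hive's default field delimiter for text output is the non-printing Ctrl-A character (\001), which head does not display; cat -v shows it as ^A. If the values contain no commas or quotes, a rough conversion is to swap that byte for a comma. This is only a sketch: the function name and the output file name are hypothetical.

```shell
# Sketch: replace Hive's default Ctrl-A (\001) field delimiter with commas.
# Does no quoting, so it is only safe when values contain no commas or quotes.
# ctrl_a_to_csv is a hypothetical helper name.
ctrl_a_to_csv() {
  tr '\001' ','
}

# Example call (books.csv is an assumed output name):
# ctrl_a_to_csv < /home/lvermeer/temp/000000_0 > /home/lvermeer/temp/books.csv
```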
Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:
hive -e 'select books from table' > /home/lvermeer/temp.tsv
That gives me a tab-separated file that I can use. Hope that is useful for you as well.
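If an actual CSV is needed rather than a tab-separated file, the output of hive -e can be piped through a small filter. The following is a sketch, not part of the original answer: it assumes awk is available, that values contain no embedded tabs or newlines, and the function name and the .csv path are hypothetical. Fields that contain a comma or a double quote are wrapped in quotes, with embedded quotes doubled.

```shell
# Sketch: convert tab-separated lines on stdin to CSV on stdout.
# tsv_to_csv is a hypothetical helper name.
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
  {
    $1 = $1                            # force the record to be rebuilt with OFS
    for (i = 1; i <= NF; i++) {
      gsub(/"/, "\"\"", $i)            # double any embedded quotes
      if ($i ~ /[",]/) $i = "\"" $i "\""   # quote fields that need it
    }
    print
  }'
}

# Example call (temp.csv is an assumed output name):
# hive -e 'select books from table' | tsv_to_csv > /home/lvermeer/temp.csv
```

Doing the quoting in the filter keeps the Hive query itself unchanged, at the cost of one extra process in the pipeline.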
Based on patch HIVE-3682, I suspect a better solution is available in Hive 0.11, but I am unable to test this myself. The new syntax should allow the following:
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
Hope that helps.