问题描述
我已使用orc格式将json数据帧保存在Hive中
I have saved json dataframe in Hive using orc format
jsonDF.write.format("orc").saveAsTable(hiveExamples.jsonTest)
现在,我需要在HDFS上将文件显示为普通文本.有没有办法做到这一点?
Now I need to display the file as a normal text on HDFS. Is there away to do this?
我用过hdfs dfs -text /path-of-table
,但是它以ORC格式显示数据.
I have used hdfs dfs -text /path-of-table
, but it displays the data in ORC format.
推荐答案
在linux shell
命令中,有一个名为hive --orcfiledump
From the linux shell
command there is an utility called hive --orcfiledump
要在HDFS
中查看ORC
文件的元数据,可以调用以下命令:
To see the metadata of an ORC
file in HDFS
you can invoke the command like:
[@localhost ~ ]$ hive --orcfiledump <path to HDFS ORC file>;
要以纯文本格式查看ORC
文件的内容,可以使用-d
选项调用该命令:
To see the content of your ORC
file in plain text you can invoke the command with -d
option:
[@localhost ~ ]$ hive --orcfiledump -d <path to HDFS ORC file>;
例如:
hive> DESCRIBE FORMATTED orders_orc;
Location: hdfs://localhost:8020/user/hive/warehouse/training_retail.db/orders_orc
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
hive> exit;
[@localhost ~ ]$ hdfs dfs -ls /user/hive/warehouse/training_retail.db/orders_orc
Found 1 items
-rwxrwxrwx 1 training hive 163094 2020-04-20 09:39 /user/hive/warehouse/training_retail.db/orders_orc/000000_0
[@localhost ~ ]$ hdfs dfs -tail /user/hive/warehouse/training_retail.db/orders_orc/000000_0
��+"%ў�.�b.����8V$tߗ��\|�?�xM��
*�ڣ�������!�2���_���Ͳ�V���
r�E(����~�uM�/&��&x=-�&2�T��o��JD���Q��m5��#���8Iqe����A�^�ێ"���@�t�w�m�A ���3|�����NL�Q����p�d�#:}S-D�Wq�_"����
[@localhost ~ ]$ hive --orcfiledump /user/hive/warehouse/training_retail.db/orders_orc/000000_0;
Structure for /user/hive/warehouse/training_retail.db/orders_orc/000000_0
File Version: 0.12 with HIVE_8732
20/04/20 10:19:58 INFO orc.ReaderImpl: Reading ORC rows from /user/hive/warehouse/training_retail.db/orders_orc/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
Rows: 68883
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:int,_col1:string,_col2:int,_col3:string>
....
File length: 163094 bytes
Padding length: 0 bytes
Padding ratio: 0%
[@localhost ~ ]$ hive --orcfiledump -d /user/hive/warehouse/training_retail.db/orders_orc/000000_0 | head -n 5
{"_col0":1,"_col1":"2013-07-25 00:00:00.0","_col2":11599,"_col3":"CLOSED"}
{"_col0":2,"_col1":"2013-07-25 00:00:00.0","_col2":256,"_col3":"PENDING_PAYMENT"}
{"_col0":3,"_col1":"2013-07-25 00:00:00.0","_col2":12111,"_col3":"COMPLETE"}
{"_col0":4,"_col1":"2013-07-25 00:00:00.0","_col2":8827,"_col3":"CLOSED"}
{"_col0":5,"_col1":"2013-07-25 00:00:00.0","_col2":11318,"_col3":"COMPLETE"}
您可以点击此链接以获取详细信息:
You can follow this link for details:
这篇关于在HDFS上,我想显示以ORC格式存储的配置单元表的普通文本的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!