Problem Description
I have a file stored in HDFS as part-m-00000.gz.parquet
I've tried to run hdfs dfs -text dir/part-m-00000.gz.parquet, but it's compressed, so I ran gunzip part-m-00000.gz.parquet, but it doesn't uncompress the file since it doesn't recognise the .parquet extension.
How do I get the schema / column names for this file?
Recommended Answer
You won't be able to "open" the file using hdfs dfs -text because it's not a text file; Parquet files are written to disk very differently compared to text files.
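Note that the .gz in the name refers to the gzip codec applied to the column chunks inside the Parquet file, not to the file as a whole, which is why gunzip refuses it. As a quick sanity check, a minimal sketch (using the path from the question): a Parquet file begins and ends with the 4-byte magic number PAR1.

# Parquet files start (and end) with the magic bytes "PAR1"
hdfs dfs -cat dir/part-m-00000.gz.parquet | head -c 4
# prints: PAR1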
For this very purpose, the Parquet project provides parquet-tools to do tasks like the one you are trying to do: open a file and inspect its schema, data, metadata, etc.
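For instance, a minimal sketch, assuming parquet-tools is installed locally and the file has first been copied out of HDFS:

# copy the file out of HDFS so the local tool can read it
hdfs dfs -get dir/part-m-00000.gz.parquet .
# print the schema (column names and types)
parquet-tools schema part-m-00000.gz.parquet
# print the footer metadata, including row counts and the compression codec
parquet-tools meta part-m-00000.gz.parquet
# show the first few records
parquet-tools head part-m-00000.gz.parquet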
Check out the parquet-tools project (which is, put simply, a jar file): parquet-tools
Cloudera, which supports and contributes heavily to Parquet, also has a nice page with examples on the usage of parquet-tools. An example from that page for your use case is
parquet-tools schema part-m-00000.parquet
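If you would rather not copy the file out of HDFS first, the same jar can also be run through Hadoop so that it resolves HDFS paths directly. A sketch, where the jar file name and version are placeholders for whatever build you have:

# run the tool via hadoop so the HDFS path is readable
hadoop jar parquet-tools-1.9.0.jar schema hdfs:///dir/part-m-00000.gz.parquet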
Check out the Cloudera page: Using the Parquet File Format with Impala, Hive, Pig, HBase, and MapReduce.