Problem Description
I'm trying to create a Hive external table on Parquet data files dynamically, i.e. without listing column names and types in the Hive DDL. I have the Avro schema of the underlying Parquet files.
My attempt uses the DDL below:
CREATE EXTERNAL TABLE parquet_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath'
TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');
My Hive table is successfully created with the right schema, but when I try to read the data:
SELECT * FROM parquet_test;
I get the following error:
java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Expecting a AvroGenericRecordWritable
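The error makes sense once you spell out what STORED AS PARQUET expands to: the DDL above overrides only the SerDe, so Hive pairs the AvroSerDe, whose deserializer expects AvroGenericRecordWritable records, with the Parquet input format, which hands it Parquet rows instead. A sketch of the effective table definition (same hypothetical paths as above; the input/output format classes are the standard Hive Parquet ones):

CREATE EXTERNAL TABLE parquet_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'   -- deserializer expects AvroGenericRecordWritable
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'   -- but delivers Parquet rows
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://myParquetFilesPath'
TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');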
Is there a way to create and read Parquet tables without listing column names and types in the DDL?
The following two-step approach works: create a schema-only Avro table from the Avro schema, then create the Parquet external table LIKE it (note that avro.schema.url needs a full URL, as in the question):

CREATE TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');

CREATE EXTERNAL TABLE parquet_test
LIKE avro_test
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath';
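Since CREATE EXTERNAL TABLE ... LIKE copies the schema at creation time, avro_test serves only as a schema template and holds no data. A minimal sanity check, reusing the hypothetical names from above:

DESCRIBE parquet_test;                 -- columns and types inherited from the Avro schema
SELECT * FROM parquet_test LIMIT 10;   -- now read via the Parquet SerDe, no AvroSerdeException

-- Optional cleanup: the empty template table can be dropped without
-- affecting parquet_test.
DROP TABLE avro_test;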