Dynamically creating a Hive external table with an Avro schema on Parquet data


Problem



I'm trying to dynamically create a Hive external table on Parquet data files (without listing column names and types in the Hive DDL). I have the Avro schema of the underlying Parquet files.

I tried the DDL below:

CREATE EXTERNAL TABLE parquet_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath'
TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');

My Hive table is successfully created with the right schema, but when I try to read the data:

SELECT * FROM parquet_test;

I get the following error:

java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Expecting a AvroGenericRecordWritable

Is there a way to successfully create and read Parquet files without listing the column names and types in the DDL?

Solution

The queries below work:

CREATE TABLE avro_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc');

CREATE EXTERNAL TABLE parquet_test LIKE avro_test
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath';
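For illustration, the schema file referenced by `avro.schema.url` might look like the following. This is a hypothetical two-field record; the record and field names are assumptions for the sketch, not taken from the original question:

```json
{
  "type": "record",
  "name": "Event",
  "namespace": "com.example",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": ["null", "string"], "default": null}
  ]
}
```

This two-step approach works because the intermediate `avro_test` table uses the AvroSerDe only to derive the column list from the schema file, while `CREATE EXTERNAL TABLE ... LIKE` copies those columns and `STORED AS PARQUET` with `LOCATION` overrides the storage format and data path. The resulting table is therefore read with the Parquet SerDe, avoiding the original error, where the AvroSerDe expected `AvroGenericRecordWritable` records but was handed Parquet ones.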
