Difference between "STORED AS INPUTFORMAT ... OUTPUTFORMAT ..." and "STORED AS" in Hive


Problem description

I hit a problem when I run show create table on an ORC table and then execute the create table statement it returns.

Using show create table, you get this:

STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

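But if you create the table with those clauses, you get a cast error when selecting from it. The error looks like this:

Failed with exception
java.io.IOException:java.lang.ClassCastException:
org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to
org.apache.hadoop.io.BinaryComparable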

To fix this, just change the create table statement to use STORED AS ORC.

However, even after reading the answer to the similar question "What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?", I can't figure out the reason.

Solution

STORED AS implies 3 things:

  1. SERDE
  2. INPUTFORMAT
  3. OUTPUTFORMAT

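You have defined only the last 2, leaving the SERDE to be defined by hive.default.serde. The table then pairs OrcInputFormat, which produces OrcStruct rows, with LazySimpleSerDe, which expects text rows (a BinaryComparable such as Text). That mismatch is what raises the ClassCastException on select.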
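As a minimal sketch of what this implies, the long form can be made to behave like STORED AS ORC by spelling out all three parts, including the SERDE. The table name mytable3 is hypothetical and used only for illustration; the class names are the ones that appear in this question.

```sql
-- Hypothetical table name (mytable3); the class names come from the question.
-- Naming the SERDE explicitly keeps hive.default.serde from being applied.
CREATE TABLE mytable3 (i int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```

This is also why the output of show create table should be replayed in full: if the ROW FORMAT SERDE clause is dropped and only the STORED AS INPUTFORMAT ... OUTPUTFORMAT ... clauses are kept, the SERDE falls back to the default.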

Demo

hive.default.serde

set hive.default.serde;


hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

STORED AS ORC

create table mytable (i int)
stored as orc;

show create table mytable;


Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'

CREATE TABLE `mytable`(
  `i` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file:/home/cloudera/local_db/mytable'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
  'numFiles'='0',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='0',
  'transient_lastDdlTime'='1496982059')


STORED AS INPUTFORMAT ... OUTPUTFORMAT ...

create table mytable2 (i int)
STORED AS
INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;

show create table mytable2
;


Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

CREATE TABLE `mytable2`(
  `i` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'file:/home/cloudera/local_db/mytable2'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
  'numFiles'='0',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='0',
  'transient_lastDdlTime'='1496982426')
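
If a table has already been created the way mytable2 was, one possible repair is to change its SERDE in place instead of recreating it. The sketch below assumes the table is still empty, as in this demo; it only updates the table metadata and does not rewrite any existing data files.

```sql
-- Assumption: mytable2 is the empty demo table created above.
-- Point the table at the ORC SerDe so that it matches its input/output formats.
ALTER TABLE mytable2 SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

-- Check the result: the ROW FORMAT SERDE clause should now name OrcSerde.
show create table mytable2;
```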


08-24 03:14