The problem occurs when I run SHOW CREATE TABLE on an ORC table and then execute the CREATE TABLE statement it produces.
Using SHOW CREATE TABLE, you get this:
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
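But if you create the table with those clauses, selecting from it then fails with a casting error like this:
Failed with exception java.io.IOException:java.lang.ClassCastException:
org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to
org.apache.hadoop.io.BinaryComparable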
To fix this, just change the CREATE TABLE statement to use STORED AS ORC.
However, given the answer to the similar question "What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?", I can't figure out the reason for this.
STORED AS implies 3 things:
- SERDE
- INPUTFORMAT
- OUTPUTFORMAT
You have defined only the last two, leaving the SERDE to be determined by hive.default.serde.
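If you want to keep the explicit INPUTFORMAT/OUTPUTFORMAT form, the SerDe can be named alongside the two format classes so that nothing is left to the default. This is a minimal sketch under that assumption: the table name mytable3 is hypothetical, and the three class names are the ones that appear in the SHOW CREATE TABLE output for mytable in the demo.

```sql
-- mytable3 is a hypothetical name used only for illustration.
create table mytable3 (i int)
ROW FORMAT SERDE
  -- Named explicitly, so the SerDe does not fall back to hive.default.serde
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```

This spells out the same three pieces (SERDE, INPUTFORMAT, OUTPUTFORMAT) that STORED AS ORC sets for you.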
Demo
hive.default.serde
set hive.default.serde;
hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
STORED AS ORC
create table mytable (i int)
stored as orc;
show create table mytable;
Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
CREATE TABLE `mytable`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982059')
STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
create table mytable2 (i int)
STORED AS
INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;
show create table mytable2
;
Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
CREATE TABLE `mytable2`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable2'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982426')
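For a table that already exists with the wrong SerDe, such as mytable2, one option is to switch the SerDe in place and then inspect the table metadata. This is a sketch that assumes your Hive version permits ALTER TABLE ... SET SERDE on that table; dropping and recreating the table with STORED AS ORC is the alternative.

```sql
-- Replace LazySimpleSerDe with the ORC SerDe; the input/output formats are already ORC.
ALTER TABLE mytable2 SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

-- Inspect the metadata: the "SerDe Library:" row should now show
-- org.apache.hadoop.hive.ql.io.orc.OrcSerde, as it does for mytable.
DESCRIBE FORMATTED mytable2;
```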