本文介绍了如何在 hive 中创建一个空的结构数组?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Hive 1.1.0中有一个视图,根据条件,它应该返回一个空数组或struct的数组

I have a view in Hive 1.1.0, based on a condition, it should return an empty array or an array of struct<name: string, jobslots: int>

这是我的代码:

select
      case when <condition>
             then array()
           else array(struct(t1.name, t1.jobslots))
       end
from table t1;

这里的问题是,空数组 array() 的类型是 array.因此,当我尝试将其插入表中时,它会引发错误.

The problem here is, that the empty array array() is of type array<string>. So when I try to insert it into a table, it throws an error.

如何更改它以返回 array<struct<name: string, jobslots:int>> 类型的空数组,以便 Hive 的 size() 函数在这个数组上返回 0?

How can I change this to return an empty array of type array<struct<name: string, jobslots:int>> so that Hive's size() function returns 0 on this array?

推荐答案

您可以使用 collect_listcolect_set 来收集从连接中获取的结构数组,并且连接条件为假,则 collect_list 将产生一个空的结构数组.

You can do it using collect_list or colect_set for collecting an array of structs taken from a join, and a join condition is false, then collect_list will produce an empty array of struct.

此查询返回大小为 0 的数组:

This query returns array of 0 size:

select a.id, size(collect_list(b.str))=0 array_size_zero
from
(select 2 id ) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on a.id=b.id
group by a.id

结果:

a.id    array_size_zero
2       true

如果在第一个子查询 a 中更改 id 以与 b 连接,它将返回具有 1 个元素的数组.而且这些结果是同类型的,你可以使用union all轻松查看.

If you change id in the first subquery a to join with b, it will return array with 1 element. And these results are of the same type, you can check it easily using union all.

检查结果类型相同:

select a.id, collect_list(b.str) my_array
from
(select 1 id ) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on a.id=b.id
group by a.id

union all

select a.id, collect_list(b.str) my_array
from
(select 2 id ) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on a.id=b.id
group by a.id

结果:

id  my_array
1   [{"name":null,"value":null}]
2   []

如果我尝试联合所有不同类型的空结构数组,例如 array() 会发生什么:

What happens if I try to UNION ALL empty array of struct with different type, for example array():

select 1 id, array() my_array

union all

select a.id, collect_list(b.str) my_array
from
(select 2 id ) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on a.id=b.id
group by a.id

异常:

编译语句时出错:FAILED: SemanticException Schema of联合的双方应该匹配:列 my_array 是类型第一个表上的数组和类型数组在第二张桌子上.不能告诉空 AST 的位置.

这表明第一个查询确实返回空的结构数组.您可以轻松地在查询中执行类似的联接.

This demonstrates that first query really returns empty array of struct.You can easily do similar join in your query.

如何在带条件的查询中使用它?演示:

How can you use it in your query with condition?Demo:

select a.id, case when true --Put your condition here instead of dummy <true>
                     then collect_list(a.str) --not empty
                  else collect_list(b.str) --this one is empty array of the same type
              end as my_array
from
(select 2 id, named_struct('name',null,'value',null) str) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on FALSE
group by a.id

CASE 表达式很愉快,不会引发不兼容类型的异常

CASE expression is quite happy and does not raise exception about incompatible types

这篇关于如何在 hive 中创建一个空的结构数组?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-04 16:03
查看更多