Folks,
We have a requirement to apply a GROUP BY clause after joining a Hive table with itself.
For example, given the data:
CUSTOMER_NAME,PRODUCT_NAME,PURCHASE_PRICE
customer1,product1,20
customer1,product2,30
customer1,product1,25
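Now we want to get the top 5 customers by total purchase price summed across all products (the subquery computes only the price sum and contains no product name), and then group the result set by CUSTOMER_NAME, PRODUCT_NAME: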
select customer_name,product_name,sum(purchase_price)
from customer_prd cprd
Join (select customer_name,sum(purchase_prices) order by sum group by customer_name limit 5) cprdd
where cprd.customer_name = cprdd.customer_name group by customer_name,product_name
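We get an error message saying we cannot group by this way in Hive?
Best answer
After the join, your column names become ambiguous. Hive does not know whether you mean the column from the left or the right side of the join. In this case it makes no difference, since it is an inner join on those columns being equal, but Hive is not smart enough to figure that out. Try this: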
select cprd.customer_name, cprd.product_name, sum(purchase_price)
from customer_prd cprd
Join (select customer_name, sum(purchase_price) as sum from customer_prd group by customer_name order by sum desc limit 5) cprdd
where cprd.customer_name = cprdd.customer_name group by cprd.customer_name, cprd.product_name;
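If it helps, here is the same idea sketched with the join condition moved into an ON clause and every column qualified by its table alias. The aliases total_spent and product_total are my own illustrative names, not from the original post, and I have not run this against a particular Hive version, so treat it as a sketch rather than a verified query.

```sql
-- Sketch: top 5 customers by total spend, then per-product totals for those customers.
-- total_spent and product_total are illustrative aliases (assumptions, not from the post).
SELECT cprd.customer_name,
       cprd.product_name,
       SUM(cprd.purchase_price) AS product_total
FROM customer_prd cprd
JOIN (
    -- Subquery: rank customers by their total across all products.
    SELECT customer_name,
           SUM(purchase_price) AS total_spent
    FROM customer_prd
    GROUP BY customer_name
    ORDER BY total_spent DESC
    LIMIT 5
) cprdd
  ON cprd.customer_name = cprdd.customer_name
-- Qualify the grouping columns so Hive knows which side of the join is meant.
GROUP BY cprd.customer_name, cprd.product_name;
```

With the three sample rows from the question, this should produce two rows for customer1: product1 with a total of 45 and product2 with a total of 30.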
Regarding "hadoop - Hive group by after join", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23613399/