I am trying to build a Hive query that matches only the features below, or a combination of them. For example, the features include:
name = "summary"
name = "details"
name1 = "vehicle stats"
name1 = "accelerometer"
I have to count the number of customers who strictly follow the above conditions. For example, in the below table, customer "Joy" should not be counted because he has additionally done "expenses" in name even though he has both "summary" and "details" in name and "vehicle stats" and "accelerometer" in name1.
Similarly, customer "Lan" should not be counted, as he has additionally done "speeding" in name1, which is not in the above conditions.
customername  name      name1
Joy           summary   vehicle stats
Joy           details   accelerometer
Joy           expenses  speeding
Lan           summary   vehicle stats
Lan           details   accelerometer
Lan           details   speeding
Hana          details   accelerometer
Hana          summary   vehicle stats
The count for the table above has to be 1, as there is only one customer (Hana) who has done only "summary" and "details" in name and only "vehicle stats" and "accelerometer" in name1.
This is the query that I currently have:
select name, name1, count(distinct(customername))
from table1
where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
group by name, name1
having name in ('summary', 'details')
or name1 in ('vehicle stats', 'accelerometer')
Any suggestions would be great!!
You can also use collect_set to check only for the specified entries in those columns.
select customername
from table1
where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
group by customername
having concat_ws(',', sort_array(collect_set(name))) = 'details,summary'
and concat_ws(',', sort_array(collect_set(name1))) = 'accelerometer,vehicle stats'
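collect_set does not guarantee the order of its elements, so the array has to be sorted (for example with sort_array) before it is concatenated, and the string you compare against must list the values in that same sorted order.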
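If comparing concatenated strings feels fragile, the same "only these values" rule can be sketched with conditional aggregation instead. This is a rough sketch, not a tested answer: table1, customername, name, name1 and date_time come from the question, while the alias strict_customers and the subquery alias t are made up for illustration. It also assumes a customer must have all four values; if any combination of them is acceptable, the two size(...) checks can be dropped.

```sql
-- Sketch only: counts customers whose rows contain nothing outside the allowed values.
select count(*) as strict_customers          -- hypothetical alias
from (
  select customername
  from table1
  where date_time between '2017-01-01 00:00:00' and '2017-01-10 00:00:00'
  group by customername
  -- every row must use an allowed value in each column
  -- (a NULL value falls into the else branch and counts as a violation)
  having sum(case when name  in ('summary', 'details') then 0 else 1 end) = 0
     and sum(case when name1 in ('vehicle stats', 'accelerometer') then 0 else 1 end) = 0
     -- assumption: both allowed values must be present in each column
     and size(collect_set(name))  = 2
     and size(collect_set(name1)) = 2
) t;
```

On the sample rows in the question, and assuming they all fall inside the date range, Joy is dropped because of "expenses", Lan because of "speeding", and only Hana remains.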