I have a table named tab in Hive that looks like this:
word | occurrences
---- | -----------
by | 10
hi | 1
same | 3
love | 6
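I want to use a Hive query to compute and display each word's frequency (its occurrences divided by the sum of the whole column). For example, the frequency of the word "by" is 10 / (10 + 1 + 3 + 6) = 0.5.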
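For reference, the same arithmetic can be worked out for all four rows. The sketch below uses plain Python purely as a calculator; the names `occurrences`, `total`, and `frequencies` are illustrative and not part of the Hive table or the original question.

```python
# Occurrence counts copied from the table above.
occurrences = {"by": 10, "hi": 1, "same": 3, "love": 6}

# Sum of the whole column: 10 + 1 + 3 + 6 = 20.
total = sum(occurrences.values())

# Frequency of a word = its own count divided by the column total.
frequencies = {word: count / total for word, count in occurrences.items()}

for word, freq in frequencies.items():
    print(word, freq)
```

The frequencies should add up to 1, which is a quick sanity check on whatever the query returns.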
I tried this:
SELECT word, occurrences, occurrences/SUM(occurrences) AS frequency
FROM tab
GROUP BY word, occurrences
ORDER BY frequency;
But this gives:
word | occurrences | frequency
---- | ----------- | ---------
by | 10 | 1
hi | 1 | 1
same | 3 | 1
love | 6 | 1
I'm not sure what I'm doing wrong. I'm not very good at SQL. Thanks in advance.
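Best answer
Grouping by word and occurrences puts every row in its own group, so SUM(occurrences) is just that row's own value and the ratio is always 1. Try the SQL below, which uses SUM() OVER() to sum over all rows instead: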
SELECT word, occurrences, occurrences/SUM(occurrences) OVER() AS frequency
FROM tab
ORDER BY frequency;
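To compare the two queries on the sample data without a Hive cluster, here is a rough sketch that runs them against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here, and this assumes a SQLite build of 3.25 or newer (needed for window functions). The `1.0 *` factor is an addition for SQLite, which would otherwise do integer division; Hive's `/` operator already returns a double.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption, see above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (word TEXT, occurrences INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [("by", 10), ("hi", 1), ("same", 3), ("love", 6)],
)

# Original attempt: each (word, occurrences) group contains one row,
# so SUM(occurrences) equals occurrences and every ratio is 1.
grouped = conn.execute(
    """
    SELECT word, 1.0 * occurrences / SUM(occurrences) AS frequency
    FROM tab
    GROUP BY word, occurrences
    ORDER BY word
    """
).fetchall()

# Answer's approach: OVER () with an empty window sums over all rows,
# so each row is divided by the column total (20).
windowed = conn.execute(
    """
    SELECT word, 1.0 * occurrences / SUM(occurrences) OVER () AS frequency
    FROM tab
    ORDER BY frequency
    """
).fetchall()

print("GROUP BY version:", grouped)
print("OVER () version: ", windowed)
conn.close()
```

The grouped query returns a frequency of 1 for every word, matching the output in the question, while the windowed query returns values that sum to 1.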
This question about a SQL query to calculate frequency in Hive comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/48200451/