I have a table tab in Hive that looks like this:

word | occurrences
---- | -----------
by   | 10
hi   | 1
same | 3
love | 6

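I want to use a Hive query to calculate and display the frequency of each word (its occurrences divided by the sum of the whole column). For example, the frequency of the word "by" is 10 / (10 + 1 + 3 + 6) = 0.5.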

I tried this:
SELECT word, occurrences, occurrences/SUM(occurrences) AS frequency
FROM tab
GROUP BY word, occurrences
ORDER BY frequency;

But this gives:
word | occurrences | frequency
---- | ----------- | ---------
by   | 10          | 1
hi   | 1           | 1
same | 3           | 1
love | 6           | 1

I'm not sure what I'm doing wrong. I'm not very good at SQL. Thanks in advance.
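To see why every ratio comes out as 1, here is a minimal sketch that replays the query above against an in-memory SQLite database. SQLite is an assumption: it stands in for Hive only so the example runs without a cluster. The rows are the sample data from the table above. The `CAST` is there only because SQLite truncates INTEGER / INTEGER, whereas Hive's `/` already returns a DOUBLE.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption: this
# only reproduces the grouping behaviour, it is not Hive itself).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (word TEXT, occurrences INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [("by", 10), ("hi", 1), ("same", 3), ("love", 6)],
)

# Same shape as the query in the question. CAST forces real division
# because SQLite, unlike Hive, truncates integer division.
rows = conn.execute(
    """
    SELECT word, occurrences,
           CAST(occurrences AS REAL) / SUM(occurrences) AS frequency
    FROM tab
    GROUP BY word, occurrences
    ORDER BY frequency
    """
).fetchall()

# Each (word, occurrences) group contains exactly one row, so
# SUM(occurrences) equals that row's own value and every ratio is 1.0.
for row in rows:
    print(row)
```

Because every group has a single row, the aggregate never sees the other words' counts; the grouping itself is the bug, not the division.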

Best answer

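Try the SQL below, which uses the window function SUM() OVER(). In your query, GROUP BY word, occurrences puts every row into its own group, so SUM(occurrences) is just that row's occurrences and each ratio is 1. With an empty OVER() clause, SUM(occurrences) is computed over all rows of tab (20 here) while every row is kept, so each word's count is divided by the grand total.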

SELECT word, occurrences, occurrences/SUM(occurrences) OVER() AS frequency
FROM tab
ORDER BY frequency;
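A minimal sketch checking the answer's query, again using SQLite as a stand-in for Hive (assumption; window functions need SQLite 3.25 or later). It also shows a common alternative that avoids window functions: compute the total once in a subquery and CROSS JOIN it onto every row. In Hive that second form is a cartesian product, which a cluster running in strict mode may reject, so treat it as a fallback rather than the preferred query. As before, `CAST` is only needed in SQLite; in Hive, `occurrences / ...` already yields a DOUBLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (word TEXT, occurrences INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [("by", 10), ("hi", 1), ("same", 3), ("love", 6)],
)

# The answer's approach: an empty OVER () window spans every row of tab,
# so each row is divided by the grand total without collapsing any rows.
window_rows = conn.execute(
    """
    SELECT word, occurrences,
           CAST(occurrences AS REAL) / SUM(occurrences) OVER () AS frequency
    FROM tab
    ORDER BY frequency
    """
).fetchall()

# Alternative without window functions: one-row subquery holding the total,
# cross-joined onto every row of tab.
join_rows = conn.execute(
    """
    SELECT t.word, t.occurrences,
           CAST(t.occurrences AS REAL) / s.total AS frequency
    FROM tab t
    CROSS JOIN (SELECT SUM(occurrences) AS total FROM tab) s
    ORDER BY frequency
    """
).fetchall()

for row in window_rows:
    print(row)
```

Both queries divide by the same total of 20, so "by" gets 10 / 20 = 0.5, matching the example in the question.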

Regarding "sql - SQL query to calculate frequency in Hive", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48200451/

10-12 23:03