Hi, I have a document uploaded to a Hive table named Data. Sample rows look like this:
He is a good boy and but his brother is a bad boy.
He is a naughty boy.
The table schema is:
create table Data(
document_data STRING)
row format delimited
fields terminated by '\n'
stored as textfile;
I want to write a query that counts only the occurrences of the words "boy" and "naughty" and outputs them as:
boy 3
naughty 1
Best answer
Here we use LATERAL VIEW together with explode(), which turns a single row into multiple rows (one row per word).
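SELECT
word,
COUNT(*) AS cnt
FROM Data
-- LATERAL VIEW must come directly after FROM; WHERE and GROUP BY follow it.
-- Split on runs of non-letters rather than on ' ', so "boy." (with its period) still matches "boy".
LATERAL VIEW explode(split(document_data, '[^A-Za-z]+')) lateralTable AS word
WHERE word IN ('boy', 'naughty')
GROUP BY word;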
I adapted this from the version in "Word Count program in Hive".
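To show what the query computes, and why the split pattern matters, here is a small Python sketch that mimics split + explode + WHERE + GROUP BY on the two sample rows. It is only a local, in-memory simulation, not Hive itself, and `count_words` is a hypothetical helper. Splitting on a single space leaves "boy." with its trailing period, so only one "boy" matches. Splitting on runs of non-letters yields the boy 3 / naughty 1 the question expects.

```python
import re
from collections import Counter

# Sample rows from the question's Data table (one line of the document per row).
rows = [
    "He is a good boy and but his brother is a bad boy.",
    "He is a naughty boy.",
]

TARGETS = {"boy", "naughty"}


def count_words(rows, pattern):
    """Hypothetical helper mimicking, in memory:
    SELECT word, COUNT(*) FROM Data
    LATERAL VIEW explode(split(document_data, pattern)) t AS word
    WHERE word IN ('boy', 'naughty') GROUP BY word
    """
    counts = Counter()
    for line in rows:
        # explode(split(...)): each row becomes one row per token
        for word in re.split(pattern, line):
            if word in TARGETS:      # WHERE word IN ('boy', 'naughty')
                counts[word] += 1    # GROUP BY word / COUNT(*)
    return dict(counts)


print(count_words(rows, " "))           # → {'boy': 1, 'naughty': 1}
print(count_words(rows, "[^A-Za-z]+"))  # → {'boy': 3, 'naughty': 1}
```

Python's `re.split` and Java's regex split (which Hive's `split` relies on) differ slightly in how they treat empty leading or trailing tokens, but those empty strings are filtered out by the WHERE clause either way.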
A similar question, "hadoop - Searching for occurrences of specific words in a document using Hive", can be found on Stack Overflow: https://stackoverflow.com/questions/33302316/