Suppose I have a text file, count.txt, that contains the following paragraph:

    I am working  in hadoop along with  various courses like Hadoop, Hana, Java etc
    I love working with hadoop
    This is hadoop project

Now I need to find out how many times the word hadoop occurs in this file.

This is the code I tried:

    c1 = load '/...../count.txt' using PigStorage(',') as (Name:chararray);
    c2 = foreach c1 generate FLATTEN(TOKENIZE(LOWER(Name))) as (Name1:chararray);
    dump c2;
    c3 = filter c2 by Name1 == 'hadoop';
    dump c3;

This is the output I get:

    (hadoop)
    (hadoop)
    (hadoop)
    (hadoop)

What I need is the number 4, not the word hadoop repeated four times. So I tried running
`c4 = foreach c3 generate COUNT($0);`

but it fails with an error. Could someone please help? It is probably something simple that I am missing.
Thanks in advance.

Best answer

Try this:

Filter c2 as before, then group the filtered relation and count it:

    c3 = filter c2 by Name1 == 'hadoop';
    -- grouping collects the matching rows into a bag, which COUNT requires
    grouped = GROUP c3 BY Name1;
    wordcount = FOREACH grouped GENERATE $0, COUNT($1);
    DUMP wordcount;
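
The original `COUNT($0)` most likely failed because `COUNT` expects a bag, while `$0` in c3 is a single chararray field; the `GROUP` step is what produces the bag. If only the bare number is wanted, without the word next to it, one option is to group the whole relation with `GROUP ... ALL`. This is a minimal sketch that assumes c3 is defined as above; the aliases `all_hits` and `total` are made up here for illustration:

    -- put every row of c3 into one group; the bag is named after the relation (c3)
    all_hits = GROUP c3 ALL;
    -- count the rows in that single bag
    total = FOREACH all_hits GENERATE COUNT(c3);
    DUMP total;

For the sample file this should print `(4)`, matching the four `(hadoop)` rows shown in the question.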

Let me know if this helps.

Regarding hadoop - word count in Pig, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52995025/
