The following code doesn't return what I'm trying to compute, which is the number of unique users. Any ideas?

data = LOAD 'input_initial' AS (user_id,item_id,rating,timestamp);
data = FOREACH data GENERATE user_id,item_id;
STORE data INTO 'input_final';
data_users = FOREACH data GENERATE user_id;
group_users = GROUP data_users BY user_id;
count_users = FOREACH group_users GENERATE COUNT(data_users);
STORE count_users INTO 'count_users';

Best answer

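`GROUP data_users BY user_id` produces one tuple per distinct user, so `COUNT(data_users)` in the following FOREACH returns how many records each user has, not how many users there are. To count the users themselves, add a second GROUP that works on ALL rather than on a single field. This collapses the per-user tuples into one bag, which you can then count: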

group_users = GROUP data_users BY user_id;
grp_all = GROUP group_users ALL;
count_users = FOREACH grp_all GENERATE COUNT(group_users);
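An alternative sketch, not from the original answer, swaps in Pig's DISTINCT operator. It deduplicates the user_id column first, then puts the remaining rows into a single group with GROUP ... ALL and counts them. It reuses the question's paths ('input_initial', 'count_users') and schema; the aliases `users` and `unique_users` are illustrative names chosen here.

```pig
-- Load the raw records (schema taken from the question).
data = LOAD 'input_initial' AS (user_id, item_id, rating, timestamp);

-- Keep only the user_id column.
users = FOREACH data GENERATE user_id;

-- DISTINCT drops duplicate tuples, leaving one row per user_id.
unique_users = DISTINCT users;

-- GROUP ... ALL places every remaining row in a single bag.
grp_all = GROUP unique_users ALL;

-- COUNT returns the number of tuples in that bag
-- (it skips tuples whose first field is null).
count_users = FOREACH grp_all GENERATE COUNT(unique_users);

STORE count_users INTO 'count_users';
```

Both versions should produce the same count; DISTINCT just expresses "unique users" more directly. If a null user_id should count as a user, COUNT_STAR includes nulls where COUNT does not.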

Regarding "hadoop - How to count unique users in Pig", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14728039/

10-12 22:51