Problem description
I'm using Pig to parse my application logs to find out which exposed methods each user called after last month that the same user had not called before it.
I have managed to get the methods called, grouped by user, for the periods before last month and after last month:
Sample of the BEFORE relation (before last month)
u1 {(m1),(m2)}
u2 {(m3),(m4)}
Sample of the AFTER relation (after last month)
u1 {(m1),(m3)}
u2 {(m1),(m4)}
What I want is to find, per user, which methods are in AFTER but not in BEFORE, that is:
NEWLY_CALLED expected result
u1 {(m3)}
u2 {(m1)}
Question: how can I do that in Pig? Is it possible to subtract bags?
I have tried the DIFF function, but it does not perform the expected subtraction.
Regards,
Joel
Recommended answer
I think you need to write a UDF; inside it you can use something like:
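Set<T> setA ...
Set<T> setB ...
// java.util.Set has no subtract(); copy setA, then remove setB's elements
Set<T> setAminusB = new HashSet<T>(setA);
setAminusB.removeAll(setB);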
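The set-difference step of such a UDF could look roughly like the sketch below. It uses only java.util; the class name BagSubtract and the method name subtract are hypothetical, not part of Pig. Wrapping this logic in a Pig EvalFunc (converting each DataBag to and from a Java collection) is left out and would still need to be written.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not a Pig class): the core logic a bag-subtraction UDF could call.
class BagSubtract {
    // Returns the elements of `after` that do not appear in `before`,
    // keeping the order in which they first occur in `after`.
    static <T> Set<T> subtract(Collection<T> after, Collection<T> before) {
        Set<T> result = new LinkedHashSet<>(after); // copy, so the input is left untouched
        result.removeAll(before);                   // in-place set difference
        return result;
    }

    public static void main(String[] args) {
        // Sample data from the question: AFTER minus BEFORE, per user.
        System.out.println("u1 " + subtract(Arrays.asList("m1", "m3"), Arrays.asList("m1", "m2")));
        System.out.println("u2 " + subtract(Arrays.asList("m1", "m4"), Arrays.asList("m3", "m4")));
    }
}
```

With the sample data this prints u1 [m3] and u2 [m1], matching the NEWLY_CALLED result asked for. Note that going through a Set drops duplicate entries, which seems acceptable here since only the distinct method names matter.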