Bag subtraction in Pig (Hadoop)

Problem description

I'm using Pig to parse my application logs to find out which exposed methods a user has called since last month that the same user had not called before.

I have managed to get the methods called, grouped by user, before last month and after last month:

Sample of the BEFORE last month relation

u1      {(m1),(m2)}
u2      {(m3),(m4)}

Sample of the AFTER last month relation

u1      {(m1),(m3)}
u2      {(m1),(m4)}

What I want is to find, per user, which methods are in AFTER but not in BEFORE, that is:

NEWLY_CALLED expected result

u1      {(m3)}
u2      {(m1)}

Question: how can I do that in Pig? Is it possible to subtract bags?

I have tried the DIFF function, but it does not perform the expected subtraction.

Regards,

Joel

Recommended answer

I think you need to write a UDF; inside it you can then use something like:

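Set<T> setA ...
Set<T> setB ...
// java.util.Set has no subtract(); copy setA, then remove setB's elements
Set<T> setAminusB = new HashSet<T>(setA);
setAminusB.removeAll(setB);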
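The set subtraction suggested above can be sketched in plain Java, applied to the sample data from the question. This is only an illustration of the core logic, not a complete Pig UDF: the class name BagSubtraction and the methods subtract and newlyCalled are made up for this example, and java.util.Set has no subtract method, so the sketch copies the first set and calls removeAll. It also assumes each bag can be treated as a set, so duplicate tuples in a bag would be collapsed.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the names are illustrative and not part of Pig.
final class BagSubtraction {

    /** Returns the elements of a that are not in b, leaving both inputs unmodified. */
    static <T> Set<T> subtract(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a); // copy so the caller's set is untouched
        result.removeAll(b);                    // set difference: a minus b
        return result;
    }

    /** For each user in after, keeps the methods that user did not call in before. */
    static Map<String, Set<String>> newlyCalled(Map<String, Set<String>> before,
                                                Map<String, Set<String>> after) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : after.entrySet()) {
            // Assumption: a user missing from BEFORE has called nothing earlier.
            Set<String> earlier = before.getOrDefault(e.getKey(), Set.of());
            result.put(e.getKey(), subtract(e.getValue(), earlier));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample relations from the question.
        Map<String, Set<String>> before = new LinkedHashMap<>();
        before.put("u1", Set.of("m1", "m2"));
        before.put("u2", Set.of("m3", "m4"));

        Map<String, Set<String>> after = new LinkedHashMap<>();
        after.put("u1", Set.of("m1", "m3"));
        after.put("u2", Set.of("m1", "m4"));

        System.out.println(newlyCalled(before, after)); // {u1=[m3], u2=[m1]}
    }
}
```

In a real UDF, the same subtract logic would sit inside a class extending Pig's EvalFunc and would read the two bags from the input tuple; that wiring is left out here. As for why DIFF falls short: it returns the tuples found in either bag but not in both, so for u1 it would also return m2.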

