A =将'data'加载为(x,y);
B =将'data'加载为(x,z);
C = A乘以x,B乘以x;
D = foreach C生成flatten(A),flatten(b);
E = D组,A::x
以上陈述中确切完成的操作以及我们在实时场景中使用flatten的位置。
最佳答案
A = load 'input1' USING PigStorage(',') as (x, y);
(x,y) --> (1,2)(1,3)(2,3)
B = load 'input2' USING PigStorage(',') as (x, z);`
(x,z) --> (1,4)(1,2)(3,2)*/
C = cogroup A by x, B by x;`
result:
(1,{(1,2),(1,3)},{(1,4),(1,2)})
(2,{(2,3)},{})
(3,{},{(3,2)})
D = foreach C generate group, flatten(A), flatten(B);`
when both bags flattened, the cross product of tuples are returned.
result:
(1,1,2,1,4)
(1,1,2,1,2)
(1,1,3,1,4)
(1,1,3,1,2)
E = group D by A::x`
here your are grouping with x column of relation A.
(1,1,2,1,4)
(1,1,2,1,2)
(1,1,3,1,4)
(1,1,3,1,2)
关于hadoop - PIG Latin中FLATTEN运算符的目的是什么,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/28472179/