Need help discarding null values from the result of a full outer join in Pig Latin. Here are the two datasets:

A:

(BOS,2)
(BUR,81)
(LAS,8)

B:
(BUR,56)
(EWR,2)
(LAS,88)

After the full outer join:
C:
(BOS,2,,)
(BUR,81,BUR,56)
(,,EWR,2)
(LAS,8,LAS,88)

I need the output in the following format:
(BOS,2)
(BUR,137)
(EWR,2)
(LAS,96)

I tried different combinations of GROUP, FLATTEN and bags, but couldn't find a solution. Any help is much appreciated.

-- Load origin/destination airport codes
airline = load '/demo/data/airline/airline.csv' using PigStorage(',') as (Origin:chararray, Dest:chararray);
-- Count records per origin airport
traffic_in = GROUP airline by Origin;
traffic_in_count = FOREACH traffic_in generate group as Origin, COUNT(airline) as count;
-- Count records per destination airport
traffic_out = GROUP airline by Dest;
traffic_out_count = FOREACH traffic_out generate group as Dest, COUNT(airline) as count;
-- Full outer join: an airport missing on one side gets null fields for that side
traffic_top = JOIN traffic_in_count by Origin FULL OUTER, traffic_out_count by Dest;

Best answer

EDIT: Instead of using an OUTER JOIN, use UNION, then GROUP by the first column and SUM the values in the second column.

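-- Load both datasets; each line is "code,count"
A = LOAD 'test1.txt' using PigStorage(',') as (A1:chararray, A2:int);
B = LOAD 'test2.txt' using PigStorage(',') as (B1:chararray, B2:int);
-- Stack the rows of A and B into one relation (no nulls are introduced)
C = UNION A, B;
-- Collect all rows that share the same code
D = GROUP C BY $0;
-- Add up the counts for each code
E = FOREACH D GENERATE group, SUM(C.$1);
DUMP E;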

Output:

[Screenshot of the DUMP E output]
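A minimal sketch of the same UNION/GROUP/SUM pattern applied to the traffic script from the question, assuming traffic_in_count and traffic_out_count are defined exactly as there. The aliases origin_counts, dest_counts, all_traffic and by_airport and the field names airport, cnt and total are hypothetical names chosen for illustration. Both count relations are first projected onto the same two-field schema so that UNION lines the columns up:

```pig
-- Sketch: assumes traffic_in_count and traffic_out_count from the question's script.
-- Project both relations onto the same schema (airport, cnt) so UNION aligns them.
origin_counts = FOREACH traffic_in_count  GENERATE $0 AS airport, $1 AS cnt;
dest_counts   = FOREACH traffic_out_count GENERATE $0 AS airport, $1 AS cnt;
-- Stack the per-airport counts; an airport can now appear up to twice.
all_traffic = UNION origin_counts, dest_counts;
-- Gather both counts for each airport and add them up.
by_airport  = GROUP all_traffic BY airport;
traffic_top = FOREACH by_airport GENERATE group AS airport, SUM(all_traffic.cnt) AS total;
DUMP traffic_top;
```

Positional references ($0, $1) are used in the projections so the sketch does not depend on the field names Origin, Dest and count from the original script.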

10-04 19:11
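If you would rather keep the FULL OUTER JOIN, the nulls can instead be handled row by row with Pig's bincond operator (condition ? a : b) and IS NULL. A minimal sketch, assuming A (A1, A2) and B (B1, B2) are loaded as in the answer's script; the output field names code and total are hypothetical:

```pig
-- Sketch: assumes A and B are loaded as in the answer's script.
C = JOIN A BY A1 FULL OUTER, B BY B1;
-- On each joined row, keep whichever key is present and treat a missing count as 0.
E = FOREACH C GENERATE
        (A1 IS NULL ? B1 : A1) AS code,
        (A2 IS NULL ? 0 : A2) + (B2 IS NULL ? 0 : B2) AS total;
DUMP E;
```

This relies on each key appearing at most once in A and at most once in B, as in the sample data. If a key repeats within one input, the join pairs every matching row, so the UNION/GROUP/SUM version is the safer general choice.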