I have a file with the following content.
Input:
TOYID;TOYSeries;ModuleID;ID;PART_NUMBER;SUPPLIER;LAND
394107;C204; 731305; 69807402;A0001532122;ABC;AT
394107;C204; 731307; 69807402;A0001532122;ABC;AT
394107;C204; 731315; 69807402;A0001532122;ABC;AT
394107;C204; 731325; 69807402;A0001532122;ABC;AT
394107;C204; 731335; 69807402;A0001532122;ABC;AT
394107;C204; 731345; 69807402;A0001532122;ABC;AT
I want output like this.
Output:
SUPPLIER;LAND; COUNT(SUPPLIER,LAND); TOYID TOYSeries; ModuleID; ID; PART_NUMBER
ABC;AT; 6 ; 394107 C204; 731305; 69807402; A0001532122
ABC;AT; 6 ; 394107 C204; 731307; 69807402; A0001532122
I tried:
A = LOAD 'hdfs://localhost:8020/BigData_POC/....../TOY_Detail.txt' USING PigStorage(';') AS (TOYID:chararray,TOYSeries:chararray,ModuleID:chararray,ID:chararray,DESCRIPTION:chararray,PART_NUMBER:chararray,SUPPLIER:chararray,LAND:chararray);
B = FOREACH A GENERATE TOYID,ModuleID,DESCRIPTION,PART_NUMBER,SUPPLIER,LAND;
C = GROUP B by (SUPPLIER,LAND);
D = foreach C generate group, COUNT(B) as cnt, B.TOYID,B.ModuleID,B.PART_NUMBER;
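I get output like this:
Do you know of any Pig Latin script that could be used for this?
Best answer
Based on your comment, could you try the following as a solution? I have not verified it myself, so it may need some adjustment.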
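-- Count the rows in each (SUPPLIER, LAND) group.
D = FOREACH C GENERATE group, COUNT(B) AS cnt;
-- Field aliases are case-sensitive in Pig, so the group keys are SUPPLIER and LAND.
E = FOREACH D GENERATE group.SUPPLIER AS supplier, group.LAND AS land, cnt;
-- Join the counts back onto the detail rows.
F = JOIN B BY (SUPPLIER, LAND), E BY (supplier, land);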
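To make it clearer what the group, count, and join-back steps are meant to produce, here is a small plain-Python sketch of the same logic run on the sample rows from the question. It is only an illustration, not Pig: the name `count_and_join` and the tuple layout are my own assumptions, and the output uses a uniform `;` separator rather than the exact spacing in the desired output.

```python
from collections import Counter

# Sample rows from the question, in input column order:
# (TOYID, TOYSeries, ModuleID, ID, PART_NUMBER, SUPPLIER, LAND)
ROWS = [
    ("394107", "C204", "731305", "69807402", "A0001532122", "ABC", "AT"),
    ("394107", "C204", "731307", "69807402", "A0001532122", "ABC", "AT"),
    ("394107", "C204", "731315", "69807402", "A0001532122", "ABC", "AT"),
    ("394107", "C204", "731325", "69807402", "A0001532122", "ABC", "AT"),
    ("394107", "C204", "731335", "69807402", "A0001532122", "ABC", "AT"),
    ("394107", "C204", "731345", "69807402", "A0001532122", "ABC", "AT"),
]


def count_and_join(rows):
    """Attach the per-(SUPPLIER, LAND) row count to every detail row.

    Hypothetical helper: mirrors GROUP + COUNT, then the JOIN back
    onto the ungrouped rows.
    """
    # GROUP BY (SUPPLIER, LAND) followed by COUNT
    counts = Counter((r[5], r[6]) for r in rows)
    # JOIN the counts back: one output row per input row
    return [(r[5], r[6], counts[(r[5], r[6])]) + r[:5] for r in rows]


for row in count_and_join(ROWS):
    print(";".join(str(value) for value in row))
```

Each input row comes back once, prefixed with its supplier, land, and the group count, which is the flat shape the question asks for instead of one row per group with bags.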
Regarding "hadoop - Pig Latin script using GROUP and TOBAG", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40017231/