I have a file with the following contents.

Input:

TOYID;TOYSeries;ModuleID;ID;PART_NUMBER;SUPPLIER;LAND
394107;C204; 731305; 69807402;A0001532122;ABC;AT
394107;C204; 731307; 69807402;A0001532122;ABC;AT
394107;C204; 731315; 69807402;A0001532122;ABC;AT
394107;C204; 731325; 69807402;A0001532122;ABC;AT
394107;C204; 731335; 69807402;A0001532122;ABC;AT
394107;C204; 731345; 69807402;A0001532122;ABC;AT

I want output like this.
Output:
SUPPLIER;LAND;COUNT(SUPPLIER,LAND);TOYID;TOYSeries;ModuleID;ID;PART_NUMBER
ABC;AT;6;394107;C204;731305;69807402;A0001532122
ABC;AT;6;394107;C204;731307;69807402;A0001532122

I tried:
A = LOAD 'hdfs://localhost:8020/BigData_POC/....../TOY_Detail.txt' USING PigStorage(';') AS (TOYID:chararray,TOYSeries:chararray,ModuleID:chararray,ID:chararray,DESCRIPTION:chararray,PART_NUMBER:chararray,SUPPLIER:chararray,LAND:chararray);
B = FOREACH A GENERATE TOYID,ModuleID,DESCRIPTION,PART_NUMBER,SUPPLIER,LAND;
C = GROUP B by (SUPPLIER,LAND);
D = foreach C generate group, COUNT(B) as cnt, B.TOYID,B.ModuleID,B.PART_NUMBER;

I get output like this:



Do you know of a Pig Latin script that can do this?

Best answer

Based on your comment, could you try the following as a solution? I have not verified it myself, so it may need some adjustment.

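-- Count the rows in each (SUPPLIER, LAND) group.
D = FOREACH C GENERATE group, COUNT(B) AS cnt;
-- Field names are case-sensitive in Pig, so the group keys are SUPPLIER and LAND.
E = FOREACH D GENERATE group.SUPPLIER AS supplier, group.LAND AS land, cnt;
-- Join the counts back onto the detail rows.
F = JOIN B BY (SUPPLIER, LAND), E BY (supplier, land);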
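
The join leaves the columns of B and E side by side, so a final projection is still needed to get the column order shown in the desired output. The following is a minimal, untested sketch of that step. It assumes the join F succeeded and that E carries the fields supplier, land and cnt. The relation name G is made up for illustration. After a JOIN, Pig qualifies field names with the source relation (alias::field), which is what the projection relies on.

```pig
-- Sketch only: continues from relation F (B joined with E) above.
-- G is a hypothetical name; it is not part of the original answer.
G = FOREACH F GENERATE
        E::supplier    AS SUPPLIER,
        E::land        AS LAND,
        E::cnt         AS cnt,
        B::TOYID       AS TOYID,
        B::ModuleID    AS ModuleID,
        B::PART_NUMBER AS PART_NUMBER;
DUMP G;
```

Note that B as defined in the question projects away TOYSeries and ID, so those two columns would have to be added to B's FOREACH before they could appear in G.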

Regarding "hadoop - Pig Latin script using GROUP and TOBAG", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40017231/
