I am joining 3 tables, and inside the foreach I need to check whether the ReadStagingData bag is empty.
Here is the code:
ReadStagingData = Load 'Staging_data.csv' Using PigStorage(',') As (PL_Posn_id:int,Brok_org_dly:double,Brok_org_ptd:double);
ReadPriorData = Load 'ptd.csv' Using PigStorage(',') As (PL_Posn_id:int,Brok_org_ptd:double);
ReadPriorFunctional = Load 'Functional.csv' Using PigStorage(',') AS (PL_Posn_id:int,Brok_fun_ptd:double,Brok_fun_ltd:double);
JoinDS1 = JOIN ReadPriorData BY PL_Posn_id,ReadPriorFunctional BY PL_Posn_id;
JoinDS2 = JOIN ReadStagingData by PL_Posn_id Left OUTER,JoinDS1 BY ReadPriorData::PL_Posn_id;
X = Foreach JoinDS2 {
test = (NOT(IsEmpty(ReadStagingData))); -- Error on this line
GENERATE test,ReadStagingData::PL_Posn_id,
ReadStagingData::Brok_org_dly,
(ReadStagingData::Brok_org_ptd is not null ? ReadStagingData::Brok_org_ptd:ReadPriorData::Brok_org_ptd+ReadStagingData::Brok_org_dly);
};
Dump X;
When I run the code above, I get an error message saying that ReadStagingData is invalid. Please help.
Best answer
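In your relation X, ReadStagingData is not a bag. The notation ReadStagingData::Brok_org_dly does not denote a projection out of a bag. It is a top-level field that gets this name after the JOIN so that every field stays uniquely named. So ReadStagingData is just a prefix.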
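Also, I'm not sure why you are attempting this check: because you are doing a LEFT OUTER join with ReadStagingData on the left, every record in X has a corresponding record in ReadStagingData. It would be different if you were doing a RIGHT OUTER join.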
If you meant to do a RIGHT OUTER join and want to check whether the fields coming from ReadStagingData are NULL, you can do it like this:
rsdIsNull = ReadStagingData::PL_Posn_id IS NULL;
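For context, here is a rough sketch of how that check could sit in the full script if a RIGHT OUTER join is what you want. It reuses the relations from the question. The output name staging_missing is a hypothetical name chosen for illustration, and the check is written as a bincond inside GENERATE, which is only one possible way to express it.

```pig
-- Sketch only: same inputs as the question, but with a RIGHT OUTER join, so rows
-- from JoinDS1 that have no match in ReadStagingData are kept (staging fields NULL).
JoinDS2 = JOIN ReadStagingData BY PL_Posn_id RIGHT OUTER, JoinDS1 BY ReadPriorData::PL_Posn_id;

X = FOREACH JoinDS2 GENERATE
    -- 1 when no staging row matched (the staging key is NULL), otherwise 0
    (ReadStagingData::PL_Posn_id IS NULL ? 1 : 0) AS staging_missing,
    ReadStagingData::PL_Posn_id,
    ReadStagingData::Brok_org_dly;

DUMP X;
```

Testing the join key is enough here, because an unmatched row from the right side leaves every ReadStagingData field NULL.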
Regarding "hadoop - check whether a bag is empty or not inside a Pig foreach", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19359622/