I want to join 3 tables, and inside the FOREACH I need to check whether the ReadStagingData bag is empty.
Here is the code:

ReadStagingData = Load 'Staging_data.csv' Using PigStorage(',') As (PL_Posn_id:int,Brok_org_dly:double,Brok_org_ptd:double);

ReadPriorData = Load 'ptd.csv' Using PigStorage(',') As (PL_Posn_id:int,Brok_org_ptd:double);

ReadPriorFunctional = Load 'Functional.csv' Using PigStorage(',') AS (PL_Posn_id:int,Brok_fun_ptd:double,Brok_fun_ltd:double);

JoinDS1 = JOIN ReadPriorData BY PL_Posn_id,ReadPriorFunctional BY PL_Posn_id;

JoinDS2 = JOIN ReadStagingData BY PL_Posn_id LEFT OUTER, JoinDS1 BY ReadPriorData::PL_Posn_id;

X = Foreach JoinDS2 {
    test = (NOT(IsEmpty(ReadStagingData))); -- Error on this line
    GENERATE test, ReadStagingData::PL_Posn_id,
    ReadStagingData::Brok_org_dly,
    (ReadStagingData::Brok_org_ptd is not null ? ReadStagingData::Brok_org_ptd : ReadPriorData::Brok_org_ptd + ReadStagingData::Brok_org_dly);
};

Dump X;

When I run the code above, I get an error saying that ReadStagingData is invalid. Please help.

Best answer

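In your relation X, ReadStagingData is not a bag. The notation ReadStagingData::Brok_org_dly does not mean a projection from a bag. It is a top-level field, named that way by the JOIN so that every field in the result has a unique name. So ReadStagingData is only a prefix.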
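To make the naming point concrete, here is a minimal plain-Python model (not Pig) of how a join flattens two input rows into one record whose fields are called "Relation::field". The helper `join_flatten` and the sample values are hypothetical; only the relation and field names come from the question. The point is that "ReadStagingData" never exists as a field on its own, so there is no bag to pass to IsEmpty.

```python
# Plain-Python model (not Pig): a JOIN produces ONE flat record per match,
# and each field name is disambiguated with a "Relation::" prefix.
# join_flatten and the sample values are hypothetical illustrations.

def join_flatten(left_name, left_row, right_name, right_row):
    """Merge two rows into a single flat record with prefixed field names."""
    record = {}
    for field, value in left_row.items():
        record[f"{left_name}::{field}"] = value
    for field, value in right_row.items():
        record[f"{right_name}::{field}"] = value
    return record

staging = {"PL_Posn_id": 1, "Brok_org_dly": 2.5, "Brok_org_ptd": 10.0}
prior = {"PL_Posn_id": 1, "Brok_org_ptd": 7.5}

row = join_flatten("ReadStagingData", staging, "ReadPriorData", prior)

# Every field is a top-level scalar; "ReadStagingData" is only a prefix.
print(sorted(row))
print("ReadStagingData" in row)
```

In real Pig the prefix for a field that came through a second join may be longer (for example it can include the intermediate alias), but the idea is the same: the prefix is part of a field name, not a nested bag.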

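Also, I'm not sure why you are trying to do this check at all: because you are doing a LEFT OUTER join, every record in X comes from a record in ReadStagingData, so X will never contain a record without a corresponding ReadStagingData record. It would be different if you were doing a RIGHT OUTER join.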
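The LEFT versus RIGHT OUTER argument can be checked with a small plain-Python model (not Pig) of outer-join semantics on a single key. The function `outer_join` and the sample ids are hypothetical; the side that has no match is represented as `None`, standing in for Pig's nulls. With a left outer join the staging side is never missing; with a right outer join it can be.

```python
# Plain-Python model (not Pig) of LEFT vs RIGHT OUTER join on one key.
# A missing side is None, standing in for the nulls Pig would produce.
# outer_join and the sample data are hypothetical illustrations.

def outer_join(left, right, key, how):
    """Join two lists of dicts on `key`; `how` is "left" or "right".

    Returns a list of (left_row_or_None, right_row_or_None) pairs.
    """
    if how == "right":
        # A right outer join is a left outer join with the inputs swapped;
        # swap each pair back so the result stays (left, right).
        swapped = outer_join(right, left, key, "left")
        return [(l, r) for (r, l) in swapped]
    result = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            result.extend((l, r) for r in matches)
        else:
            result.append((l, None))  # unmatched left row kept, right side null
    return result

staging = [{"PL_Posn_id": 1}, {"PL_Posn_id": 2}]
prior = [{"PL_Posn_id": 2}, {"PL_Posn_id": 3}]

left_joined = outer_join(staging, prior, "PL_Posn_id", "left")
right_joined = outer_join(staging, prior, "PL_Posn_id", "right")

# LEFT OUTER: the staging side is present in every output row.
print(all(s is not None for s, _ in left_joined))
# RIGHT OUTER: id 3 exists only in prior, so its staging side is missing.
print(any(s is None for s, _ in right_joined))
```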

If you actually meant to do a RIGHT OUTER join and want to check whether the fields coming from ReadStagingData are NULL, you can do this:

rsdIsNull = ReadStagingData::PL_Posn_id IS NULL;
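The null check above tests a single joined field rather than a bag. As a rough plain-Python model (not Pig), it is a per-record test on the flat joined record. The helper `rsd_is_null` and the sample records are hypothetical; `None` stands in for Pig's null.

```python
# Plain-Python model (not Pig) of the per-record null check after a
# RIGHT OUTER join. rsd_is_null and the records are hypothetical;
# None plays the role of Pig's null.

def rsd_is_null(record):
    """True when the ReadStagingData side of the joined record is missing."""
    return record["ReadStagingData::PL_Posn_id"] is None

joined = [
    # Matched row: both sides present.
    {"ReadStagingData::PL_Posn_id": 2, "ReadPriorData::PL_Posn_id": 2},
    # Row that exists only on the right side: staging fields are null.
    {"ReadStagingData::PL_Posn_id": None, "ReadPriorData::PL_Posn_id": 3},
]

flags = [rsd_is_null(r) for r in joined]
print(flags)
```

Checking the join key is a reasonable choice because PL_Posn_id is non-null on every matched row, so a null there can only mean the ReadStagingData side is absent.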

Regarding "hadoop - Check if a bag is empty or not inside a Pig foreach", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/19359622/

10-11 08:31