This article covers how to drop a single column in Pig.

Problem description

I'm filtering a table by a list of about 20 IDs. Right now my code looks like this:

    A = LOAD 'ids.txt' USING PigStorage();
    B = LOAD 'massive_table' USING PigStorage();
    C = JOIN A BY $0, B BY $0;
    D = FOREACH C GENERATE $1, $2, $3, $4, ...
    STORE D INTO 'foo' USING PigStorage();

What I don't like is line D, where I have to regenerate a new table to get rid of the joining column by explicitly declaring every single other column I want present (and sometimes that is a lot of columns). I'm wondering if there's something equivalent to:

    FILTER B BY $0 IN (A)

or:

    DROP $0 FROM C

Solution

This may be similar to an earlier question that references a JIRA ticket, https://issues.apache.org/jira/browse/PIG-1693, which shows how to use the .. notation to denote all the remaining fields:

    D = FOREACH C GENERATE $1 .. ;

This assumes you have Pig 0.9.0 or later.
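
For context, here is a minimal sketch of the question's pipeline with line D rewritten to use the range notation. It assumes that 'ids.txt' holds a single column, so that $0 of C is the ID from A and everything from $1 onward comes from B. The file names are the ones from the question. How range projection behaves when no schema is declared may depend on the Pig version, so checking the result with DESCRIBE or a small sample is advisable.

    -- Sketch only: assumes 'ids.txt' has exactly one column (the ID).
    A = LOAD 'ids.txt' USING PigStorage();
    B = LOAD 'massive_table' USING PigStorage();
    C = JOIN A BY $0, B BY $0;
    -- C is laid out as A's columns followed by B's columns,
    -- so "$1 .." keeps every column of B and drops A's ID column.
    D = FOREACH C GENERATE $1 ..;
    STORE D INTO 'foo' USING PigStorage();

The same notation also accepts an upper bound or both bounds, for example ".. $3" for columns $0 through $3 and "$1 .. $3" for columns $1 through $3. Under the single-column assumption above, "$2 .." would drop B's copy of the join key as well.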


10-26 17:22