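I'm having trouble running Pig streaming. When I start an interactive Pig instance on a single machine (FYI, I'm doing this on the master node of an interactive Pig AWS EMR instance, over SSH/PuTTY), my Pig streaming works perfectly (it also works on my Windows Cloudera VM image). However, as soon as I switch to multiple machines, it stops working and gives various errors.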

Note that:

  • I can run Pig scripts that contain no stream commands on the multi-machine instance.
  • All of my Pig work is done in Pig MapReduce mode, not in -x local mode.
  • My Python script (stream1.py) has #!/usr/bin/env python at the top.

  • Below is a small sample of the options I have tried so far (all of the commands below are run in the grunt shell on the master node, which I access over SSH/PuTTY):

    This is how I get the Python file onto the master node so that it can be used:
    cp s3n://darin.emr-logs/stream1.py stream1.py
    copyToLocal stream1.py /home/hadoop/stream1.py
    chmod 755 stream1.py
    

    These are my various streaming attempts:
    cooc = stream ct_pag_ph through `stream1.py`
    dump coco;
    ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127
    
    cooc = stream ct_pag_ph through `python stream1.py`;
    dump coco;
    ERROR 2090: Received Error while processing the reduce plan: 'python stream1.py ' failed with exit status: 2
    
    DEFINE X `stream1.py`;
    cooc = stream ct_bag_ph through X;
    dump coco;
    ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127
    
    DEFINE X `stream1.py`;
    cooc = stream ct_bag_ph through `python X`;
    dump coco;
    ERROR 2090: Received Error while processing the reduce plan: 'python X ' failed with exit status: 2
    
    DEFINE X `stream1.py` SHIP('stream1.py');
    cooc = STREAM ct_bag_ph THROUGH X;
    dump cooc;
    ERROR 2017: Internal error creating job configuration.
    
    DEFINE X `stream1.py` SHIP('/stream1.p');
    cooc = STREAM ct_bag_ph THROUGH X;
    dump cooc;
    
    DEFINE X `stream1.py` SHIP('stream1.py') CACHE('stream1.py');
    cooc = STREAM ct_bag_ph THROUGH X;
    ERROR 2017: Internal error creating job configuration.
    
    define X 'python /home/hadoop/stream1.py' SHIP('/home/hadoop/stream1.py');
    cooc = STREAM ct_bag_ph THROUGH X;
    

    Best answer

    DEFINE X `stream1.py` SHIP('stream1.py');
    

    Given your preconditions, and with stream1.py present in your current local directory, this looks valid to me.

    One way to be sure of this:
    DEFINE X `python stream1.py` SHIP('/local/path/stream1.py');
    

    The purpose of SHIP is to copy the command into the working directory of every task.
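
To illustrate why a script that never reaches the task's working directory could produce the two exit statuses in the question, here is a minimal Python sketch. It does not involve Pig at all: an empty temporary directory stands in for a task working directory, and it assumes a POSIX `sh` is available and that the command is launched roughly the way a shell would launch it. The function names are hypothetical. On a typical Linux system the two values should match the 127 (command not found) and 2 (Python cannot open the file) seen above.

```python
import subprocess
import sys
import tempfile


def exit_status(argv, cwd):
    """Run argv inside cwd and return its exit status, discarding output."""
    return subprocess.run(
        argv, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def statuses_without_script():
    # An empty temporary directory stands in for a task working directory
    # that never received stream1.py (an assumption for illustration only).
    with tempfile.TemporaryDirectory() as workdir:
        # Bare command name: the shell cannot find it on PATH.
        bare = exit_status(["sh", "-c", "stream1.py"], workdir)
        # Interpreter plus a missing file: Python cannot open the script.
        via_python = exit_status([sys.executable, "stream1.py"], workdir)
    return bare, via_python


if __name__ == "__main__":
    print(statuses_without_script())
```

If the same two statuses show up on the cluster, that suggests the file is simply absent on the worker nodes, which is the gap SHIP is meant to close.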
