I'm running into a problem with Pig streaming. When I start an interactive Pig instance on a single machine (for reference, I'm doing this over SSH/PuTTY on the master node of an interactive Pig AWS EMR instance), my Pig streaming works perfectly (it also works on the Cloudera VM image running on Windows). However, as soon as I switch to using multiple machines, it simply stops working and produces a variety of errors.
Note:
Below is a small sample of the options I've tried so far (all of the commands below were run from the grunt shell on the master/head node, which I access over ssh/PuTTY).
This is how I copied the Python file onto the master node so it could be used:
cp s3n://darin.emr-logs/stream1.py stream1.py
copyToLocal stream1.py /home/hadoop/stream1.py
chmod 755 stream1.py
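The contents of stream1.py aren't shown here, but a Pig streaming script of this kind typically reads tab-separated tuples from stdin and writes tab-separated result tuples to stdout, along the lines of the hypothetical sketch below. The shebang line, together with the chmod above, is what lets a task node run it as a bare stream1.py command:
#!/usr/bin/env python
# Hypothetical stand-in for stream1.py (the real contents are not shown):
# Pig streaming feeds each input tuple as one tab-separated line on stdin
# and reads one tab-separated output line per result tuple from stdout.
import sys

for line in sys.stdin:
    fields = line.rstrip('\n').split('\t')
    # ... transform the fields here ...
    sys.stdout.write('\t'.join(fields) + '\n')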
These are my various streaming attempts:
cooc = stream ct_pag_ph through `stream1.py`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127
cooc = stream ct_pag_ph through `python stream1.py`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'python stream1.py ' failed with exit status: 2
DEFINE X `stream1.py`;
cooc = stream ct_bag_ph through X;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127
DEFINE X `stream1.py`;
cooc = stream ct_bag_ph through `python X`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'python X ' failed with exit status: 2
DEFINE X `stream1.py` SHIP('stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
dump cooc;
ERROR 2017: Internal error creating job configuration.
DEFINE X `stream1.py` SHIP('/stream1.p');
cooc = STREAM ct_bag_ph THROUGH X;
dump cooc;
DEFINE X `stream1.py` SHIP('stream1.py') CACHE('stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
ERROR 2017: Internal error creating job configuration.
define X 'python /home/hadoop/stream1.py' SHIP('/home/hadoop/stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
Best answer
DEFINE X `stream1.py` SHIP('stream1.py');
Given your prerequisites, and with stream1.py in your current local directory, this looks valid to me.
One way to make sure of that:
DEFINE X `python stream1.py` SHIP('/local/path/stream1.py');
The point of SHIP is to copy the command into the working directory of all tasks.
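Putting that together with the paths from the question, a complete invocation might look like the sketch below (assuming stream1.py lives at /home/hadoop/stream1.py on the node where Pig is started; ct_pag_ph is the relation from the question). Calling the interpreter explicitly also sidesteps the exit-status-127 failures, which generally mean the shell on a task node could not find or execute a bare stream1.py command:
-- SHIP copies the script from the local filesystem of the node running Pig
-- into the working directory of every map/reduce task, so the command can
-- refer to it by its bare name.
DEFINE X `python stream1.py` SHIP('/home/hadoop/stream1.py');
cooc = STREAM ct_pag_ph THROUGH X;
DUMP cooc;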