Problem description
For a Python Hadoop streaming job, how do I pass a parameter to, for example, the reducer script so that it behaves differently based on the parameter being passed in?
I understand that streaming jobs are invoked in the format:
hadoop jar hadoop-streaming.jar -input -output -mapper mapper.py -reducer reducer.py ...
I want to affect reducer.py.
The argument to the -reducer command line option can be any command, so you can try:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input inputDirs \
-output outputDir \
-mapper myMapper.py \
-reducer 'myReducer.py 1 2 3' \
-file myMapper.py \
-file myReducer.py
assuming myReducer.py is made executable. Disclaimer: I have not tried this, but I have passed similarly complex strings to -mapper and -reducer before.
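By way of illustration, here is a minimal sketch of how myReducer.py might pick up those extra arguments through sys.argv. It is not from the original answer: the meaning of the first argument as a threshold and the tab-separated summing logic are assumptions.

#!/usr/bin/env python
# Sketch of a reducer invoked as 'myReducer.py 1 2 3': the extra arguments
# arrive in sys.argv, and the tab-separated key/value stream arrives on stdin.
import sys

args = sys.argv[1:]                      # e.g. ['1', '2', '3']
threshold = int(args[0]) if args else 0  # assumed meaning of the first argument

current_key, total = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip('\n').partition('\t')
    if key != current_key:
        if current_key is not None and total >= threshold:
            print('%s\t%d' % (current_key, total))
        current_key, total = key, 0
    total += int(value)
if current_key is not None and total >= threshold:
    print('%s\t%d' % (current_key, total))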
That said, have you tried the
-cmdenv name=value
option, and just had your Python reducer get its value from the environment? It's just another way to do things.
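And a minimal sketch of that route, assuming the job is launched with -cmdenv MY_PARAM=upper; the variable name and the uppercasing behavior are made up for the example.

#!/usr/bin/env python
# Sketch of a reducer for the -cmdenv approach: the value set with
# -cmdenv MY_PARAM=... shows up in the reducer's environment.
import os
import sys

mode = os.environ.get('MY_PARAM', 'default')  # hypothetical variable name

for line in sys.stdin:
    key, _, value = line.rstrip('\n').partition('\t')
    if mode == 'upper':        # behave differently based on the parameter
        key = key.upper()
    print('%s\t%s' % (key, value))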