I am able to find the name of the input file in a mapper class using FileSplit when writing the program in Java.
Is there a corresponding way to do this when I write the program in Python (using Streaming)?
I found the following in the Hadoop Streaming documentation on Apache, but I still can't understand how to make use of it inside my mapper.
Any help is highly appreciated.
Thanks
According to "Hadoop: The Definitive Guide":
Hadoop sets job configuration parameters as environment variables for Streaming programs. However, it replaces non-alphanumeric characters with underscores to make sure they are valid names. The following Python expression illustrates how you can retrieve the value of the mapred.job.id property from within a Python Streaming script:
os.environ["mapred_job_id"]
You can also set environment variables for the Streaming process launched by MapReduce by applying the -cmdenv option to the Streaming launcher program (once for each variable you wish to set). For example, the following sets the MAGIC_PARAMETER environment variable:
-cmdenv MAGIC_PARAMETER=abracadabra
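Putting the two points above together, a streaming mapper can read the input file name the same way: the property holding it (`mapreduce.map.input.file` in newer releases, `map.input.file` under the classic API) has its dots rewritten to underscores before it is exported. The sketch below probes both spellings; the exact property name available depends on your Hadoop version, so treat the key names as assumptions to verify against your cluster.

```python
import os
import sys

def get_input_filename(environ=os.environ):
    """Return the input-file path Hadoop exports to streaming tasks.

    Hadoop rewrites the property name (mapreduce.map.input.file, or
    map.input.file in older releases) by replacing non-alphanumeric
    characters with underscores, so we probe both spellings and
    return None if neither is set (e.g. when run outside Hadoop).
    """
    for key in ("mapreduce_map_input_file", "map_input_file"):
        if key in environ:
            return environ[key]
    return None

def run_mapper(stdin=sys.stdin, stdout=sys.stdout):
    """Identity mapper that tags each record with its source file."""
    filename = get_input_filename() or "unknown"
    for line in stdin:
        stdout.write(filename + "\t" + line.rstrip("\n") + "\n")
```

The same lookup works for anything passed via `-cmdenv` (e.g. `os.environ.get("MAGIC_PARAMETER")`), since those variables are exported to the streaming process verbatim.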