Problem description
Is there any way to set a parameter in the job configuration from the Mapper so that it is accessible from the Reducer?
I tried the code below.
In the Mapper, inside map(..):
context.getConfiguration().set("Sum","100");
In the Reducer, inside reduce(..):
context.getConfiguration().get("Sum");
But in the Reducer the value is returned as null.
Is there any way to implement this, or have I missed something?
As far as I know, this is not possible. The job configuration is serialized to XML when the job is submitted and is copied out to all task nodes. Any change to a Configuration object affects only that object, which is local to the specific task JVM; it does not change the XML on any node, so a value set in a Mapper is never seen by a Reducer.
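To make the point above concrete, here is a minimal sketch in plain Java that imitates the mechanism without using Hadoop at all. It assumes java.util.Properties as a stand-in for the Hadoop Configuration class, and the class and method names (ConfigCopyDemo, serialize, loadForTask, sumSeenByReducer) are hypothetical names chosen for illustration. The settings are serialized to XML once, and each simulated task loads its own private copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Simulation only: Properties stands in for Hadoop's Configuration.
public class ConfigCopyDemo {

    // Serialize the job-level settings to XML once, as at job submission.
    static byte[] serialize(Properties jobConf) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            jobConf.storeToXML(out, "job configuration");
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each task builds its own private copy from the shipped XML.
    static Properties loadForTask(byte[] jobXml) {
        try {
            Properties taskConf = new Properties();
            taskConf.loadFromXML(new ByteArrayInputStream(jobXml));
            return taskConf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns what the simulated reducer reads for "Sum"
    // after the simulated mapper has set it on its own copy.
    static String sumSeenByReducer() {
        Properties jobConf = new Properties();
        jobConf.setProperty("job.name", "demo");
        byte[] jobXml = serialize(jobConf);

        Properties mapperConf = loadForTask(jobXml);
        mapperConf.setProperty("Sum", "100"); // changes the mapper's copy only

        Properties reducerConf = loadForTask(jobXml); // built from the original XML
        return reducerConf.getProperty("Sum");
    }

    public static void main(String[] args) {
        System.out.println("Reducer sees Sum = " + sumSeenByReducer());
    }
}
```

Running main prints "Reducer sees Sum = null", which mirrors the behaviour described in the question: the mapper's write never reaches the XML that the reducer's copy is built from.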
In general, you should try to avoid any "global" state. It goes against the MapReduce paradigm and will generally prevent parallelism. If you absolutely must pass information between the Map and Reduce phases, and you cannot do it via the usual Shuffle/Sort step, then you could try writing to the Distributed Cache, or directly to HDFS.
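The "write it to shared storage" idea can be sketched as follows. This is not Hadoop code: it assumes a local temporary directory as a stand-in for an HDFS directory, and the names SideFileDemo, writePartialSum, readTotal and the "sum-" file prefix are hypothetical. Each simulated map task writes its partial sum to its own file, and the reduce side adds up whatever files it finds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Simulation only: a local temp directory stands in for a shared HDFS directory.
public class SideFileDemo {

    // Map side: each task writes its partial sum to its own file,
    // so no two tasks ever write to the same file.
    static void writePartialSum(Path sharedDir, String taskId, long partialSum) {
        try {
            Files.write(sharedDir.resolve("sum-" + taskId),
                    Long.toString(partialSum).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reduce side: read every partial file and add the values together.
    static long readTotal(Path sharedDir) {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sharedDir, "sum-*")) {
            for (Path file : files) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                total += Long.parseLong(text.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Two simulated map tasks followed by one simulated reduce-side read.
    static long demo() {
        try {
            Path sharedDir = Files.createTempDirectory("side-files");
            writePartialSum(sharedDir, "m0", 40);
            writePartialSum(sharedDir, "m1", 60);
            return readTotal(sharedDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Total = " + demo());
    }
}
```

One file per task avoids concurrent writers on a single file. A real job would also have to decide how to handle retried or speculative task attempts, which this sketch ignores.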