Question
Several places say the default number of reducers in a Hadoop job is 1, and that you can use the mapred.reduce.tasks parameter to set the number of reducers manually.
When I run a Hive job (on Amazon EMR, AMI 2.3.3), it has some number of reducers greater than one. Looking at the job settings, something has set mapred.reduce.tasks, presumably Hive. How does it choose that number?
Note: here are some messages printed while running a Hive job that should be a clue:
...
Number of reduce tasks not specified. Estimated from input data size: 500
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
...
Answer
The default of 1 is probably for a vanilla Hadoop install; Hive overrides it.
In open-source Hive (and likely on EMR):
# reducers = (# bytes of input to mappers)
/ (hive.exec.reducers.bytes.per.reducer)
This post says the default hive.exec.reducers.bytes.per.reducer is 1G.
You can cap the number of reducers produced by this heuristic with hive.exec.reducers.max.
If you know exactly how many reducers you want, you can set mapred.reduce.tasks, which overrides all the heuristics. (By default it is set to -1, which tells Hive to use its heuristic.)
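The logic above can be sketched as a small function. This is an illustrative model of the heuristic, not Hive's actual implementation; the default values for bytes-per-reducer (1G) and max reducers (999) are assumptions based on common Hive defaults:

```python
import math

def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # hive.exec.reducers.bytes.per.reducer (assumed 1G)
                      max_reducers=999,                 # hive.exec.reducers.max (assumed default)
                      mapred_reduce_tasks=-1):          # -1 means "use the heuristic"
    """Sketch of how Hive picks a reducer count for a job."""
    if mapred_reduce_tasks >= 0:
        # An explicit mapred.reduce.tasks overrides all heuristics.
        return mapred_reduce_tasks
    # Otherwise: divide mapper input size by bytes-per-reducer, at least 1,
    # and cap at the configured maximum.
    estimated = max(1, math.ceil(input_bytes / bytes_per_reducer))
    return min(estimated, max_reducers)

# 500 GB of input with the 1G default matches the log line above:
# "Estimated from input data size: 500"
print(estimate_reducers(500_000_000_000))                          # 500
print(estimate_reducers(500_000_000_000, mapred_reduce_tasks=10))  # 10
```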