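I'm a bit new to MapReduce, so it would be great if someone could guide me on the following questions. My job produces these output files: fruit-r-00000, vegetable-r-00000, part-r-00000
I'm confused about how many reducers are running. I know the number of reducers defaults to 1, and since the numeric part of all the file names is the same, I believe only one reducer runs. Is my understanding correct?
Why is the part-r-00000 file created at all? I write all of my output to either the "fruit" file or the "vegetable" file.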
Best answer
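Yes, one reducer runs, but the file names are a consequence of the reducer count, not what determines it. The number of reducers is either specified by the user or, if left unset, taken from the default (1 in plain MapReduce, via the mapreduce.job.reduces property); some frameworks instead estimate it from the size of the input and the amount of work the reducers need to do.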
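A minimal, self-contained Java sketch (plain Java, not Hadoop code) of why the file names follow the reducer count: reduce-side output files are named <base>-r-<partition>, with the partition number padded to five digits, so each additional reducer adds its own suffix. The class and method names are hypothetical, and the sketch assumes every reducer writes to every base name; in a real job, a named output that a reducer never writes to is typically not created.

```java
import java.util.ArrayList;
import java.util.List;

class OutputFileNames {

    // Mirrors the "<base>-r-<5-digit partition>" naming used for
    // reduce-side output files, e.g. part-r-00000.
    static String reduceOutputName(String base, int partition) {
        return String.format("%s-r-%05d", base, partition);
    }

    // Lists the files we would expect, ASSUMING every reducer writes
    // to every base name (named outputs that are never written to
    // would not appear in a real job).
    static List<String> expectedFiles(List<String> baseNames, int numReduceTasks) {
        List<String> files = new ArrayList<>();
        for (int partition = 0; partition < numReduceTasks; partition++) {
            for (String base : baseNames) {
                files.add(reduceOutputName(base, partition));
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> bases = List.of("fruit", "vegetable", "part");
        // One reducer: only the -r-00000 suffix can appear.
        System.out.println(expectedFiles(bases, 1));
        // Two reducers: a second set of -r-00001 files appears as well.
        System.out.println(expectedFiles(bases, 2));
    }
}
```

With one reducer the first line prints [fruit-r-00000, vegetable-r-00000, part-r-00000], matching the files in the question; with two reducers, -r-00001 files appear alongside them.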
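part-r-00000: this is related to partitioning. The 00000 suffix is the partition (reducer) number, and since there is only one reducer, every key is assigned to partition 0, so the reducer's default output goes to this file.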
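To make the partitioning point concrete, here is a plain-Java sketch of the arithmetic used by Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The class name and sample keys are made up for illustration; in a real job the framework calls the partitioner, not user code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: clear the
    // sign bit of the key's hash, then take it modulo the reducer count.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the partition (and therefore the reducer and the
    // -r-NNNNN file suffix) they would be routed to.
    static Map<Integer, List<String>> route(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition
                .computeIfAbsent(getPartition(key, numReduceTasks), p -> new ArrayList<>())
                .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "carrot", "potato");
        // One reducer: every key lands in partition 0 (the r-00000 files).
        System.out.println(route(keys, 1));
        // Three reducers: the same keys may be spread across partitions 0 to 2.
        System.out.println(route(keys, 3));
    }
}
```

With numReduceTasks = 1 the modulo is always 0, so the first line prints {0=[apple, banana, carrot, potato]}; that is why all output ends up in files with the r-00000 suffix.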
In most cases the number of reducers is specified by the user, and it mostly depends on the amount of work the reducers need to do. The number should not be very large, though, because each mapper splits its output into one partition per reducer, so a very large reducer count multiplies the number of small partitions that have to be shuffled. Some frameworks, such as Hive, can calculate the number of reducers using an empirical rule of about 1 GB of input per reducer.
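The Hive-style rule of thumb amounts to dividing the input size by a target size per reducer and rounding up. The sketch below is a simplified illustration, not Hive's actual code: the 1 GB target comes from the answer, while the cap of 999 reducers and all class, method, and constant names are assumptions.

```java
class ReducerEstimate {

    // Empirical target of roughly 1 GB of input per reducer.
    static final long BYTES_PER_REDUCER = 1_000_000_000L;

    // Hypothetical upper bound so the reducer count cannot grow without limit.
    static final int MAX_REDUCERS = 999;

    // ceil(totalBytes / bytesPerReducer), clamped to [1, maxReducers].
    // Uses division plus remainder instead of (a + b - 1) / b to avoid overflow.
    static int estimateReducers(long totalBytes, long bytesPerReducer, int maxReducers) {
        long reducers = totalBytes / bytesPerReducer
                + (totalBytes % bytesPerReducer == 0 ? 0 : 1);
        reducers = Math.max(1, reducers);
        return (int) Math.min(maxReducers, reducers);
    }

    public static void main(String[] args) {
        System.out.println(estimateReducers(500_000_000L, BYTES_PER_REDUCER, MAX_REDUCERS));   // 1
        System.out.println(estimateReducers(3_500_000_000L, BYTES_PER_REDUCER, MAX_REDUCERS)); // 4
    }
}
```

Clamping to at least 1 keeps a tiny or empty input from producing zero reducers, and the upper cap reflects the advice above not to let the count grow too large.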
Regarding hadoop - MultitpleOutputFormat-Hadoop, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/26232216/