Problem description
I am trying to implement a MapReduce job in which each mapper takes 150 lines of the text file and all the mappers run simultaneously; also, the job should not fail, no matter how many map tasks fail.
Here is the configuration part:
JobConf conf = new JobConf(Main.class);
conf.setJobName("My mapreduce");
conf.set("mapreduce.input.lineinputformat.linespermap", "150");
conf.set("mapred.max.map.failures.percent","100");
conf.setInputFormat(NLineInputFormat.class);
FileInputFormat.addInputPath(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
The problem is that Hadoop creates a mapper for every single line of text, the mappers seem to run sequentially, and if a single one fails, the whole job fails.
From this I deduce that the settings I applied do not have any effect.
What am I doing wrong?
Recommended answer
If you want to quickly find the correct names for the options in Hadoop's new API, use this link: http://pydoop.sourceforge.net/docs/examples/intro.html#hadoop-0-21-0-notes .
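To illustrate the naming issue, here is a minimal sketch in plain Java (no Hadoop dependency) that expands a set of options so each one appears under both its legacy and its renamed spelling. The class OptionNames and the method withBothSpellings are hypothetical helpers, not part of Hadoop. The two name pairs are listed as I remember them from Hadoop's deprecated-properties table, so please check them against the table linked above. The question does not say which Hadoop version is in use, so whether the legacy or the renamed spelling is the one actually read is an assumption to verify on your cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Hadoop: lists options under both spellings. */
final class OptionNames {
    // Legacy (pre-0.21) name -> renamed (0.21+) name.
    // Assumption: pairs recalled from Hadoop's deprecated-properties table.
    private static final Map<String, String> OLD_TO_NEW = new LinkedHashMap<>();
    static {
        OLD_TO_NEW.put("mapred.line.input.format.linespermap",
                       "mapreduce.input.lineinputformat.linespermap");
        OLD_TO_NEW.put("mapred.max.map.failures.percent",
                       "mapreduce.map.failures.maxpercent");
    }

    /** Returns a copy of settings where every known option is present under both names. */
    static Map<String, String> withBothSpellings(Map<String, String> settings) {
        Map<String, String> out = new LinkedHashMap<>(settings);
        for (Map.Entry<String, String> pair : OLD_TO_NEW.entrySet()) {
            String oldName = pair.getKey();
            String newName = pair.getValue();
            if (settings.containsKey(oldName)) {
                // Mirror the legacy key under its renamed spelling.
                out.putIfAbsent(newName, settings.get(oldName));
            }
            if (settings.containsKey(newName)) {
                // Mirror the renamed key under its legacy spelling.
                out.putIfAbsent(oldName, settings.get(newName));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The two options exactly as written in the question.
        Map<String, String> asked = new LinkedHashMap<>();
        asked.put("mapreduce.input.lineinputformat.linespermap", "150");
        asked.put("mapred.max.map.failures.percent", "100");
        // Print every name/value pair, including the mirrored spellings.
        withBothSpellings(asked).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Each resulting pair could then be passed to conf.set(name, value) on the JobConf from the question, so that whichever spelling the installed release reads is present. This only addresses the option names; it does not by itself make the mappers run in parallel.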