I have observed that there are multiple ways to write the driver method of a Hadoop program.
The following method is given in the Hadoop Tutorial by Yahoo:
public void run(String inputPath, String outputPath) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.addInputPath(conf, new Path(inputPath));
    FileOutputFormat.setOutputPath(conf, new Path(outputPath));

    JobClient.runJob(conf);
}
and this method is given in Hadoop: The Definitive Guide (O'Reilly, 2012):
public static void main(String[] args) throws Exception {
    if (args.length != 2) {
        System.err.println("Usage: MaxTemperature <input path> <output path>");
        System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
While trying the program given in the O'Reilly book, I found that the constructors of the Job class are deprecated. As the O'Reilly book is based on Hadoop 2 (YARN), I was surprised to see that they had used a deprecated constructor.
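For reference, the deprecation affects only the Job constructors, not the new (org.apache.hadoop.mapreduce) API itself; Hadoop 2 provides the static factory method Job.getInstance instead. A minimal sketch of the same driver written that way (it assumes the MaxTemperatureMapper and MaxTemperatureReducer classes from the book exist on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }

        // Job.getInstance(conf, jobName) replaces the deprecated new Job()
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Max temperature");
        job.setJarByClass(MaxTemperature.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The behaviour is identical to the book's version; only the construction of the Job object changes, which silences the deprecation warning.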
I would like to know which method everyone uses. I use the former approach: if we go with overriding the run() method, we can use hadoop jar options like -D, -libjars, -files, etc. All of these are very much necessary in almost any Hadoop project. I am not sure if we can use them through the main() method.
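On that last point: the generic options (-D, -libjars, -files, ...) are parsed by GenericOptionsParser, and ToolRunner wires that parsing in for any class implementing the Tool interface, so a main()-style driver with the new API can still accept them. A sketch (not taken from either book; the MaxTemperatureMapper/Reducer classes and the driver class name are assumed):

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxTemperatureDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureDriver <input path> <output path>");
            return -1;
        }

        // getConf() already reflects anything passed via -D, -libjars, -files, etc.
        Job job = Job.getInstance(getConf(), "Max temperature");
        job.setJarByClass(MaxTemperatureDriver.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before handing args to run()
        System.exit(ToolRunner.run(new MaxTemperatureDriver(), args));
    }
}
```

This combines both styles: the driver still has a main(), but the generic command-line options work because ToolRunner applies them to the Configuration before run() is invoked.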