Running hadoop-examples-1.2.1.jar (wordcount) on Hadoop

1  Preface

For installing Hadoop, refer to the blog post linked below:

http://blog.chinaunix.net/uid-31429544-id-5759400.html

2  Running the example shipped with Hadoop

2.1  Start Hadoop


    $ start-all.sh

    The figure below shows the startup process:

[Figure: screenshot of the start-all.sh startup output]


Note: the jps command lists the Java processes that are currently running. Use it to confirm that NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker have all started.
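The check described in the note can also be automated. The sketch below is only an illustration: it assumes each line of jps output has the form "<pid> <name>", and the class name DaemonCheck, the method missingDaemons, and the sample PIDs are hypothetical, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DaemonCheck {
    // Daemons the note above expects to be running.
    static final List<String> REQUIRED = Arrays.asList(
        "NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker");

    // Returns the required daemons that do not appear in the given jps output.
    // Assumption: each jps line looks like "<pid> <name>".
    static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]); // exact name, so NameNode != SecondaryNameNode
            }
        }
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!running.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical jps output; the PIDs are made up for illustration.
        String sample = "4001 NameNode\n4102 DataNode\n4203 SecondaryNameNode\n"
                      + "4304 JobTracker\n4507 Jps\n";
        System.out.println("Missing: " + missingDaemons(sample));
    }
}
```

Names are compared exactly rather than by substring, so a running SecondaryNameNode is never mistaken for NameNode.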

2.2  Prepare the data

    Create a local directory named input.
    In input, create four files: em1.txt, em2.txt, em3.txt, and em4.txt.

The figure below shows the result:

[Figure: screenshot of the data preparation steps]
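If you prefer to script this step, the following is a minimal sketch that creates the same directory and file names with the standard Java file API. The file contents are hypothetical sample text, since the post does not show what the four files contain, and the class name PrepareInput is made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PrepareInput {
    // Creates dir (if needed) and writes em1.txt .. em4.txt into it.
    static void createInputFiles(Path dir) {
        // Hypothetical sample text; replace it with your own data.
        String[] contents = {
            "hello hadoop",
            "hello world",
            "hadoop wordcount example",
            "hello wordcount"
        };
        try {
            Files.createDirectories(dir);
            for (int i = 0; i < contents.length; i++) {
                Path file = dir.resolve("em" + (i + 1) + ".txt");
                Files.write(file, (contents[i] + "\n").getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Creates ./input relative to the current working directory.
        createInputFiles(Paths.get("input"));
    }
}
```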

2.3  Copy the files into Hadoop

    $ hadoop dfs

    Run without arguments, this prints the shell commands that Hadoop supports.

[Figure: screenshot of the hadoop dfs command list]

    $ hadoop dfs -mkdir input

    Creates the directory input in Hadoop.

    $ hadoop dfs -ls input

    Lists the files under input.

    $ hadoop dfs -put input/* input

    Copies the files in the local input directory from Linux into Hadoop.

    The figure below shows the process:

[Figure: screenshot of the mkdir, ls, and put commands]

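2.4  Run wordcount

    The Hadoop installation directory contains hadoop-examples-1.2.1.jar, which bundles several example programs that run on Hadoop. Hadoop can execute a class packaged in a jar. To run the wordcount class from hadoop-examples-1.2.1.jar, use:

    $ hadoop jar hadoop-examples-1.2.1.jar wordcount input output

    wordcount is the name of the class in the jar; it selects the class to run.

    input is the input directory.

    output is the output directory. It must not exist beforehand: the program creates it, and the job fails with an error if output is already present.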
    
The figure below shows the job running:

[Figure: screenshot of the wordcount job output]

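    To confirm that the program created the directory, list the contents of output. To check the result of the job, view the part-r-00000 file inside output.

The figure below shows the result:

[Figure: screenshot of the part-r-00000 contents]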
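The job as run here uses Hadoop's default text output, which writes one line per key in the form key, tab, value. As a rough sketch, the text of part-r-00000 (obtained, for example, with hadoop dfs -cat output/part-r-00000) could be loaded into a map as follows. The class name ParseWordCountOutput and the sample data are hypothetical; the real counts depend on your input files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParseWordCountOutput {
    // Parses "word<TAB>count" lines into a map, keeping the order of the file.
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank lines and anything that is not key<TAB>value
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical file contents, for illustration only.
        String sample = "hadoop\t2\nhello\t3\nworld\t1\n";
        System.out.println(parse(sample));
    }
}
```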

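3  wordcount source code

    The directory src/examples/org/apache/hadoop/examples under the Hadoop installation directory holds many classes provided by Hadoop that can be run on Hadoop. WordCount.java is among them; its source is as follows: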



/**
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1) per token.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Strip generic Hadoop options; what remains is the input and output path.
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    // Block until the job finishes; exit 0 on success, 1 on failure.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
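To see what the mapper and reducer above compute without a cluster, the same logic can be imitated in plain Java. This is only a single-process sketch and not how Hadoop executes the job: the class LocalWordCount and its methods are hypothetical names, the grouping step stands in for Hadoop's shuffle and sort, and no combiner is modeled.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Map step: emit a (word, 1) pair per whitespace-separated token,
    // mirroring TokenizerMapper.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce step: sum all values collected for one key, mirroring IntSumReducer.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Maps every line, groups the pairs by key (sorted by key), then reduces each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical input lines.
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello world")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

Because the tokenizer splits only on whitespace, punctuation stays attached to words and matching is case-sensitive, which is also true of the Hadoop example.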






09-25 15:12