The demo I built earlier was too boring, so I decided to rework it.

1. Input format. In the previous program, StatMapper was mysteriously fed a pile of key/value pairs — apparently some default input format. After digging around, I found it: org.apache.hadoop.mapred.InputFormatBase, which implements the InputFormat interface. The interface declares:

    FileSplit[] getSplits(FileSystem fs, JobConf job, int numSplits)
        throws IOException;

So it seems all input and output must be stored as files, much like Lucene. Since input data is usually line-delimited anyway, this InputFormatBase should be good enough for the common case.

2. Input data. This thing is built to process huge data sets efficiently, which made me think of the iHome ActionLog... several hundred MB in total, so it should qualify. Here I'll count how many times each command was invoked over the past few days.

3. Modified program.

StatMapper.java:

    public void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter)
            throws IOException
    {
        String[] token = value.toString().split(" ");
        String id = token[6];   // extracted but not used below
        String act = token[7];
        output.collect(new UTF8(act), new LongWritable(1));
    }

StatReducer.java:

    public void reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter)
            throws IOException
    {
        long sum = 0;
        while (values.hasNext())
        {
            sum += ((LongWritable) values.next()).get();
        }
        System.out.println("Action: " + key + ", Count: " + sum);
        output.collect(key, new LongWritable(sum));
    }

4. Running it. This time the log is much clearer:

    ...
    060328 162626 E:\workground\opensource\hadoop-nightly\tmp\input\action_log.txt.2006-03-21:0+21357898
    060328 162626  map 8%  reduce 0%
    060328 162627 E:\workground\opensource\hadoop-nightly\tmp\input\action_log.txt.2006-03-21:0+21357898
    060328 162627  map 22%  reduce 0%
    060328 162628 E:\workground\opensource\hadoop-nightly\tmp\input\action_log.txt.2006-03-21:0+21357898
    060328 162628  map 37%  reduce 0%
    060328 162629 E:\workground\opensource\hadoop-nightly\tmp\input\action_log.txt.2006-03-21:0+21357898
    060328 162629  map 52%  reduce 0%
    060328 162630 E:\workground\opensource\hadoop-nightly\tmp\input\action_log.txt.2006-03-21:0+21357898
    060328 162631 E:\workground\opensou
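A side note on those `path:0+21357898` strings in the run log: a FileSplit prints as file, start offset, and byte length, so that line means "this map task covers bytes 0 through 21357898 of action_log.txt". Below is a minimal, Hadoop-free sketch of what a getSplits implementation conceptually does for one file — cut its byte length into roughly equal ranges. The `Split` record and the even-chunking rule are my own assumptions for illustration; the real InputFormatBase also has to respect line boundaries, block locations, and so on.

```java
public class SplitSketch {
    // Hypothetical stand-in for Hadoop's FileSplit: a byte range of one file.
    // toString() mimics the "path:start+length" form seen in the job log.
    record Split(String path, long start, long length) {
        @Override public String toString() { return path + ":" + start + "+" + length; }
    }

    // Conceptual sketch of getSplits for a single file: divide the file's
    // byte length into numSplits roughly equal ranges; the last split
    // absorbs the remainder.
    static Split[] getSplits(String path, long fileLength, int numSplits) {
        Split[] splits = new Split[numSplits];
        long chunk = fileLength / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunk;
            long len = (i == numSplits - 1) ? fileLength - start : chunk;
            splits[i] = new Split(path, start, len);
        }
        return splits;
    }

    public static void main(String[] args) {
        // A made-up 100-byte file cut into 3 splits.
        for (Split s : getSplits("action_log.txt", 100, 3)) {
            System.out.println(s);
        }
        // action_log.txt:0+33
        // action_log.txt:33+33
        // action_log.txt:66+34
    }
}
```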
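To see what the framework does with the two functions above, here is a plain-Java simulation (no Hadoop dependency) of map, shuffle, and reduce over a few made-up lines shaped like the ActionLog format the mapper assumes: space-separated fields with the action name at index 7. The sample lines and field contents are hypothetical; only the token indices match StatMapper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatSimulation {
    // Map phase: emit (action, 1) for each log line, like StatMapper.
    static List<Map.Entry<String, Long>> map(List<String> lines) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String line : lines) {
            String[] token = line.split(" ");
            String act = token[7];          // same index as in StatMapper
            out.add(Map.entry(act, 1L));
        }
        return out;
    }

    // Shuffle + reduce phase: group by key and sum the counts,
    // mirroring the while-loop in StatReducer.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Hypothetical lines with 8 space-separated fields; field 7 is the action.
        List<String> lines = List.of(
                "a b c d e f u1 login",
                "a b c d e f u2 search",
                "a b c d e f u1 login");
        Map<String, Long> counts = reduce(map(lines));
        System.out.println("login=" + counts.get("login") + " search=" + counts.get("search"));
        // login=2 search=1
    }
}
```

The real framework does the same grouping, but distributed: each map task handles one FileSplit, and the sorted/grouped pairs are streamed into reduce as an Iterator.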
09-25 09:54