Low cost and high scalability are the main reasons for choosing Hadoop, but its development efficiency is far from satisfactory.

Take join computation as an example.

Suppose there are two files on HDFS, customer information and order information, with customerID as the field that joins them. How do we perform the join so that the customer name is attached to each order?

The usual approach: take both source files as input. In Map, process each record according to its file name: if it comes from Order, tag the foreign key with "O" to form a combined key; if it comes from Customer, tag it with "C". After Map, the data is partitioned by customerID, sorted by the combined key, and grouped by customerID. Finally, the records of each group are merged in Reduce and written out.

Implementation code:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class J {

    public static class JMapper extends Mapper<LongWritable, Text, TextPair, Text> {
        // tag every row with "O" or "C" according to its source file name
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String pathName = ((FileSplit) context.getInputSplit()).getPath().toString();
            if (pathName.contains("order.txt")) { // identify orders by file name
                String[] values = value.toString().split("\t");
                TextPair tp = new TextPair(new Text(values[1]), new Text("O")); // mark with "O"
                context.write(tp, new Text(values[0] + "\t" + values[2]));
            }
            if (pathName.contains("customer.txt")) { // identify customers by file name
                String[] values = value.toString().split("\t");
                TextPair tp = new TextPair(new Text(values[0]), new Text("C")); // mark with "C"
                context.write(tp, new Text(values[1]));
            }
        }
    }

    public static class JPartitioner extends Partitioner<TextPair, Text> {
        // partition by the first half of the key only, i.e. customerID
        @Override
        public int getPartition(TextPair key, Text value, int numPartition) {
            // mask the sign bit instead of Math.abs(), which stays negative on Integer.MIN_VALUE
            return (key.getFirst().hashCode() * 127 & Integer.MAX_VALUE) % numPartition;
        }
    }

    public static class JComparator extends WritableComparator {
        // group by the first half of the combined key, so all rows of one
        // customerID reach the same reduce() call regardless of the "O"/"C" tag
        public JComparator() {
            super(TextPair.class, true);
        }

        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            TextPair t1 = (TextPair) a;
            TextPair t2 = (TextPair) b;
            return t1.getFirst().compareTo(t2.getFirst());
        }
    }

    public static class JReduce extends Reducer<TextPair, Text, Text, Text> {
        // merge and output: within a group the "C" row sorts before the "O" rows,
        // so the first value is the customer name
        @Override
        protected void reduce(TextPair key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Text pid = key.getFirst();
            Iterator<Text> it = values.iterator(); // Hadoop allows a single pass over the values
            String desc = it.next().toString();    // customer name
            while (it.hasNext()) {
                context.write(pid, new Text(it.next().toString() + "\t" + desc));
            }
        }
    }

    public static class TextPair implements WritableComparable<TextPair> {
        // composite key: customerID plus the "O"/"C" tag
        private Text first;
        private Text second;

        public TextPair() {
            set(new Text(), new Text());
        }

        public TextPair(String first, String second) {
            set(new Text(first), new Text(second));
        }

        public TextPair(Text first, Text second) {
            set(first, second);
        }

        public void set(Text first, Text second) {
            this.first = first;
            this.second = second;
        }

        public Text getFirst() {
            return first;
        }

        public Text getSecond() {
            return second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }

        @Override
        public int compareTo(TextPair tp) {
            int cmp = first.compareTo(tp.first);
            if (cmp != 0) {
                return cmp;
            }
            return second.compareTo(tp.second);
        }

        @Override
        public int hashCode() {
            return first.hashCode() * 163 + second.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof TextPair) {
                TextPair tp = (TextPair) o;
                return first.equals(tp.first) && second.equals(tp.second);
            }
            return false;
        }
    }

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        // job entrance
        Configuration conf = new Configuration();
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        String[] otherArgs = parser.getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: J <order input> <customer input> <output>");
            System.exit(2);
        }
        Job job = new Job(conf, "J");
        job.setJarByClass(J.class);                        // join class
        job.setMapperClass(JMapper.class);                 // map class
        job.setMapOutputKeyClass(TextPair.class);          // map output key class
        job.setMapOutputValueClass(Text.class);            // map output value class
        job.setPartitionerClass(JPartitioner.class);       // partition class
        job.setGroupingComparatorClass(JComparator.class); // grouping comparator after partition
        job.setReducerClass(JReduce.class);                // reduce class
        job.setOutputKeyClass(Text.class);                 // reduce output key class
        job.setOutputValueClass(Text.class);               // reduce output value class
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));   // one of the source files
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));   // the other source file
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2])); // output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);  // run until the job ends
    }
}
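To make the record layout concrete, here is a minimal usage sketch. The file names, paths, and sample rows are hypothetical, chosen only to match the tab-separated column indices the mapper assumes (orderID, customerID, amount for orders; customerID, name for customers):

    order.txt (orderID \t customerID \t amount):
    1001    3    120
    1002    1    85

    customer.txt (customerID \t name):
    1    Smith
    3    Jones

Packaged as, say, J.jar, the job would be submitted roughly like this:

    hadoop jar J.jar J /input/order.txt /input/customer.txt /output

For these sample rows the reducer would emit lines of the form customerID \t orderID \t amount \t name, e.g. 3 \t 1001 \t 120 \t Jones.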
You cannot work on the raw data directly; you have to write a pile of code to handle the tags, work around MapReduce's intended architecture, and in the end design and compute the relationship between the data sets at the lowest level. And this is the simplest possible join: with MapReduce, a multi-table join, or a join with more complex logic, grows geometrically in complexity.

Reposted from: http://hi.baidu.com/rwvzjwhehncntye/item/da8cdcf335e40b2dfe3582db

I happened to find another article on the same topic. Not sure whether it is an advertorial, but reading it can't hurt: http://blog.sina.com.cn/s/blog_e4de31d00101efat.html