Writing to HBase in MapReduce Using MultipleOutputs

This article describes how to write to HBase in MapReduce using MultipleOutputs. The question and its answer are reproduced below and should be a useful reference for anyone facing the same problem.

Problem Description

I currently have a MapReduce job that uses MultipleOutputs to send data to several HDFS locations. After that completes, I am using HBase client calls (outside of MR) to add some of the same elements to a few HBase tables. It would be nice to add the HBase outputs as just additional MultipleOutputs, using TableOutputFormat. In that way, I would distribute my HBase processing.

The problem is, I cannot get this to work. Has anyone ever used TableOutputFormat in MultipleOutputs...? With multiple HBase outputs?

Basically, I am setting up my collectors like this....

// One named output (and collector) per HBase table
OutputCollector<ImmutableBytesWritable, Writable> hbaseCollector1 = multipleOutputs.getCollector("hbase1", reporter);
OutputCollector<ImmutableBytesWritable, Writable> hbaseCollector2 = multipleOutputs.getCollector("hbase2", reporter);

Put put = new Put(mykey.getBytes());
put.add("family".getBytes(), "column".getBytes(), somedata1);
// note: this key is a NullWritable even though the collector is declared with an ImmutableBytesWritable key
hbaseCollector1.collect(NullWritable.get(), put);

put = new Put(mykey.getBytes());
put.add("family".getBytes(), "column".getBytes(), somedata2);
hbaseCollector2.collect(new ImmutableBytesWritable(mykey.getBytes()), put);

This seems to follow the general idea of HBase writing, I think.

Part of the issue, as I type this, might be more in the job definition. It looks like MR (and HBase) want a global parameter set, like this....

conf.set(TableOutputFormat.OUTPUT_TABLE, "articles");

to provide the table name. The trouble is, I have two tables....



Any ideas?

Thanks
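
To make the clash concrete, the job definition the question describes would, under the old mapred API, look roughly like the sketch below. The named outputs "hbase1"/"hbase2" come from the snippet above; the driver class name, the output path, and the use of Put as the value class are assumptions added for illustration. The single OUTPUT_TABLE property at the end is exactly where the two-table requirement breaks down.

// Sketch of a driver for the setup above (old mapred API). MultiSinkJob and the
// output path are assumed names, not the poster's actual code.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableOutputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MultiSinkJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MultiSinkJob.class);
    // ... mapper/reducer classes and input paths would be set here as usual ...

    // Regular HDFS output handled by the job's default OutputFormat.
    FileOutputFormat.setOutputPath(conf, new Path("/output/main"));

    // Two named outputs, both backed by the old-API HBase TableOutputFormat.
    MultipleOutputs.addNamedOutput(conf, "hbase1", TableOutputFormat.class,
        ImmutableBytesWritable.class, Put.class);
    MultipleOutputs.addNamedOutput(conf, "hbase2", TableOutputFormat.class,
        ImmutableBytesWritable.class, Put.class);

    // TableOutputFormat reads its target table from this single job-wide property,
    // so there is no per-named-output way to say "hbase1" -> table A, "hbase2" -> table B.
    conf.set(TableOutputFormat.OUTPUT_TABLE, "articles");

    JobClient.runJob(conf);
  }
}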

Solution

So, apparently, this is not possible with the old mapred package. There is a new OutputFormat in the mapreduce package, but I don't want to convert to that right now. So, I will have to write multiple MR jobs.
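
The mapreduce-package route the answer passes on is presumably HBase's MultiTableOutputFormat, which drops the single OUTPUT_TABLE property and instead uses the output key to name the destination table. Below is a minimal sketch under that assumption; the reducer class and the table/family/column names are made up for illustration.

// Sketch of the new-API alternative: MultiTableOutputFormat routes each Put to the
// table named by the output key, so one job can write to several HBase tables.
// The table/family/column names here are illustrative only.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MultiTableReducer extends Reducer<Text, Text, ImmutableBytesWritable, Put> {

  private static final ImmutableBytesWritable TABLE_A =
      new ImmutableBytesWritable(Bytes.toBytes("articles"));
  private static final ImmutableBytesWritable TABLE_B =
      new ImmutableBytesWritable(Bytes.toBytes("authors"));

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("family"), Bytes.toBytes("column"),
          Bytes.toBytes(value.toString()));
      // The key carries the destination table name, so the same row
      // can be sent to more than one table from the same job.
      context.write(TABLE_A, put);
      context.write(TABLE_B, put);
    }
  }
}

On the driver side this would just be job.setOutputFormatClass(MultiTableOutputFormat.class); no TableOutputFormat.OUTPUT_TABLE property is needed, since each output key names its own table. It does, however, mean converting the job to the mapreduce API, which is exactly what the answer above chose not to do.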


This concludes the article on writing to HBase in MapReduce using MultipleOutputs. We hope the answer above is helpful, and thank you for your support!
