Writing to HBase in MapReduce Using MultipleOutputs

This article describes how to write to HBase in MapReduce using MultipleOutputs. The question and its answer are reproduced below and should be a useful reference for anyone facing the same problem.

Problem Description

I currently have a MapReduce job that uses MultipleOutputs to send data to several HDFS locations. After that completes, I am using HBase client calls (outside of MR) to add some of the same elements to a few HBase tables. It would be nice to add the HBase outputs as just additional MultipleOutputs, using TableOutputFormat. In that way, I would distribute my HBase processing.

The problem is, I cannot get this to work. Has anyone ever used TableOutputFormat in MultipleOutputs...? With multiple HBase outputs?

Basically, I am setting up my collectors like this....

// One named output (and collector) per HBase table
OutputCollector<ImmutableBytesWritable, Writable> hbaseCollector1 = multipleOutputs.getCollector("hbase1", reporter);
OutputCollector<ImmutableBytesWritable, Writable> hbaseCollector2 = multipleOutputs.getCollector("hbase2", reporter);

Put put = new Put(mykey.getBytes());
put.add("family".getBytes(), "column".getBytes(), somedata1);
// note: this key is a NullWritable even though the collector is declared with an ImmutableBytesWritable key
hbaseCollector1.collect(NullWritable.get(), put);

put = new Put(mykey.getBytes());
put.add("family".getBytes(), "column".getBytes(), somedata2);
hbaseCollector2.collect(new ImmutableBytesWritable(mykey.getBytes()), put);

This seems to follow the general idea of HBase writing, I think.

Part of the issue, as I type this, might be more in the job definition. It looks like MR (and HBase) want a global parameter set, like this....

conf.set(TableOutputFormat.OUTPUT_TABLE, "articles");

to provide the table name. The trouble is, I have two tables....



Any ideas?

Thanks
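
To make the clash concrete, the job definition the question describes would, under the old mapred API, look roughly like the sketch below. The named outputs "hbase1"/"hbase2" come from the snippet above; the driver class name, the output path, and the use of Put as the value class are assumptions added for illustration. The single OUTPUT_TABLE property at the end is exactly where the two-table requirement breaks down.

// Sketch of a driver for the setup above (old mapred API). MultiSinkJob and the
// output path are assumed names, not the poster's actual code.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableOutputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MultiSinkJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MultiSinkJob.class);
    // ... mapper/reducer classes and input paths would be set here as usual ...

    // Regular HDFS output handled by the job's default OutputFormat.
    FileOutputFormat.setOutputPath(conf, new Path("/output/main"));

    // Two named outputs, both backed by the old-API HBase TableOutputFormat.
    MultipleOutputs.addNamedOutput(conf, "hbase1", TableOutputFormat.class,
        ImmutableBytesWritable.class, Put.class);
    MultipleOutputs.addNamedOutput(conf, "hbase2", TableOutputFormat.class,
        ImmutableBytesWritable.class, Put.class);

    // TableOutputFormat reads its target table from this single job-wide property,
    // so there is no per-named-output way to say "hbase1" -> table A, "hbase2" -> table B.
    conf.set(TableOutputFormat.OUTPUT_TABLE, "articles");

    JobClient.runJob(conf);
  }
}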

Solution

So, apparently, this is not possible with the old mapred package. There is a new OutputFormat in the mapreduce package, but I don't want to convert to that right now. So, I will have to write multiple MR jobs.
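
The mapreduce-package route the answer passes on is presumably HBase's MultiTableOutputFormat, which drops the single OUTPUT_TABLE property and instead uses the output key to name the destination table. Below is a minimal sketch under that assumption; the reducer class and the table/family/column names are made up for illustration.

// Sketch of the new-API alternative: MultiTableOutputFormat routes each Put to the
// table named by the output key, so one job can write to several HBase tables.
// The table/family/column names here are illustrative only.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MultiTableReducer extends Reducer<Text, Text, ImmutableBytesWritable, Put> {

  private static final ImmutableBytesWritable TABLE_A =
      new ImmutableBytesWritable(Bytes.toBytes("articles"));
  private static final ImmutableBytesWritable TABLE_B =
      new ImmutableBytesWritable(Bytes.toBytes("authors"));

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("family"), Bytes.toBytes("column"),
          Bytes.toBytes(value.toString()));
      // The key carries the destination table name, so the same row
      // can be sent to more than one table from the same job.
      context.write(TABLE_A, put);
      context.write(TABLE_B, put);
    }
  }
}

On the driver side this would just be job.setOutputFormatClass(MultiTableOutputFormat.class); no TableOutputFormat.OUTPUT_TABLE property is needed, since each output key names its own table. It does, however, mean converting the job to the mapreduce API, which is exactly what the answer above chose not to do.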


This concludes the article on writing to HBase in MapReduce using MultipleOutputs. We hope the answer above is helpful, and thank you for your support!
