Problem Description
I currently have a MapReduce job that uses MultipleOutputs to send data to several HDFS locations. After that completes, I am using HBase client calls (outside of MR) to add some of the same elements to a few HBase tables. It would be nice to add the HBase outputs as just additional MultipleOutputs, using TableOutputFormat. In that way, I would distribute my HBase processing.
Problem is, I cannot get this to work. Has anyone ever used TableOutputFormat in MultipleOutputs...? With multiple HBase outputs?
Basically, I am setting up my collectors, like this....
// Named outputs "hbase1" and "hbase2" are the two HBase-bound collectors.
OutputCollector<ImmutableBytesWritable, Writable> hbaseCollector1 = multipleOutputs.getCollector("hbase1", reporter);
OutputCollector<ImmutableBytesWritable, Writable> hbaseCollector2 = multipleOutputs.getCollector("hbase2", reporter);
// The first Put goes out through the "hbase1" collector...
Put put = new Put(mykey.getBytes());
put.add("family".getBytes(), "column".getBytes(), somedata1);
hbaseCollector1.collect(NullWritable.get(), put);

// ...and a second Put with the same row key goes through "hbase2".
put = new Put(mykey.getBytes());
put.add("family".getBytes(), "column".getBytes(), somedata2);
hbaseCollector2.collect(new ImmutableBytesWritable(mykey.getBytes()), put);
This seems to follow the general idea of HBase writing, I think.
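For completeness, the named outputs themselves are registered up front in the job definition. A rough sketch of that part with the old mapred API (my real HDFS named outputs are omitted, and the value class here is just a guess at what would match the collectors above):

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableOutputFormat;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class JobSetupSketch {
    public static JobConf configure(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);
        // Register the two HBase-bound named outputs alongside the existing
        // HDFS ones; "hbase1" and "hbase2" match the getCollector() calls above.
        MultipleOutputs.addNamedOutput(conf, "hbase1", TableOutputFormat.class,
                ImmutableBytesWritable.class, Writable.class);
        MultipleOutputs.addNamedOutput(conf, "hbase2", TableOutputFormat.class,
                ImmutableBytesWritable.class, Writable.class);
        return conf;
    }
}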
Part of the issue, as I type this, might be more in the job definition. Looks like MR (and HBase) want a global parameter set, like this....
conf.set(TableOutputFormat.OUTPUT_TABLE, "articles");
to provide the table name. Trouble is, I have two tables....
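As far as I can tell, both HBase named outputs share the same JobConf, and TableOutputFormat only reads that one global key, so the second table name just overwrites the first; something like this (where "authors" is just a made-up name for my second table):

conf.set(TableOutputFormat.OUTPUT_TABLE, "articles");  // first table
conf.set(TableOutputFormat.OUTPUT_TABLE, "authors");   // second table -- this clobbers "articles"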
Any ideas?
Thanks
So, apparently, this is not possible with the old mapred packages. There is a new OutputFormat in the mapreduce package set, but I don't want to convert to that right now. So, I will have to write multiple MR jobs.
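For anyone who can move to the new API: I believe the OutputFormat in question is something along the lines of org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat, which picks the destination table from the output key rather than from a single global OUTPUT_TABLE setting. A rough, untested sketch (table, family, and column names are placeholders):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class MultiTableSketchReducer
        extends Reducer<Text, Text, ImmutableBytesWritable, Put> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        byte[] row = key.toString().getBytes();

        // The output key names the destination table, so one reducer can feed
        // several tables without any per-table OUTPUT_TABLE setting.
        Put put1 = new Put(row);
        put1.add("family".getBytes(), "column".getBytes(), "somedata1".getBytes());
        context.write(new ImmutableBytesWritable("table1".getBytes()), put1);

        Put put2 = new Put(row);
        put2.add("family".getBytes(), "column".getBytes(), "somedata2".getBytes());
        context.write(new ImmutableBytesWritable("table2".getBytes()), put2);
    }

    // Driver-side wiring for the sketch above.
    public static void wireUp(Job job) {
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        job.setReducerClass(MultiTableSketchReducer.class);
    }
}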