Bulk loading into multiple HBase tables in a single job


Question

I want to bulk load data into multiple tables using a single MapReduce job. Since the data volume is high, it would be time-consuming to iterate through the dataset twice and load it with multiple jobs. Is there any way to do this? Thanks in advance.

Solution

I am using HBase, but I haven't needed bulk load yet. However, I came across this article, which might help you:

http://hbase.apache.org/book/arch.bulk.load.html

The bulk load feature uses a MapReduce job to output table data in HBase's internal data format, and then directly loads the generated StoreFiles into a running cluster. Using bulk load will use less CPU and network resources than simply using the HBase API.
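
To make the single-job idea concrete, here is a minimal Python simulation (not HBase code) of the usual pattern, under stated assumptions: the mapper tags every output cell with its destination table by prefixing the table name to the key, the shuffle sorts on that composite key, and the output stage writes one sorted set of cells per table. Because HFiles must be written in sorted key order, putting the table name first keeps each table's cells contiguous and sorted after a single pass over the input. The table names `users` and `events`, the record fields, and the functions `map_record` and `single_pass_bulk_output` are hypothetical. HBase 2.x ships a `MultiTableHFileOutputFormat` class built around this composite-key idea; check the API docs for your HBase version before relying on it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A cell as (row_key, family, qualifier, value); plain strings stand in for bytes.
Cell = Tuple[str, str, str, str]
# Composite key: (table, row_key, family, qualifier).
CompositeKey = Tuple[str, str, str, str]


def map_record(record: dict) -> Iterable[Tuple[CompositeKey, str]]:
    """Hypothetical mapper: route one input record to several tables.

    Each emitted key starts with the destination table name, so a single
    pass over the input produces cells for every table.
    """
    user_id = record["user_id"]
    # Hypothetical table "users": one row per user.
    yield ("users", user_id, "info", "name"), record["name"]
    # Hypothetical table "events": row key combines user id and timestamp.
    event_row = f"{user_id}#{record['ts']}"
    yield ("events", event_row, "e", "type"), record["event"]


def single_pass_bulk_output(records: Iterable[dict]) -> Dict[str, List[Cell]]:
    """Simulate map -> sort on composite key -> per-table output in one job."""
    emitted = [kv for rec in records for kv in map_record(rec)]
    # Sort only on the composite key (table first), as the shuffle would.
    emitted.sort(key=lambda kv: kv[0])
    per_table: Dict[str, List[Cell]] = defaultdict(list)
    for (table, row, family, qualifier), value in emitted:
        # Strip the table prefix: each table's cells arrive already sorted.
        per_table[table].append((row, family, qualifier, value))
    return dict(per_table)


if __name__ == "__main__":
    sample = [
        {"user_id": "u2", "name": "Bob", "ts": "1700000002", "event": "login"},
        {"user_id": "u1", "name": "Alice", "ts": "1700000001", "event": "click"},
    ]
    for table, cells in single_pass_bulk_output(sample).items():
        print(table, cells)
```

In a real job, the per-table sorted output would be written as HFiles and then loaded into the running cluster as the quoted article describes. For a single table, `HFileOutputFormat2.configureIncrementalLoad` also sets up a total-order partitioner so that each reducer's output lines up with a region; a multi-table job needs the same per-table region alignment.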

这篇关于在单个作业中批量加载到多个HBase表的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！