Question
I am working on a batch job that writes a batch of Put objects into HBase through HTableInterface. The API offers two methods: HTableInterface.put(List) and HTableInterface.put(Put).
I am wondering: for the same number of Put objects, is the batch put faster than putting them one by one?
Another question: I am putting a very large Put object, which caused the job to fail. There seems to be a limit on the size of a Put object. How large can it be?
Recommended answer
put(List<Put> puts) and put(Put aPut) are the same under the hood: both call doPut(List<Put> puts).
What matters is the write buffer size, as mentioned by @ozhang. The default value is 2 MB:
<property>
<name>hbase.client.write.buffer</name>
<value>2097152</value>
</property>
There will be one RPC every time the write buffer fills up and a flushCommits() is triggered. So if your application is flushing too often because your objects are relatively big, experimenting with a larger write buffer size may solve the problem.
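To make the flush arithmetic concrete, here is a minimal sketch of a toy write buffer. It is not HBase code: WriteBufferModel and its methods are hypothetical names, sizes are plain byte counts instead of Put objects, and it assumes the client only flushes when the buffer overflows (that is, auto-flush is turned off). Each counted flush stands in for one RPC.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of a client-side write buffer. NOT the HBase API; all names are hypothetical.
class WriteBufferModel {
    private final long bufferLimitBytes; // plays the role of hbase.client.write.buffer
    private long bufferedBytes = 0;
    private int flushCount = 0;          // each flush stands in for one RPC

    WriteBufferModel(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    // Single-put entry point: wraps the item in a list and delegates.
    void put(long putSizeBytes) {
        doPut(Collections.singletonList(putSizeBytes));
    }

    // List entry point: delegates to the same internal method.
    void put(List<Long> putSizesBytes) {
        doPut(putSizesBytes);
    }

    // Simplified assumption: the buffer is checked after every buffered item.
    private void doPut(List<Long> sizes) {
        for (long size : sizes) {
            bufferedBytes += size;
            if (bufferedBytes > bufferLimitBytes) {
                flushCommits();
            }
        }
    }

    // Sends whatever is buffered; a no-op when the buffer is empty.
    void flushCommits() {
        if (bufferedBytes > 0) {
            flushCount++;
            bufferedBytes = 0;
        }
    }

    int flushCount() {
        return flushCount;
    }

    // Puts `count` items of `sizeBytes` each, one call per item, then flushes the remainder.
    static int flushesOneByOne(long limit, int count, long sizeBytes) {
        WriteBufferModel table = new WriteBufferModel(limit);
        for (int i = 0; i < count; i++) {
            table.put(sizeBytes);
        }
        table.flushCommits();
        return table.flushCount();
    }

    // Puts the same items in a single list call, then flushes the remainder.
    static int flushesAsList(long limit, int count, long sizeBytes) {
        WriteBufferModel table = new WriteBufferModel(limit);
        List<Long> batch = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            batch.add(sizeBytes);
        }
        table.put(batch);
        table.flushCommits();
        return table.flushCount();
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024;
        long eightMb = 8L * 1024 * 1024;
        long putSize = 100L * 1024; // 100 puts of 100 KB each
        System.out.println("2 MB, one by one: " + flushesOneByOne(twoMb, 100, putSize));
        System.out.println("2 MB, as a list:  " + flushesAsList(twoMb, 100, putSize));
        System.out.println("8 MB, one by one: " + flushesOneByOne(eightMb, 100, putSize));
        System.out.println("8 MB, as a list:  " + flushesAsList(eightMb, 100, putSize));
    }
}
```

In this model, 100 puts of 100 KB each cause 5 flushes with a 2 MB buffer and 2 flushes with an 8 MB buffer, whether they arrive one at a time or as a single list. A real client differs in detail (for example, it measures the heap size of each Put, and with auto-flush enabled every put call is flushed immediately), so treat the numbers as illustrative only.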