Problem description
We are using HBase version 1.1.4. The DB has around 40 tables, and each table's data has a TimeToLive specified. It is deployed on a 5-node cluster, and the following is the hbase-site.xml:
<property>
<name>phoenix.query.threadPoolSize</name>
<value>2048</value>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>21474836480</value>
</property>
<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>4</value>
</property>
<!-- default is 64MB 67108864 -->
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>536870912</value>
</property>
<!-- default is 7, should be at least 2x compactionThreshold -->
<property>
<name>hbase.hstore.blockingStoreFiles</name>
<value>240</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>
<property>
<name>hbase.bucketcache.ioengine</name>
<value>offheap</value>
</property>
<property>
<name>hbase.bucketcache.size</name>
<value>40960</value>
</property>
The problem is that the number of regions on each region server keeps growing. Currently we only merge regions manually, using merge_region in the HBase shell.
Is there any way to keep a fixed number of regions on each server, or an automated way to merge the regions?
Solution
Well, it mostly depends on your data and how it is distributed across keys. Assuming the values are roughly the same size for every key, you can use partitioning.
For example, if your table key is a String and you want 100 regions, use this:
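import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hbase.util.Bytes;

// Separator byte placed between the partition prefix and the original key.
private static final byte[] ZERO_BYTE = new byte[] { 0 };

// Prepends a two-digit partition prefix (00-99) derived from the key's hash.
public static byte[] hashKey(String key) {
    int partition = Math.abs(key.hashCode() % 100);
    String prefix = partitionPrefix(partition);
    return Bytes.add(Bytes.toBytes(prefix), ZERO_BYTE, Bytes.toBytes(key));
}

// Left-pads the partition number with zeros to a fixed width of two.
public static String partitionPrefix(int partition) {
    return StringUtils.leftPad(String.valueOf(partition), 2, '0');
}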
In this case, all your keys will be prefixed with a number from 00 to 99, giving you 100 partitions for 100 regions. Now you can disable region splits:
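Note that with splits disabled, a table keeps whatever regions it already has, so one region per prefix only holds if the table is created pre-split on the prefix boundaries. As a minimal sketch of that step (the class name PartitionSplitKeys and the method splitKeys are made up for illustration and are not part of the answer above), the 99 split points for 100 partitions could be generated like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class PartitionSplitKeys {
    // Number of partitions, matching the 00-99 prefixes used above.
    static final int PARTITIONS = 100;

    // Returns the boundaries "01" .. "99". The first region has no start
    // key, so N partitions need N - 1 split points.
    public static byte[][] splitKeys() {
        byte[][] keys = new byte[PARTITIONS - 1][];
        for (int i = 1; i < PARTITIONS; i++) {
            String prefix = String.format(Locale.ROOT, "%02d", i);
            keys[i - 1] = prefix.getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    public static void main(String[] args) {
        byte[][] keys = splitKeys();
        System.out.println(keys.length + " split keys, first="
            + new String(keys[0], StandardCharsets.UTF_8)
            + ", last=" + new String(keys[keys.length - 1], StandardCharsets.UTF_8));
    }
}
```

The resulting array is the kind of argument that Admin.createTable(HTableDescriptor, byte[][]) accepts in the HBase 1.x client, assuming you create the table from Java. An existing table that has already split differently would need to be recreated or re-split to match.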
HTableDescriptor td = new HTableDescriptor(TableName.valueOf("myTable"));
td.setRegionSplitPolicyClassName("org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
Or via the HBase shell:
alter 'myTable', {TABLE_ATTRIBUTES => {METADATA => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy'}}}