Apache HBase Performance Tuning

RAM, RAM, RAM. Don't starve HBase.

Use a 64-bit platform.

Set swappiness (`vm.swappiness`) to 0.

Let HDFS checksumming run in native, hardware-assisted code; see: https://blogs.apache.org/hbase/entry/saving_cpu_using_native_hadoop

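Use the CMS collector for the old generation and set -XX:CMSInitiatingOccupancyFraction to 60 or 70 (the lower the value, the more often GC runs and the more CPU it consumes).
Use the ParNew collector (-XX:+UseParNewGC) for the young generation.
Use MSLAB to prevent the heap fragmentation caused by memstores: set hbase.hregion.memstore.mslab.enabled to true. It has been true by default since 0.92.
HBASE-8163 introduces an MSLAB chunk pool, which lets MSLAB be used more efficiently.
Besides the mechanism described in HBASE-8163, you can also set -XX:PretenureSizeThreshold to a value smaller than hbase.hregion.memstore.mslab.chunksize. The JVM allocates objects larger than this threshold directly in the old generation, so MSLAB chunks are created there and the unnecessary copy and promotion through the young generation is avoided.
For Java GC in general, see Eliminating Large JVM GC Pauses Caused by Background IO Traffic.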
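
The MSLAB and GC advice above could be wired together roughly as follows. This is a minimal sketch, not a recommended production setting: the 2 MB chunk size and the 1 MB pretenure threshold are assumed values chosen only so that the threshold sits below the chunk size, and the JVM flags are listed in a comment because they belong in the RegionServer's JVM options rather than in hbase-site.xml.

```xml
<!-- hbase-site.xml: illustrative sketch, the values are assumptions -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.hregion.memstore.mslab.chunksize</name>
  <!-- 2 MB written out in bytes; assumed here for illustration -->
  <value>2097152</value>
</property>
<!--
  Matching JVM flags (set in the RegionServer JVM options, not in this file):
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:PretenureSizeThreshold=1048576   (1 MB, below the 2 MB chunk size)
-->
```

If the chunk size is changed, the pretenure threshold needs to be revisited so that it stays below it.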

Important configurations

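`hbase.master.wait.on.regionservers.mintostart`	In large clusters, increase this value to keep regions from being assigned to only a few RegionServers.
`zookeeper.session.timeout`	Defaults to 3 minutes. With a well-tuned JVM it can be lowered so that a failed server is detected and handled sooner.
`dfs.datanode.failed.volumes.tolerated`	An HDFS setting for how many failed data volumes are tolerated. The default is 0, in which case a read or write failure on any volume under dfs.datanode.data.dir brings the DataNode down, so setting it to half the number of volumes is recommended.
`hbase.regionserver.handler.count`	The number of server-side threads that handle client requests. Tune it to the client pattern: if clients send large puts or scans on each call, set it lower; if each call carries little data, raise it to increase throughput.
`hbase.ipc.server.max.callqueue.size`	The size of the request call queue. It can be increased for write-only workloads, but be careful under write load: queued requests are held in memory, so a value that is too large can lead to OOM.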
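
As a rough illustration of the settings listed above, the fragment below shows where each one would go. Every value is an assumption made for the example (a cluster that waits for 10 RegionServers, a 60-second session timeout, DataNodes with 12 data volumes); none of them is a recommendation from the source.

```xml
<!-- hbase-site.xml: illustrative values only -->
<property>
  <name>hbase.master.wait.on.regionservers.mintostart</name>
  <!-- wait for at least 10 RegionServers before assigning regions (assumed) -->
  <value>10</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; lowered to 60 seconds on the assumption that GC is tuned -->
  <value>60000</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- assumed value; lower it for large puts/scans, raise it for small requests -->
  <value>30</value>
</property>

<!-- hdfs-site.xml: illustrative values only -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <!-- half of an assumed 12 data volumes -->
  <value>6</value>
</property>
```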

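Enable compression on ColumnFamilies.
Set the WAL file size to less than the HDFS block size; the maximum number of WAL files can then be derived as (RS heap * memstore factor) / WAL size.
If you know the workload well, you can turn off automatic splitting and split manually: set hbase.hregion.max.filesize to a very large value such as 100 GB, but setting it to unlimited is not recommended.
For pre-splitting, about 10 pre-split regions per RegionServer is suggested.
Control major compactions manually to reduce pressure on the workload.
When running MapReduce jobs on top of HBase, turn off speculative execution by setting mapreduce.map.speculative and mapreduce.reduce.speculative to false.
In the configuration, set ipc.server.tcpnodelay ==> true and
hbase.ipc.client.tcpnodelay ==> true to reduce RPC latency.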
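
Several of the tips above translate directly into configuration. The sketch below is illustrative only: the 16 GB RegionServer heap, the 0.4 memstore factor and the 128 MB WAL size are assumptions used to work through the WAL formula, and disabling time-based major compactions with hbase.hregion.majorcompaction is one possible way to take manual control, not something the source prescribes.

```xml
<!-- hbase-site.xml: illustrative sketch, the values are assumptions -->
<property>
  <name>hbase.regionserver.maxlogs</name>
  <!-- (RS heap * memstore factor) / WAL size
       = (16384 MB * 0.4) / 128 MB = 51.2, rounded down to 51 -->
  <value>51</value>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- 100 GB in bytes: very large, so regions are split manually -->
  <value>107374182400</value>
</property>
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- 0 turns off time-based major compactions; trigger them manually instead -->
  <value>0</value>
</property>
<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>
<property>
  <name>hbase.ipc.client.tcpnodelay</name>
  <value>true</value>
</property>

<!-- MapReduce job configuration (for example mapred-site.xml) -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
```

With a different heap size, memstore factor or WAL size, the maxlogs value should be recomputed from the same formula.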

MTTR settings:

Set the following in the RegionServer.

<property>
  <name>hbase.lease.recovery.dfs.timeout</name>
  <value>23000</value>
  <description>How much time we allow elapse between calls to recover lease.
  Should be larger than the dfs timeout.</description>
</property>
<property>
  <name>dfs.client.socket-timeout</name>
  <value>10000</value>
  <description>Down the DFS timeout from 60 to 10 seconds.</description>
</property>

And on the NameNode/DataNode side, set the following to enable 'staleness' introduced in HDFS-3703, HDFS-3912.

<property>
  <name>dfs.client.socket-timeout</name>
  <value>10000</value>
  <description>Down the DFS timeout from 60 to 10 seconds.</description>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>10000</value>
  <description>Down the DFS timeout from 8 * 60 to 10 seconds.</description>
</property>
<property>
  <name>ipc.client.connect.timeout</name>
  <value>3000</value>
  <description>Down from 60 seconds to 3.</description>
</property>
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>2</value>
  <description>Down from 45 seconds to 3 (2 == 3 retries).</description>
</property>
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
  <description>Enable stale state in hdfs</description>
</property>
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>20000</value>
  <description>Down from default 30 seconds</description>
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
  <description>Enable stale state in hdfs</description>
</property>


Apache HBase Performance Tuning: a summary of the official documentation
