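Problem description
I am trying to pre-split an HBase table. The HBaseAdmin Java API has a function that creates a table from a start key, an end key, and a number of regions. This is the method I am using: void createTable(HTableDescriptor desc, byte[] startKey, byte[] endKey, int numRegions). Are there any recommendations for choosing the start key and end key based on the data set?
My approach: say the data set has 100 records, and I want to divide it into about 10 regions of about 10 records each. To find the start key I would run scan '/mytable', {LIMIT => 10} and take the last row key as my start key, then run scan '/mytable', {LIMIT => 90} and take the last row key as my end key.
Does this way of finding the start key and end key look reasonable, or is there a better practice?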
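Edit
I tried the following approaches to pre-split an empty table. None of the three worked the way I used them. I think I will need to salt the key to get an even distribution.
PS: I am showing only some of the region information.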
1)
byte[][] splits = new RegionSplitter.HexStringSplit().split(10);
hBaseAdmin.createTable(tabledescriptor, splits);
This gives regions with the following boundaries:
{
startkey:-INFINITY,
endkey:11111111,
numberofrows:3628951,
},
{
startkey:11111111,
endkey:22222222,
},
{
startkey:22222222,
endkey:33333333,
},
{
startkey:33333333,
endkey:44444444,
},
{
startkey:88888888,
endkey:99999999,
},
{
startkey:99999999,
endkey:aaaaaaaa,
},
{
startkey:aaaaaaaa,
endkey:bbbbbbbb,
},
{
startkey:eeeeeeee,
endkey:INFINITY,
}
This is useless, because my row keys are composite, of the form 'deptId|month|roleId|regionId', and do not fall within the boundaries above.
2)
byte[][] splits = new RegionSplitter.UniformSplit().split(10);
hBaseAdmin.createTable(tabledescriptor, splits);
This has the same problem:
{
startkey:-INFINITY,
endkey:\x19\x99\x99\x99\x99\x99\x99...,
}
{
startkey:\x19\x99\x99\x99\x99\x99\x99...,
endkey:33333332,
}
{
startkey:33333332,
endkey:L\xCC\xCC\xCC...,
}
{
endkey:INFINITY,
}
3)
hBaseAdmin.createTable(tabledescriptor, Bytes.toBytes("04120|200808|805|1999"),
Bytes.toBytes("01253|201501|805|1999"), 10);
{
startkey:-INFINITY,
endkey:04120|200808|805|1999,
}
{
startkey:04120|200808|805|1999,
endkey:000PTP\xDC200W\xD07\x9C805|1999,
}
{
startkey:000PTP\xDC200W\xD07\x9C805|1999,
endkey:000ptq...,
}
{
startkey:001\x11\x15\x13...,
}
{
startkey:01253|201501|805|1999,
endkey:INFINITY,
}
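First question: In my experience with HBase, I am not aware of any hard rule for choosing the number of regions or the start and end keys.
The basics, though, are as follows.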
If you define a fixed number of regions, such as the 10 you mentioned, there may no longer be exactly 10 after a heavy data load: once a region reaches a certain size limit, it splits again.
For the way you are creating the table, the HBaseAdmin documentation says: "Creates a new table with the specified number of regions. The start key specified will become the end key of the first region of the table, and the end key specified will become the start key of the last region of the table (the first region has a null start key and the last region has a null end key)."
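To see why the interior boundaries from this overload can look arbitrary for composite keys, here is a rough sketch of spacing keys evenly between a start key and an end key by treating them as large unsigned numbers. It is not HBase's actual code; the class name KeyInterpolation, its methods, and the sample keys are made up for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyInterpolation {

    // Returns `count` keys evenly spaced strictly between start and end.
    public static byte[][] interpolate(byte[] start, byte[] end, int count) {
        int len = Math.max(start.length, end.length);
        // Right-pad the shorter key with zero bytes so both have the same length,
        // then read each key as an unsigned big-endian number.
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start key must sort before end key");
        }
        BigInteger range = hi.subtract(lo);
        byte[][] keys = new byte[count][];
        for (int i = 1; i <= count; i++) {
            BigInteger step = range.multiply(BigInteger.valueOf(i))
                                   .divide(BigInteger.valueOf(count + 1));
            keys[i - 1] = toFixedLength(lo.add(step), len);
        }
        return keys;
    }

    // Converts a non-negative value back to exactly `len` bytes (big-endian).
    private static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    // Renders non-printable bytes as \xNN, similar to how region keys are displayed.
    private static String printable(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative composite keys, given in sorted order.
        byte[] start = "01253|201501|805|1999".getBytes(StandardCharsets.US_ASCII);
        byte[] end = "04120|200808|805|1999".getBytes(StandardCharsets.US_ASCII);
        for (byte[] key : interpolate(start, end, 3)) {
            System.out.println(printable(key));
        }
    }
}
```

Because the arithmetic ignores the 'deptId|month|roleId|regionId' layout, the computed keys generally contain bytes that never occur in real row keys, which is consistent with boundaries such as 000PTP\xDC200W... in the question.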
Moreover, I prefer to create the table through a script with pre-splits, say 0-10, and to design a salted row key so that each row lands in one of those regions, spread across the region servers, which avoids hotspotting.
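A salted row key of this kind could be sketched as follows. This is only an illustration under stated assumptions: the class SaltedKey, the bucket count of 10, the '|' separator after the salt, and the hash choice are all made up here, not taken from the question or from HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SaltedKey {

    // Assumed bucket count; a single-digit prefix only works for up to 10 buckets.
    public static final int BUCKETS = 10;

    // Prefixes the row key with a bucket digit derived from a hash of the key.
    // The salt is deterministic, so the same row key always maps to the same bucket
    // and can still be read back with a point lookup.
    public static String salt(String rowKey) {
        byte[] bytes = rowKey.getBytes(StandardCharsets.UTF_8);
        int bucket = Math.floorMod(Arrays.hashCode(bytes), BUCKETS);
        return bucket + "|" + rowKey;
    }

    // Split points "1".."9": nine boundaries give ten regions, one per salt digit.
    public static byte[][] splitPoints() {
        byte[][] splits = new byte[BUCKETS - 1][];
        for (int i = 1; i < BUCKETS; i++) {
            splits[i - 1] = String.valueOf(i).getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(salt("04120|200808|805|1999"));
        System.out.println(salt("01253|201501|805|1999"));
    }
}
```

The array returned by splitPoints() has the same byte[][] shape as the splits passed to createTable(tabledescriptor, splits) in the question. The trade-off is that a range scan over the original key order has to be issued once per bucket.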
EDIT: If you want your own region split logic, you can provide your own implementation of org.apache.hadoop.hbase.util.RegionSplitter.SplitAlgorithm and override
public byte[][] split(int numberOfSplits)
Second question: My understanding is that you want to find the start and end row keys of the data already inserted in a specific table. Some ways to do that are below.
To find the start and end row keys of each region, scan the meta table (named 'hbase:meta' in HBase 0.96 and later, '.META.' in earlier versions).
You can also open the master UI at http://hbasemaster:60010 (port 16010 in HBase 1.0 and later) to see how the row keys are spread across regions; the start and end row keys are listed for each region.
To see how your keys are organized after pre-splitting the table and inserting data into HBase, use FirstKeyOnlyFilter, for example:
scan 'yourtablename', FILTER => 'FirstKeyOnlyFilter()'
which displays all 100 of your row keys.
If you have a huge amount of data (not 100 rows as you mentioned) and want a dump of all row keys, you can run the following from outside the shell:
echo "scan 'yourtablename', FILTER => 'FirstKeyOnlyFilter()'" | hbase shell > rowkeys.txt
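Once a list of existing row keys is available, one possible way to choose split points that follow the real key distribution is to sort the keys and take evenly spaced ones. The sketch below assumes the keys have already been extracted from the dump into a list of plain ASCII strings; the class SampledSplits and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SampledSplits {

    // Picks up to numRegions - 1 split keys at evenly spaced positions in the
    // sorted key list, so each region starts out with roughly the same row count.
    public static List<String> pickSplits(List<String> rowKeys, int numRegions) {
        List<String> splits = new ArrayList<>();
        if (rowKeys.isEmpty() || numRegions < 2) {
            return splits;
        }
        List<String> sorted = new ArrayList<>(rowKeys);
        // String order matches HBase's byte order only for ASCII keys (assumption).
        Collections.sort(sorted);
        for (int i = 1; i < numRegions; i++) {
            int index = (int) ((long) i * sorted.size() / numRegions);
            String key = sorted.get(index);
            // Skip duplicates, which would otherwise define an empty region.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(key)) {
                splits.add(key);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            keys.add(String.format("k%02d", i)); // stand-in keys for illustration
        }
        System.out.println(pickSplits(keys, 10));
    }
}
```

With 100 keys and 10 regions this yields 9 split keys, each being the first row key of the next region, which matches the "about 10 records per region" goal in the question. The keys would still need converting to byte[] before being passed to createTable(tabledescriptor, splits).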