Question
Right now I count rows over a ResultScanner like this:
// Walk every Result returned by the scanner and count it
long number = 0;
for (Result rs = scanner.next(); rs != null; rs = scanner.next()) {
    number++;
}
Once the table reaches millions of rows, this takes a long time. I want to get the count in real time, and I don't want to use MapReduce. How can I count the rows quickly?
Recommended answer
Use RowCounter in HBase. RowCounter is a MapReduce job that counts all the rows of a table. It is a good utility to use as a sanity check that HBase can read all the blocks of a table when there are concerns about metadata inconsistency. Without a MapReduce cluster it runs the whole job in a single process, but it runs faster if a MapReduce cluster is in place for it to exploit.
$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>
Usage: RowCounter [options]
    <tablename> [
        --starttime=[start]
        --endtime=[end]
        [--range=[startKey],[endKey]]
        [<column1> <column2>...]
    ]
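Part of why RowCounter speeds up on a cluster is that its input is split by region, so each map task counts one key range and the per-range counts are summed at the end. The sketch below models only that split-and-sum idea in plain Java; it is not HBase code. A TreeMap stands in for a sorted table, the split keys stand in for region boundaries, and the names SplitRowCountSketch, countRange and countInParallel are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: a TreeMap plays the role of a sorted HBase table.
public class SplitRowCountSketch {

    // Counts rows with startKey <= key < endKey; a null endKey means "to the end of the table".
    static long countRange(NavigableMap<String, String> table, String startKey, String endKey) {
        NavigableMap<String, String> slice = (endKey == null)
                ? table.tailMap(startKey, true)
                : table.subMap(startKey, true, endKey, false);
        long n = 0;
        for (String ignored : slice.keySet()) {
            n++; // one increment per row, like the loop in the question
        }
        return n;
    }

    // Each split key starts a new range (a stand-in for a region); ranges are counted concurrently.
    static long countInParallel(NavigableMap<String, String> table, List<String> splitKeys, int threads)
            throws Exception {
        List<String> starts = new ArrayList<>();
        starts.add(""); // the first range starts at the empty key
        starts.addAll(splitKeys);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 0; i < starts.size(); i++) {
                String start = starts.get(i);
                String end = (i + 1 < starts.size()) ? starts.get(i + 1) : null;
                Callable<Long> task = () -> countRange(table, start, end);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // sum the per-range counts into one total
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), "value" + i);
        }
        long total = countInParallel(table, List.of("row0250", "row0500", "row0750"), 4);
        System.out.println(total); // prints 1000
    }
}
```

The split keys must be sorted for the ranges to be disjoint and cover the whole table. In the real job the ranges come from the table's regions and the work is spread across machines rather than threads, which is where a MapReduce cluster helps.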