Problem description
I recently learned about HBase coprocessors and used an endpoint to sum one column of an HBase table. For example, in a table named "pendings" whose column family is "asset", I sum all the values of "asset:amount". The table has other columns, such as "asset:customer_name". What I want to do now is sum the values of "asset:amount" grouped by "asset:customer_name". However, I could not find an API for GROUP BY. Do you know how to implement GROUP BY, or how to do this with the API that HBase provides?
You should use an endpoint to do this work.
You have a sum example in this article: https://blogs.apache.org/hbase/entry/coprocessor_introduction.
What you basically need to add is a new key, "MyKey", formed by appending your row key and the customer name. Keep a variable holding the last MyKey seen. When the current MyKey differs from the previous one, emit the previous MyKey along with its sum, reset the sum, and set the previous MyKey to the current one. Remember to emit the final MyKey and its sum once the scan ends.
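The "last seen key" loop described above can be sketched in plain Java, without any HBase classes. This is only an illustration under assumptions: the class `RunGroupSum` and method `sumRuns` are hypothetical names, the grouping key is reduced to a plain string, and the cells are assumed to arrive ordered so that equal keys are adjacent (in a real endpoint that depends on how the row key is designed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the HBase API.
class RunGroupSum {

    /**
     * Sums amounts over consecutive cells that share the same key.
     * Assumes equal keys are adjacent in the input (sorted scan order).
     */
    static List<Map.Entry<String, Long>> sumRuns(List<Map.Entry<String, Long>> cells) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String lastKey = null; // last seen MyKey
        long sum = 0L;         // running sum for lastKey
        for (Map.Entry<String, Long> cell : cells) {
            String key = cell.getKey();
            if (lastKey != null && !lastKey.equals(key)) {
                // Key changed: emit the previous key with its sum, then reset.
                out.add(Map.entry(lastKey, sum));
                sum = 0L;
            }
            lastKey = key;
            sum += cell.getValue();
        }
        if (lastKey != null) {
            // Emit the final group once the input is exhausted.
            out.add(Map.entry(lastKey, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for scanned (key, amount) pairs.
        List<Map.Entry<String, Long>> cells = List.of(
                Map.entry("alice", 10L),
                Map.entry("alice", 20L),
                Map.entry("bob", 5L));
        System.out.println(sumRuns(cells));
    }
}
```

In an actual endpoint the loop body would read each row from the region's scanner instead of a list, but the emit-on-key-change logic would stay the same.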
Make sure to perform the final aggregation on the client side, as is done in the example at that URL, because a customer's rows may sit at the edges of two different regions, in which case each region returns only a partial sum for that customer.
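The client-side step can be sketched as a merge of the per-region partial sums. Again this is a plain-Java illustration under assumptions: `ClientSideMerge` and `merge` are hypothetical names, and each region's result is modeled as a `Map<String, Long>` from customer name to partial sum rather than whatever type a real endpoint would return.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the HBase API.
class ClientSideMerge {

    /** Adds up the partial sums returned by each region, per customer. */
    static Map<String, Long> merge(List<Map<String, Long>> perRegion) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : perRegion) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                // A customer split across regions appears in several
                // partial maps; Long::sum combines those entries.
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up data: "bob" straddles the boundary between two regions.
        Map<String, Long> region1 = Map.of("alice", 10L, "bob", 5L);
        Map<String, Long> region2 = Map.of("bob", 7L, "carol", 3L);
        System.out.println(merge(List.of(region1, region2)));
    }
}
```

Without this merge, a customer whose rows span two regions would show up twice in the combined output, once per region, each with only part of the total.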