Can OLAP be done in BigTable?


Problem description

In the past I used to build web analytics using OLAP cubes running on MySQL.
An OLAP cube, the way I used it, is simply a large table (OK, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (e.g. pagename, useragent, ip) and a bunch of values (e.g. how many pageviews, how many visitors).
The queries that you run on a table like this are usually of the form (meta-SQL):

  SELECT SUM(hits), SUM(bytes)
  FROM MyCube
  WHERE date='20090914' AND pagename='Homepage' AND browser!='googlebot'
  GROUP BY hour

So you get the totals for each hour of the selected day with the mentioned filters applied.
One snag was that these cubes usually meant a full table scan (for various reasons), which put a practical limit on the size (in MiB) you could make these things.
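
To make the shape of that query concrete, here is a minimal sketch that runs it against a tiny in-memory SQLite table using Python's standard `sqlite3` module. The sample rows, the column types, and the helper name `hourly_totals` are assumptions made for illustration; the original cube lived in MySQL and its real schema is not shown in the question. The sketch also selects `hour` so each total is labelled.

```python
import sqlite3

# Hypothetical miniature cube: dimensions (date, hour, pagename, browser)
# and measures (hits, bytes). The real schema is not given in the question.
rows = [
    ("20090914", 9, "Homepage", "firefox", 10, 1000),
    ("20090914", 9, "Homepage", "googlebot", 50, 5000),
    ("20090914", 9, "Homepage", "msie", 5, 700),
    ("20090914", 10, "Homepage", "firefox", 7, 900),
    ("20090914", 10, "About", "firefox", 3, 300),
    ("20090915", 9, "Homepage", "firefox", 99, 9900),
]


def hourly_totals(conn, date, pagename, excluded_browser):
    """Return [(hour, total_hits, total_bytes), ...] for one day and page."""
    cur = conn.execute(
        "SELECT hour, SUM(hits), SUM(bytes) FROM MyCube "
        "WHERE date = ? AND pagename = ? AND browser != ? "
        "GROUP BY hour ORDER BY hour",
        (date, pagename, excluded_browser),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyCube (date TEXT, hour INTEGER, pagename TEXT, "
    "browser TEXT, hits INTEGER, bytes INTEGER)"
)
conn.executemany("INSERT INTO MyCube VALUES (?, ?, ?, ?, ?, ?)", rows)

result = hourly_totals(conn, "20090914", "Homepage", "googlebot")
print(result)
```

Without an index that matches the filters, a query like this has to visit every row, which is the full-table-scan cost described above.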
I'm currently learning the ins and outs of Hadoop and the like.
Running the above query as a MapReduce job on a BigTable looks easy enough: simply make 'hour' the key, filter in the map and reduce by summing the values.
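
The map/filter/reduce plan just described can be sketched in plain Python. This is only an in-process illustration of the idea, not Hadoop or BigTable code: the record layout, the function names (`map_record`, `reduce_hour`, `run_job`) and the small driver that stands in for the framework's shuffle phase are all assumptions.

```python
from collections import defaultdict


def map_record(record, date, pagename, excluded_browser):
    """Map step: drop rows that fail the filter, emit (hour, (hits, bytes))."""
    if (record["date"] == date
            and record["pagename"] == pagename
            and record["browser"] != excluded_browser):
        yield record["hour"], (record["hits"], record["bytes"])


def reduce_hour(hour, values):
    """Reduce step: sum every (hits, bytes) pair emitted for one hour."""
    total_hits = sum(hits for hits, _ in values)
    total_bytes = sum(nbytes for _, nbytes in values)
    return hour, total_hits, total_bytes


def run_job(records, date, pagename, excluded_browser):
    """Toy driver: groups mapper output by key, then calls the reducer."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record, date, pagename, excluded_browser):
            grouped[key].append(value)
    return [reduce_hour(hour, grouped[hour]) for hour in sorted(grouped)]


# Hypothetical sample rows, one dict per measurement.
fields = ("date", "hour", "pagename", "browser", "hits", "bytes")
records = [dict(zip(fields, row)) for row in [
    ("20090914", 9, "Homepage", "firefox", 10, 1000),
    ("20090914", 9, "Homepage", "googlebot", 50, 5000),
    ("20090914", 9, "Homepage", "msie", 5, 700),
    ("20090914", 10, "Homepage", "firefox", 7, 900),
    ("20090915", 9, "Homepage", "firefox", 99, 9900),
]]

totals = run_job(records, "20090914", "Homepage", "googlebot")
print(totals)
```

Note that the mapper still sees every record, so on its own this has the same full-scan cost as the SQL version; the batch framework only spreads that scan across machines.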
Can you run a query like the one I showed above (or at least one with the same output) on a BigTable kind of system in 'real time' (i.e. via a user interface, where the user gets their answer ASAP) instead of in batch mode?
If not, what is the appropriate technology to do something like this in the realm of BigTable/Hadoop/HBase/Hive and the like?
Solution
It's even kind of been done (kind of).
LastFm's aggregation/summary engine: http://github.com/zohmg/zohmg
A Google search turned up a Google Code project, "mroll", but it doesn't have anything except contact info (no code, nothing). Still, you might want to reach out to that guy and see what's up: http://code.google.com/p/mroll/
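
As a rough illustration of what an aggregation/summary layer can buy you, here is a generic pre-aggregation sketch in Python. It is an assumption about the general technique, not a description of how zohmg or mroll actually work: a batch step rolls raw measurements up into cells keyed by (date, pagename), and the interactive query then reads only the cells under that one key instead of scanning all raw rows. All names here (`build_rollup`, `query_hourly`, the key layout) are hypothetical.

```python
from collections import defaultdict


def build_rollup(records):
    """Batch step: sum measures per (date, pagename) -> (hour, browser) cell."""
    rollup = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in records:
        cell = rollup[(r["date"], r["pagename"])][(r["hour"], r["browser"])]
        cell[0] += r["hits"]
        cell[1] += r["bytes"]
    return rollup


def query_hourly(rollup, date, pagename, excluded_browser):
    """Serving step: one key lookup, then combine the precomputed cells."""
    totals = defaultdict(lambda: [0, 0])
    cells = rollup.get((date, pagename), {})
    for (hour, browser), (hits, nbytes) in cells.items():
        if browser != excluded_browser:
            totals[hour][0] += hits
            totals[hour][1] += nbytes
    return [(hour, hits, nbytes) for hour, (hits, nbytes) in sorted(totals.items())]


# Hypothetical raw measurements; two firefox rows fall into the same cell.
fields = ("date", "hour", "pagename", "browser", "hits", "bytes")
records = [dict(zip(fields, row)) for row in [
    ("20090914", 9, "Homepage", "firefox", 4, 400),
    ("20090914", 9, "Homepage", "firefox", 6, 600),
    ("20090914", 9, "Homepage", "googlebot", 50, 5000),
    ("20090914", 9, "Homepage", "msie", 5, 700),
    ("20090914", 10, "Homepage", "firefox", 7, 900),
    ("20090915", 9, "Homepage", "firefox", 99, 9900),
]]

rollup = build_rollup(records)  # would run as a batch job
answer = query_hourly(rollup, "20090914", "Homepage", "googlebot")
print(answer)
```

The trade-off in a design like this is that the set of dimensions you can filter on interactively has to be decided when the rollup is built.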
