Problem description
In the past I used to build WebAnalytics using OLAP cubes running on MySQL.
Now an OLAP cube the way I used it is simply a large table (ok, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (e.g. which pagename, useragent, ip, etc.) and a bunch of values (e.g. how many pageviews, how many visitors, etc.).
The queries that you run on a table like this are usually of the form (meta-SQL):
SELECT SUM(hits), SUM(bytes)
FROM MyCube
WHERE date='20090914' AND pagename='Homepage' AND browser!='googlebot'
GROUP BY hour
So you get the totals for each hour of the selected day with the mentioned filters applied.
One snag was that these cubes usually meant a full table scan (for various reasons), which put a practical limit on the size (in MiB) you could make these things.
I'm currently learning the ins and outs of Hadoop and the like.
Running the above query as a mapreduce on a BigTable looks easy enough:
Simply make 'hour' the key, filter in the map and reduce by summing the values.
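To make that concrete, here is a minimal in-memory sketch of the same map/filter/reduce flow in plain Python. It is not Hadoop or BigTable code: the sample rows, the field names (taken from the meta-SQL above) and the helper names map_row, reduce_hour and run_job are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; field names mirror the meta-SQL query.
ROWS = [
    {"date": "20090914", "hour": 0, "pagename": "Homepage", "browser": "firefox", "hits": 3, "bytes": 1200},
    {"date": "20090914", "hour": 0, "pagename": "Homepage", "browser": "googlebot", "hits": 10, "bytes": 5000},
    {"date": "20090914", "hour": 0, "pagename": "Homepage", "browser": "safari", "hits": 2, "bytes": 800},
    {"date": "20090914", "hour": 1, "pagename": "Homepage", "browser": "firefox", "hits": 1, "bytes": 400},
    {"date": "20090914", "hour": 1, "pagename": "About", "browser": "firefox", "hits": 7, "bytes": 900},
    {"date": "20090913", "hour": 1, "pagename": "Homepage", "browser": "firefox", "hits": 4, "bytes": 100},
]


def map_row(row):
    """Map step: apply the WHERE filters, then emit (hour, (hits, bytes))."""
    if (
        row["date"] == "20090914"
        and row["pagename"] == "Homepage"
        and row["browser"] != "googlebot"
    ):
        yield row["hour"], (row["hits"], row["bytes"])


def reduce_hour(hour, values):
    """Reduce step: sum hits and bytes for one hour."""
    total_hits = sum(v[0] for v in values)
    total_bytes = sum(v[1] for v in values)
    return hour, total_hits, total_bytes


def run_job(rows):
    """Simulate the shuffle: group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            grouped[key].append(value)
    return [reduce_hour(hour, values) for hour, values in sorted(grouped.items())]


if __name__ == "__main__":
    for hour, hits, size in run_job(ROWS):
        print(hour, hits, size)
```

Note that the map step still touches every row, so this is the same full scan as before, only spread over more machines.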
Can you run a query like I showed above (or at least with the same output) on a BigTable kind of system in 'real time' (i.e. via a user interface, where the user gets their answer ASAP) instead of batch mode?
If not, what is the appropriate technology to do something like this in the realm of BigTable/Hadoop/HBase/Hive and the like?
It's even kind of been done (kind of).
LastFm's aggregation/summary engine: http://github.com/zohmg/zohmg
A Google search turned up a Google Code project "mroll" but it doesn't have anything except contact info (no code, nothing). Still, might want to reach out to that guy and see what's up. http://code.google.com/p/mroll/