Question
Does anybody know how data in Google Analytics is organized? They run complex selections over very large amounts of data very, very fast. What kind of database structure is it?
Recommended answer
AFAIK Google Analytics is derived from Urchin. As has been said, now that Analytics is part of the Google family, it may well be using MapReduce/BigTable. I would assume Google has integrated the old Urchin DB format with the newer BigTable/MapReduce stack.
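The MapReduce part is speculation, but the idea behind it is easy to show: raw hits are mapped to key/value pairs and then reduced into pre-aggregated totals, so a report reads a few counts instead of scanning every hit. Below is a minimal single-process sketch of that paradigm, assuming a made-up `(date, page)` hit record; it is not Google's implementation.

```python
from collections import defaultdict
from itertools import chain


def map_hit(hit):
    """Map phase: emit a ((date, page), 1) pair for each raw hit."""
    date, page = hit
    yield (date, page), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def pageviews_by_day(hits):
    """Run map -> shuffle -> reduce over an iterable of hypothetical hits."""
    pairs = chain.from_iterable(map_hit(h) for h in hits)
    return dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    hits = [
        ("2009-01-01", "/index.html"),
        ("2009-01-01", "/index.html"),
        ("2009-01-01", "/about.html"),
        ("2009-01-02", "/index.html"),
    ]
    report = pageviews_by_day(hits)
    print(report[("2009-01-01", "/index.html")])  # → 2
```

In a real MapReduce job the map and reduce calls run on many machines and the shuffle moves data between them; here everything runs in one process only to show the data flow.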
I found these links, which talk about the Urchin DB. Some of those things are probably still in use.
http://www.advanced-web-metrics.com/blog/2007/10/16/what-is-urchin/
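It says:
[snip] ...still uses a proprietary database to store report data, which makes ad hoc queries more limited, because you have to use tools developed by Urchin rather than more flexible SQL tools.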
http://www.urchinexperts.com/software/faq/#ques45
What type of database does Urchin use?
Urchin uses a proprietary flat file database for report data storage. The high-performance database architecture handles very high traffic sites efficiently. Some of the benefits of the database architecture include:
* Small database footprint: approximately 5-10% of raw logfile size
* Small number of database files required per profile (9 per month of historical reporting)
* Support for parallel processing of load-balanced webserver logs for increased performance
* Databases are standard files that are easy to back up and restore using native operating system utilities
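One plausible way to get properties like these (assumed here, not documented for Urchin, whose format is proprietary) is to store pre-aggregated counts rather than raw hits, in small fixed-width binary files, one per profile per month, and to aggregate each load-balanced server's log separately before merging. The record layout, the `"day page_id"` log format and the file name below are all hypothetical.

```python
import os
import struct
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fixed-width record: day of month (uint8), page id (uint32),
# hit count (uint32), little-endian with no padding -> 9 bytes per record.
RECORD = struct.Struct("<BII")


def aggregate_log(lines):
    """Collapse raw log lines of the form "day page_id" into hit counts."""
    counts = Counter()
    for line in lines:
        day, page_id = line.split()
        counts[(int(day), int(page_id))] += 1
    return counts


def aggregate_parallel(server_logs):
    """Aggregate each server's log independently, then merge the partials.

    A thread pool stands in for separate workers here; a real system
    might use separate processes or machines.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(aggregate_log, server_logs))
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts together
    return total


def write_month(path, counts):
    """Write one month of aggregated counts as a flat binary file."""
    with open(path, "wb") as f:
        for (day, page_id), hits in sorted(counts.items()):
            f.write(RECORD.pack(day, page_id, hits))


def read_month(path):
    """Read a month file back into a dict keyed by (day, page_id)."""
    with open(path, "rb") as f:
        data = f.read()
    return {(day, page): hits for day, page, hits in RECORD.iter_unpack(data)}


if __name__ == "__main__":
    web1 = ["1 10", "1 10", "2 11"]
    web2 = ["1 10", "2 11", "2 11"]
    counts = aggregate_parallel([web1, web2])
    path = os.path.join(tempfile.mkdtemp(), "profile1-2009-01.dat")
    write_month(path, counts)
    print(read_month(path))  # → {(1, 10): 3, (2, 11): 3}
```

The file holds one record per (day, page) no matter how many hits there were, which is why an aggregated store can be much smaller than the raw logs, and it is an ordinary file that any backup tool can copy.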
More info about Urchin:
http://www.google.com/support/urchin45/bin/answer.py?answer=28737
A long time ago I used a tracker, and its site discussed data normalization: http://www.2enetworx.com/dev/articles/statisticus5.asp
There you can find some information on how to reduce the amount of data in the DB, which may be a good starting point for your research.
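Normalization in this sense means that, instead of storing the same long URL or user-agent string on every hit, each distinct string is stored once in a lookup table and hits refer to it by a small integer id. The sketch below is a minimal illustration assuming made-up `(page, user agent)` hit records; the `StringTable` class is hypothetical and not taken from the linked article or from Statisticus.

```python
class StringTable:
    """Lookup table that maps each distinct string to a small integer id."""

    def __init__(self):
        self.ids = {}     # string -> id
        self.values = []  # id -> string

    def intern(self, value):
        """Return the id for value, assigning the next id on first sight."""
        if value not in self.ids:
            self.ids[value] = len(self.values)
            self.values.append(value)
        return self.ids[value]

    def lookup(self, value_id):
        """Return the original string for an id."""
        return self.values[value_id]


def normalize_hits(hits, pages, agents):
    """Store each hit as (page_id, agent_id) instead of two full strings."""
    return [(pages.intern(page), agents.intern(agent)) for page, agent in hits]


if __name__ == "__main__":
    pages, agents = StringTable(), StringTable()
    hits = [
        ("/index.html", "Mozilla/5.0 (Windows NT 6.1)"),
        ("/index.html", "Mozilla/5.0 (Windows NT 6.1)"),
        ("/about.html", "Mozilla/5.0 (Macintosh)"),
    ]
    rows = normalize_hits(hits, pages, agents)
    print(rows)                      # → [(0, 0), (0, 0), (1, 1)]
    print(pages.lookup(rows[2][0]))  # → /about.html
```

The more often the same values repeat across hits, the more space this saves, at the cost of one extra lookup when a report needs the original string.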