Problem Description
Background/Intent:
So I'm going to create an event tracker from scratch and have a couple of ideas on how to do this, but I'm unsure of the best way to proceed on the database side of things. One thing I am interested in is allowing these events to be completely dynamic, while at the same time allowing for reporting on relational event counters.
For example, all countries broken down by operating system. The desired effect would be:
- US - # of events
- iOS - # of events that occurred in the US
- Android - # of events that occurred in the US
- iOS - # of events that occurred in CA
- Android - # of events that occurred in CA
My intent is to be able to accept these event names like so:
/?country=US&os=iOS&device=iPhone&color=blue&carrier=Sprint&city=orlando&state=FL&randomParam=123&randomParam2=456&randomParam3=789
Which means that in order to do the relational counters for something like the above, I would potentially be incrementing 100+ counters per request.
Assume there will be 10+ million of the above requests per day.
I want to keep things completely dynamic in terms of the event names being tracked, and I also want to do it in such a manner that lookups on the data remain super quick. As such, I have been looking into using Redis or MongoDB for this.
Questions:
Is there a better way to do this than counters while keeping the fields dynamic?
Provided this was all in one document (structured like a tree), would using the $inc operator in MongoDB to increment 100+ counters at the same time in one operation be viable and not slow? The upside here is that I could retrieve all of the statistics for one 'campaign' quickly in a single query.
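For illustration, a minimal sketch of what that single-operation update could look like with pymongo. The database/collection names and the "field.value.count" tree layout are assumptions, not anything prescribed by the question; dot-notation paths let $inc build the nested tree on the fly, so the tracked field names stay completely dynamic:

```python
from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod
stats = client.tracker.campaign_stats  # hypothetical db/collection names

def track_event(campaign_id, params):
    """Fold every counter for this request into one $inc document and
    apply it as a single atomic update on the campaign's document."""
    inc = {"total": 1}
    for field, value in params.items():
        # e.g. {"os.iOS.count": 1} -- the path creates the subtree on demand
        inc[f"{field}.{value}.count"] = 1
        # Relational counters, e.g. per-OS counts within a country.
        if field != "country" and "country" in params:
            inc[f"country.{params['country']}.{field}.{value}"] = 1
    stats.update_one({"_id": campaign_id}, {"$inc": inc}, upsert=True)
```

One point in its favor: a single update with many $inc fields takes the write lock once per request rather than once per counter. Whether 100+ fields per update stays fast at 10M requests/day is worth benchmarking.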
Would this be better suited to Redis, doing a zincrby for all of the applicable counters for the event?
Thanks
Recommended Answer
Depending on how your key structure is laid out, I would recommend pipelining the ZINCRBY commands. You have an easy "commit" trigger: the request. If you iterate over your parameters and ZINCRBY each key, then execute the pipeline at the end of the request, it will be very fast. I've implemented a system like you describe as both a CGI and a Django app. I set up a key structure along the lines of this:
YYYY-MM-DD:HH:MM -> sorted set
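A minimal sketch of that per-request pipeline with redis-py, assuming one sorted set per minute window as above; the "field:value" member naming and the composite members for relational breakdowns are illustrative choices, not the only way to cut it:

```python
import redis
from datetime import datetime, timezone

r = redis.Redis()  # assumes a local Redis instance

def track_event(params):
    """Increment every counter for one request in a single round trip."""
    window = datetime.now(timezone.utc).strftime("%Y-%m-%d:%H:%M")
    pipe = r.pipeline(transaction=False)
    for field, value in params.items():
        # Each member of the minute's sorted set is one counter.
        pipe.zincrby(window, 1, f"{field}:{value}")
        # Composite members give relational breakdowns, e.g. OS within country.
        if field != "country" and "country" in params:
            pipe.zincrby(window, 1, f"country:{params['country']}|{field}:{value}")
    pipe.execute()  # the "commit" at the end of the request
```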
And I was able to process something like 150,000-200,000 increments per second on the Redis side with a single process, which should be plenty for your described scenario. This key structure allows me to grab data based on windows of time. I also added an expire to the keys to avoid writing a DB cleanup process. I then had a cron job that would do set operations to "roll up" stats into hourly, daily, and weekly aggregates using variants of the aforementioned key pattern. I bring these ideas up because they are ways you can take advantage of the built-in capabilities of Redis to make the reporting side simpler. There are other ways of doing it, but this pattern seems to work well.
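As a sketch of such a roll-up job: ZUNIONSTORE sums scores per member by default, so unioning the minute-level sets yields an hourly set whose counters add up correctly. The hourly key name and the retention period here are assumptions:

```python
def rollup_hour(r, day, hour):
    """Union the 60 minute-level sorted sets into one hourly sorted set."""
    minute_keys = [f"{day}:{hour:02d}:{m:02d}" for m in range(60)]
    hourly_key = f"{day}:{hour:02d}"
    r.zunionstore(hourly_key, minute_keys)   # scores are summed per member
    r.expire(hourly_key, 60 * 60 * 24 * 30)  # keep hourly data ~30 days
```

The same pattern repeats upward: union hourly keys into daily keys, daily into weekly.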
As noted by eyossi, the global lock can be a real problem with systems that do concurrent writes and reads. If you are writing this as a real-time system, concurrency may well be an issue. If it is an "end of day" log parsing system, it would not likely trigger the contention unless you run multiple instances of the parser, or run reports at the time of input. With regard to keeping reads fast in Redis, I would consider setting up a read-only Redis instance slaved off of the main one. If you put it on the server running the reports and point the reporting process at it, it should be very quick to generate the reports.
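A sketch of that split, with hypothetical hostnames; the replica would be started with the replicaof config directive (slaveof on older Redis) and used only for reads:

```python
import redis

writes = redis.Redis(host="redis-main")     # trackers write here
reports = redis.Redis(host="redis-report")  # read-only replica for reporting

# e.g. the ten largest counters in some hourly window, with their counts
top = reports.zrevrange("2013-01-15:10", 0, 9, withscores=True)
```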
Depending on your available memory, data set size, and whether you store any other type of data in the Redis instance, you might consider running a 32-bit Redis server to keep memory usage down. A 32-bit instance should be able to keep a lot of this type of data in a small chunk of memory, but if running the normal 64-bit Redis isn't taking too much memory, feel free to use it. As always, test your own usage patterns to validate.