What should I choose: MongoDB / Cassandra / Redis / CouchDB?
We're developing a really big project, and I was wondering if anyone can give me some advice about which DB backend we should pick.
Our system is composed of 1100 electronic devices that send a signal to a central server, and the server then stores the signal info (the signal is about 35 bytes long). These devices will each be sending about 3 signals per minute, so if we do the numbers, that'll be 4,752,000 new records/day in the database, and a total of 142,560,000 new records/month.
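The record counts follow from simple multiplication. Here is a quick sketch of the arithmetic. It assumes a 30-day month, which is what the 142,560,000 figure implies.

```python
DEVICES = 1100
SIGNALS_PER_DEVICE_PER_MINUTE = 3
MINUTES_PER_DAY = 60 * 24
DAYS_PER_MONTH = 30  # assumption: the monthly figure uses a 30-day month

records_per_day = DEVICES * SIGNALS_PER_DEVICE_PER_MINUTE * MINUTES_PER_DAY
records_per_month = records_per_day * DAYS_PER_MONTH

print(f"{records_per_day:,} records/day")      # 4,752,000 records/day
print(f"{records_per_month:,} records/month")  # 142,560,000 records/month
```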
We need a DB backend that is lightning fast and reliable. Of course, we also need to do some complex data mining on that DB. We're doing some research into MongoDB/Cassandra/Redis/CouchDB, but their documentation websites are still at an early stage.
Any help? Ideas?
Thanks a lot!
Don't let the spatial scale (1000+ devices) mislead you as to the computational and/or storage scale. A few dozen 35-byte inserts per second is a trivial workload for any mainstream DBMS, even running on low-end hardware. Likewise, 142 million records per month is only on the order of 1~10 gigabytes of storage per month, without any compression, including indices.
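The "few dozen inserts per second" and "1~10 gigabytes per month" figures can be checked on the back of an envelope. This sketch assumes 35 bytes of payload per record. It also applies a hypothetical 2x multiplier for per-row and index overhead, which is a guess and not a measured value for any particular DBMS.

```python
RECORDS_PER_DAY = 4_752_000
RECORDS_PER_MONTH = 142_560_000
BYTES_PER_RECORD = 35   # raw signal payload, from the question
OVERHEAD_FACTOR = 2.0   # hypothetical allowance for row headers + indices

inserts_per_second = RECORDS_PER_DAY / (24 * 60 * 60)
raw_gb_per_month = RECORDS_PER_MONTH * BYTES_PER_RECORD / 1e9
est_gb_per_month = raw_gb_per_month * OVERHEAD_FACTOR

print(f"{inserts_per_second:.1f} inserts/s")  # 55.0 inserts/s
print(f"{raw_gb_per_month:.2f} GB/month raw, ~{est_gb_per_month:.0f} GB/month with overhead")
```

With these assumptions the raw data is about 5 GB/month and the padded estimate about 10 GB/month. Both fall inside the 1~10 GB range quoted above.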
In your question comment, you said:
Reliability? Any mainstream DBMS can guarantee this (assuming you mean it's not going to corrupt your data, and it's not going to crash--see my discussion of the CAP theorem at the bottom of this answer). Speed? Even with a single machine, 10~100 times this workload should not be a problem. Scalability? At the current rate, a full year's data, uncompressed, even fully indexed, would easily fit within 100 gigabytes of disk space (likewise, we've already established the insert rate is not an issue).
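The same arithmetic also covers the speed and scalability points. This sketch scales the current load by 10x and 100x and projects one year of raw data. The 35-byte record size and monthly volume come from the question. Index overhead is deliberately left out, so the yearly figure is a lower bound. How much of the 100-gigabyte budget indices actually consume depends on the schema and the DBMS.

```python
RECORDS_PER_MONTH = 142_560_000
BYTES_PER_RECORD = 35
CURRENT_INSERTS_PER_SECOND = 55  # 4,752,000 records/day / 86,400 s

for factor in (10, 100):
    print(f"{factor}x load: {CURRENT_INSERTS_PER_SECOND * factor:,} inserts/s")

records_per_year = RECORDS_PER_MONTH * 12
raw_gb_per_year = records_per_year * BYTES_PER_RECORD / 1e9
print(f"{records_per_year:,} records/year, {raw_gb_per_year:.1f} GB raw")
# 1,710,720,000 records/year, 59.9 GB raw
```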
As such, I don't see any clear need for an exotic solution like NoSQL, or even a distributed database--a plain, old relational database such as MySQL would be just fine. If you're worried about failover, just set up a backup server in a master-slave configuration. If we're talking 100s or 1000s of times the current scale, just horizontally partition a few instances based on the ID of the data-gathering device (i.e. {partition index} = {device id} modulo {number of partitions}).
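The partitioning rule in the parenthesis is just a modulo over the device ID. Here is a minimal sketch. It assumes integer device IDs and a fixed list of database instances; the `db-shard-N` names are hypothetical placeholders.

```python
def partition_index(device_id: int, num_partitions: int) -> int:
    """{partition index} = {device id} modulo {number of partitions}."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return device_id % num_partitions


# Hypothetical instance names; every signal from one device goes to the same instance.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(device_id: int) -> str:
    return SHARDS[partition_index(device_id, len(SHARDS))]


for device_id in (0, 1, 5, 1099):
    print(device_id, "->", shard_for(device_id))
```

All rows for a given device live on one instance, so per-device queries stay local. The catch, which this sketch does not handle, is that changing the number of partitions reassigns most devices to a different instance. Existing data then has to be migrated.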
Bear in mind that leaving the safe and comfy confines of the relational database world means abandoning both its representational model and its rich toolset. This will make your "complex data mining" much more difficult--you don't just need to put data into the database, you also need to get it out.
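This sketch shows what "getting it out" looks like with a relational toolset. It uses Python's built-in `sqlite3` module as a stand-in for MySQL. The `signals` table, its columns, and the sample rows are hypothetical. The point is that an ad-hoc aggregate is a single SQL statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a MySQL server
conn.execute(
    """CREATE TABLE signals (
           device_id   INTEGER NOT NULL,
           received_at INTEGER NOT NULL,  -- Unix timestamp, seconds
           payload     BLOB    NOT NULL   -- ~35-byte raw signal
       )"""
)
# Composite index so per-device, time-range queries avoid a full table scan.
conn.execute("CREATE INDEX idx_signals_device_time ON signals (device_id, received_at)")

rows = [
    (1, 1_000, b"\x00" * 35),
    (1, 1_020, b"\x01" * 35),
    (2, 1_005, b"\x02" * 35),
]
conn.executemany("INSERT INTO signals VALUES (?, ?, ?)", rows)
conn.commit()

# Example "data mining" query: signal count and time span per device.
query = """SELECT device_id, COUNT(*), MIN(received_at), MAX(received_at)
           FROM signals GROUP BY device_id ORDER BY device_id"""
summary = conn.execute(query).fetchall()
print(summary)  # [(1, 2, 1000, 1020), (2, 1, 1005, 1005)]
```

With a real MySQL driver, the SQL would be essentially the same. The parameter placeholder style usually differs, for example `%s` instead of `?`.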
All of that being said, MongoDB and CouchDB are uncommonly simple to deploy and work with. They're also very fun, and will make you more attractive to any number of people (not just programmers--executives, too!).
The common wisdom is that, of the three NoSQL solutions you suggested, Cassandra is the best for high insert volume (of course, relatively speaking, I don't think you have high insert volume--this was designed to be used by Facebook); the tradeoff is that it is more difficult to work with. So unless you have some strange requirements you didn't mention, I would recommend against it for your use case.
If you're positively set on a NoSQL deployment, you might want to consider the CAP theorem. This will help you decide between MongoDB and CouchDB. Here's a good link: http://blog.nahurst.com/visual-guide-to-nosql-systems. It all comes down to what you mean by "reliability": MongoDB trades availability for consistency, whereas CouchDB trades consistency for availability. (Cassandra allows you to finesse this tradeoff, per query, by specifying how many servers must be written/read for a write/read to succeed; UPDATE: Now, so can CouchDB, with BigCouch! Very exciting...)
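The per-query tradeoff mentioned for Cassandra is usually explained with the Dynamo-style quorum rule. With N replicas, a write acknowledged by W of them and a read that consults R of them must share at least one replica when R + W > N. In that case the read is guaranteed to see the latest acknowledged write. Below is a minimal sketch of that check. The function name is hypothetical, and it models the rule itself, not any database's API. Cassandra's ONE, QUORUM and ALL consistency levels correspond to particular choices of R and W.

```python
def read_sees_latest_write(n_replicas: int, write_acks: int, read_replicas: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    if not (1 <= write_acks <= n_replicas and 1 <= read_replicas <= n_replicas):
        raise ValueError("W and R must be between 1 and N")
    return read_replicas + write_acks > n_replicas


# N = 3 replicas:
print(read_sees_latest_write(3, 2, 2))  # True: quorum writes + quorum reads
print(read_sees_latest_write(3, 1, 1))  # False: fast and available, but reads may be stale
print(read_sees_latest_write(3, 3, 1))  # True: write to all, read from any one
```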
Best of luck in your project.