Question
MongoDB is fast, but only when your working set or indexes fit into RAM. So if my server has 16GB of RAM, does that mean the combined size of all my collections needs to be less than or equal to 16GB? How does one say "OK, this is my working set; the rest can be archived"?
Recommended answer
"Working set" is basically the amount of data AND indexes that will be active/in use by your system.
For example, suppose you have 1 year's worth of data. For simplicity, say each month accounts for 1GB of data, giving 12GB in total, and each month's data needs 1GB of indexes to cover it, again totalling 12GB for the year.
If you are always accessing the last 12 months' worth of data, then your working set is: 12GB (data) + 12GB (indexes) = 24GB.
However, if you actually only access the last 3 months' worth of data, then your working set is: 3GB (data) + 3GB (indexes) = 6GB. In this scenario, if you had 8GB of RAM and then started regularly accessing the past 6 months' worth of data, your working set would grow to 6GB (data) + 6GB (indexes) = 12GB, exceed your available RAM, and hurt performance.
But generally, if you have enough RAM to cover the data and indexes you expect to access frequently, you will be fine.
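The arithmetic in the example above can be sketched in a few lines of Python. This is only a ball-park estimate under the answer's simplifying assumptions (1GB of data and 1GB of indexes per month); the function names `working_set_gb` and `fits_in_ram` are made up for illustration and are not part of MongoDB or any driver, and the comparison ignores memory used by the operating system and other processes.

```python
def working_set_gb(months_accessed, data_gb_per_month=1.0, index_gb_per_month=1.0):
    """Rough working-set size: the data that is actively used plus the
    indexes that cover it. The per-month sizes are assumptions taken
    from the example, not measured values."""
    return months_accessed * (data_gb_per_month + index_gb_per_month)


def fits_in_ram(working_set, ram_gb):
    """True if the estimated working set is no larger than the RAM budget.
    Hypothetical helper: it treats all of ram_gb as available to MongoDB."""
    return working_set <= ram_gb


if __name__ == "__main__":
    ram_gb = 8
    for months in (12, 3, 6):
        ws = working_set_gb(months)
        print(f"{months} months accessed: ~{ws:.0f}GB working set, "
              f"fits in {ram_gb}GB RAM: {fits_in_ram(ws, ram_gb)}")
```

Only the 3-month access pattern stays under the 8GB budget here; the total size of the collections (24GB including indexes) does not change between the three cases, only how much of it is in regular use.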
Response to question in comments
I'm not sure I quite follow, but I'll have a go at answering. Firstly, the calculation for the working set is a ball-park figure. Secondly, if you have (for example) a 1GB index on user_id, then only the portion of that index that is commonly accessed needs to be in RAM (e.g. if 50% of users are inactive, then only about 0.5GB of the index will be frequently needed in RAM). In general, the more RAM you have the better, especially as the working set is likely to grow over time with increased usage. This is where sharding comes in: split the data over multiple nodes and you can scale out cost-effectively. Your working set is then divided over multiple machines, meaning more of it can be kept in RAM. Need more RAM? Add another machine to shard onto.
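As a rough illustration of the two points above (only the hot part of an index needs RAM, and sharding divides the working set across machines), here is a minimal Python sketch. It assumes the working set is spread evenly across shards, which in practice depends on the shard key and access pattern; the helper names are hypothetical and the numbers are the ones used in this answer.

```python
import math


def hot_index_gb(index_gb, active_fraction):
    """Portion of an index that is frequently accessed, assuming index
    usage is proportional to the fraction of active users."""
    return index_gb * active_fraction


def shards_needed(working_set_gb, ram_per_node_gb):
    """Smallest number of equally sized nodes whose combined RAM covers
    the working set (assumes an even split across shards)."""
    return max(1, math.ceil(working_set_gb / ram_per_node_gb))


def per_shard_working_set_gb(working_set_gb, shard_count):
    """Working set each shard has to hold under an even split."""
    return working_set_gb / shard_count


if __name__ == "__main__":
    # 1GB index on user_id with half of the users inactive.
    print("Hot index portion (GB):", hot_index_gb(1.0, 0.5))

    # 24GB working set from the 12-month example, on 8GB nodes.
    shards = shards_needed(24, 8)
    print("Shards needed:", shards)
    print("Working set per shard (GB):", per_shard_working_set_gb(24, shards))
```

Treat the result as a planning estimate only: a skewed shard key can leave one shard holding far more than its even share.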