Question
Is there any performance difference (query processing time) between using Mongoose population and direct object inclusion? When should each be used?
Mongoose population example:
var personSchema = Schema({
_id : Number,
name : String,
stories : [{ type: Schema.Types.ObjectId, ref: 'Story' }]
});
var storySchema = Schema({
_creator : { type: Number, ref: 'Person' },
title : String,
});
Mongoose object nesting example:
var personSchema = Schema({
_id : Number,
name : String,
stories : [storySchema]
});
var storySchema = Schema({
_creator : personSchema,
title : String,
});
Recommended answer
The first thing to understand about Mongoose population is that it is not magic, just a convenience method that lets you retrieve related information without doing it all yourself.
The concept is essentially for cases where you decide to place data in a separate collection rather than embedding it. Your main considerations should typically be document size, or related information that is updated so frequently that maintaining embedded copies would be unwieldy.
The "not magic" part is what happens under the covers: when you "reference" another source, the populate function makes one or more additional queries to that "related" collection in order to "merge" those results into the parent object you have retrieved. You could do this yourself, but the method is there for convenience. The obvious "performance" consideration is that retrieving all the information does not take a single round trip to the database (the MongoDB instance). There is always more than one.
As a sample, take two collections.
Orders:
{
"_id": ObjectId("5392fea00ff066b7d533a765"),
"customerName": "Bill",
"items": [
ObjectId("5392fee10ff066b7d533a766"),
ObjectId("5392fefe0ff066b7d533a767")
]
}
Items:
{ "_id": ObjectId("5392fee10ff066b7d533a766"), "prod": "ABC", "qty": 1 }
{ "_id": ObjectId("5392fefe0ff066b7d533a767"), "prod": "XYZ", "qty": 2 }
The "best" that can be done with a "referenced" model, or by populate under the hood, is this:
var order = db.orders.findOne({ "_id": ObjectId("5392fea00ff066b7d533a765") });
order.items = db.items.find({ "_id": { "$in": order.items } }).toArray();
So there are clearly "at least" two queries and operations needed to "join" that data.
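The two-step pattern above can be sketched without a database. This is only an illustration under stated assumptions: the arrays stand in for the "orders" and "items" collections, the string ids replace the ObjectId values, and findOne, findIn, findOrderWithItems and roundTrips are hypothetical names, not Mongoose or MongoDB APIs.

```javascript
// In-memory stand-ins for the "orders" and "items" collections (hypothetical data).
const orders = [
  { _id: "order1", customerName: "Bill", items: ["item1", "item2"] }
];
const items = [
  { _id: "item1", prod: "ABC", qty: 1 },
  { _id: "item2", prod: "XYZ", qty: 2 }
];

// Counts simulated queries; each one would be a network round trip in practice.
let roundTrips = 0;

// Stand-in for db.<collection>.findOne({ _id: id }).
function findOne(collection, id) {
  roundTrips += 1;
  return collection.find((doc) => doc._id === id) || null;
}

// Stand-in for db.<collection>.find({ _id: { $in: ids } }).toArray().
function findIn(collection, ids) {
  roundTrips += 1;
  return collection.filter((doc) => ids.includes(doc._id));
}

// Manual "populate": one query for the parent, one for the referenced children.
function findOrderWithItems(orderId) {
  const order = findOne(orders, orderId);
  if (order === null) return null;
  // Return a copy so the stored order keeps its id references.
  return { ...order, items: findIn(items, order.items) };
}

const populated = findOrderWithItems("order1");
console.log(populated.items.map((item) => item.prod)); // [ 'ABC', 'XYZ' ]
console.log(roundTrips); // 2
```

The counter reaches 2 for a single order, which is the "at least two queries" cost described above. Each additional referenced path would add at least one more query.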
The embedding concept is essentially the MongoDB answer to not supporting "joins". Rather than splitting data into normalized collections, you try to embed the "related" data directly within the document that uses it. The advantages are a single "read" operation for retrieving the "related" information, and a single point of "write" for updating both "parent" and "child" entries. It is often not possible, though, to write to "many" children at once without processing "lists" on the client or otherwise accepting "multiple" write operations, preferably in "batch" processing.
The data then looks like this (compared to the example above):
{
"_id": ObjectId("5392fea00ff066b7d533a765"),
"customerName": "Bill",
"items": [
{ "_id": ObjectId("5392fee10ff066b7d533a766"), "prod": "ABC", "qty": 1 },
{ "_id": ObjectId("5392fefe0ff066b7d533a767"), "prod": "XYZ", "qty": 2 }
]
}
Therefore, actually fetching the data is just a matter of:
db.orders.findOne({ "_id": ObjectId("5392fea00ff066b7d533a765") });
The pros and cons of either approach will always depend largely on the usage pattern of your application. But at a glance:
Embedding:
- Total document size with embedded data will typically not exceed 16MB of storage (the BSON limit), nor (as a guideline) will it have arrays that contain 500 or more entries.
- Data that is embedded generally does not require frequent changes. So you can live with the "duplication" that comes from de-normalization, because it does not force you to update those "duplicates" with the same information across many parent documents just to make one change.
- Related data is frequently used in association with the parent. If your "read/write" cases pretty much always need to "read/write" both parent and child, then it makes sense to embed the data for atomic operations.
Referencing:
- The related data is always going to exceed the 16MB BSON limit. You can always consider a hybrid approach of "bucketing", but the hard limit on the main document cannot be breached. Common cases are "post" and "comments", where "comment" activity is expected to be very large.
- Related data needs regular updating. This is essentially the case where you "normalize" because the data is "shared" among many parents and the "related" data changes so often that it would be impractical to update embedded items in every "parent" where that "child" item occurs. The easier option is to reference the "child" and make the change once.
- There is a clear separation of reads and writes. If you will not always need the "related" information when reading the "parent", or will not always need to alter the "parent" when writing to the child, there can be good reason to separate the model as referenced. Additionally, if you generally want to update many "sub-documents" at once, and those "sub-documents" are actually references to another collection, the implementation is quite often more efficient when the data is in a separate collection.
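The update trade-off in the points above can be made concrete with a small in-memory sketch. Everything here is assumed for illustration: the sample data, the price field, and the helper names updatePriceEmbedded and updatePriceReferenced are hypothetical, and plain arrays stand in for collections.

```javascript
// Embedded model: each order carries its own copy of the item data (hypothetical data).
const embeddedOrders = [
  { _id: "o1", items: [{ _id: "i1", prod: "ABC", price: 10 }] },
  { _id: "o2", items: [{ _id: "i1", prod: "ABC", price: 10 },
                       { _id: "i2", prod: "XYZ", price: 5 }] },
  { _id: "o3", items: [{ _id: "i2", prod: "XYZ", price: 5 }] }
];

// Referenced model: item data lives once, in its own collection.
const referencedItems = [
  { _id: "i1", prod: "ABC", price: 10 },
  { _id: "i2", prod: "XYZ", price: 5 }
];

// Changes the price in every embedded copy; returns how many parents were rewritten.
function updatePriceEmbedded(orderDocs, itemId, price) {
  let parentsTouched = 0;
  for (const order of orderDocs) {
    let changed = false;
    for (const item of order.items) {
      if (item._id === itemId) {
        item.price = price;
        changed = true;
      }
    }
    if (changed) parentsTouched += 1;
  }
  return parentsTouched;
}

// Changes the price in the single shared document; returns documents rewritten.
function updatePriceReferenced(itemDocs, itemId, price) {
  const item = itemDocs.find((doc) => doc._id === itemId);
  if (!item) return 0;
  item.price = price;
  return 1;
}

const embeddedWrites = updatePriceEmbedded(embeddedOrders, "i1", 12);
const referencedWrites = updatePriceReferenced(referencedItems, "i1", 12);
console.log(embeddedWrites, referencedWrites); // 2 1
```

With embedded copies the number of documents rewritten grows with the number of parents that contain the child, while the referenced model changes one document. A real deployment may issue the embedded update as a single multi-document command, but each matching parent still has to be modified.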
There is a much wider discussion of the pros and cons of either position in the MongoDB documentation on Data Modelling, which covers various use cases and ways to approach using either the embedded or the referenced model, the latter being what the populate method supports.
Hopefully the "dot points" are of use, but the general recommendation is to consider the data usage patterns of your application and choose what is best. Having the "option" to embed "should" be the reason you have chosen MongoDB, but it is how your application "uses the data" that decides which method best suits which part of your data modelling (as it is not "all or nothing").
- Note that since this was originally written, MongoDB has introduced the $lookup operator, which does indeed perform "joins" between collections on the server. For the purposes of the general discussion here: whilst $lookup is "better" in most circumstances than the "multiple query" overhead incurred by populate() and by "multiple queries" in general, there is still a "significant overhead" incurred with any $lookup operation.
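For reference, a $lookup stage for the sample orders and items collections might look like the sketch below. It only builds the pipeline as a plain object so that it runs without a database. buildOrderLookupPipeline is a hypothetical helper, the id is passed as a plain string where a real query would use an ObjectId, and matching an array-valued localField directly is assumed to be supported by the server version in use (older versions may need an $unwind stage first).

```javascript
// Builds an aggregation pipeline that asks the server to join an order to its items.
// Collection and field names follow the sample "orders" and "items" documents.
function buildOrderLookupPipeline(orderId) {
  return [
    // Select the single parent order.
    { $match: { _id: orderId } },
    // Replace the array of item ids with the matching item documents.
    {
      $lookup: {
        from: "items",        // collection to join against
        localField: "items",  // array of ids stored on the order
        foreignField: "_id",  // field matched in the items collection
        as: "items"           // output array field on the result
      }
    }
  ];
}

// Assumption: a real call would wrap this id in ObjectId(...).
const pipeline = buildOrderLookupPipeline("5392fea00ff066b7d533a765");
console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell this would be run as: db.orders.aggregate(pipeline)
```

This is one request from the client's point of view, but the server still has to read from the second collection, which is the overhead the note refers to.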
The core design principle is that "embedded" means "already there", as opposed to "fetching from somewhere else". It is essentially the difference between "in your pocket" and "on the shelf", and in I/O terms usually more like "on the shelf in the library downtown", which is notably further away for network-based requests.