Background
I'm prototyping a conversion from our RDBMS database to MongoDB. While denormalizing, it seems I have two choices: one that leads to many (millions of) smaller documents, and one that leads to fewer (hundreds of thousands of) large documents.
If I could distill it down to a simple analogy, it would be the difference between a collection with fewer Customer documents like this (in Java):
class Customer {
    private String name;
    private Address address;
    // each CreditCard has hundreds of Payment instances
    private Set<CreditCard> creditCards;
}
or a collection with many, many Payment documents like this:
class Payment {
    private Customer customer;
    private CreditCard creditCard;
    private Date payDate;
    private float payAmount;
}
Question
Is MongoDB designed to prefer many, many small documents or fewer large documents? Does the answer mostly depend on what queries I plan on running? (e.g. "How many credit cards does customer X have?" vs. "What was the average amount all customers paid last month?")
I've looked around a lot, but I haven't found any MongoDB schema best practices that would help me answer my question.
Answer
You'll definitely need to optimize for the queries you're doing.
Here's my best guess based on your description.
You'll probably want to know all the credit cards for each Customer, so keep an array of them within the Customer object. You'll also probably want a Customer reference on each Payment. This will keep the Payment document relatively small.
The Payment object will automatically have its own ID and index. You'll probably want to add an index on the Customer reference as well.
This will allow you to quickly search for Payments by Customer without storing the whole customer object every time.
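To make the shape of that layout concrete, here is a minimal in-memory sketch in plain Java. It is not MongoDB driver code: the class name PaymentsByCustomerSketch, the id fields, the customerId field, and the HashMap standing in for a secondary index are all assumptions made for illustration, and double is used in place of the question's float.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PaymentsByCustomerSketch {

    // Small Customer document: the credit cards are embedded as an array.
    static final class Customer {
        final String id;
        final String name;
        final List<String> creditCardNumbers;

        Customer(String id, String name, List<String> creditCardNumbers) {
            this.id = id;
            this.name = name;
            this.creditCardNumbers = creditCardNumbers;
        }
    }

    // Small Payment document: it stores only a reference (customerId),
    // not a copy of the whole Customer.
    static final class Payment {
        final String id;
        final String customerId;
        final String creditCardNumber;
        final double payAmount;

        Payment(String id, String customerId, String creditCardNumber, double payAmount) {
            this.id = id;
            this.customerId = customerId;
            this.creditCardNumber = creditCardNumber;
            this.payAmount = payAmount;
        }
    }

    // Hypothetical in-memory stand-in for an index on Payment.customerId.
    static Map<String, List<Payment>> indexByCustomer(List<Payment> payments) {
        Map<String, List<Payment>> index = new HashMap<>();
        for (Payment p : payments) {
            index.computeIfAbsent(p.customerId, k -> new ArrayList<>()).add(p);
        }
        return index;
    }

    // Look up one customer's payments through the index instead of scanning every payment.
    static List<Payment> paymentsFor(Map<String, List<Payment>> index, String customerId) {
        return index.getOrDefault(customerId, Collections.<Payment>emptyList());
    }

    public static void main(String[] args) {
        Customer alice = new Customer("c1", "Alice", Arrays.asList("4111-0001", "4111-0002"));
        List<Payment> payments = Arrays.asList(
                new Payment("p1", "c1", "4111-0001", 20.0),
                new Payment("p2", "c2", "5500-0001", 35.0),
                new Payment("p3", "c1", "4111-0002", 15.0));

        Map<String, List<Payment>> index = indexByCustomer(payments);
        System.out.println(alice.name + " has " + alice.creditCardNumbers.size()
                + " cards and " + paymentsFor(index, alice.id).size() + " payments");
    }
}
```

The point of the sketch is only the data layout: "how many cards does customer X have?" is answered from the Customer document alone, and "which payments belong to customer X?" is a keyed lookup rather than a scan.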
If you want to answer questions like "What was the average amount all customers paid last month?", you'll instead want a map/reduce job for any sizeable dataset. You won't get that answer in real time. You'll find that storing a reference to the Customer is probably good enough for these map/reduce jobs.
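As a rough illustration of why that kind of question touches the whole collection, here is a small Java streams sketch of the same filter, map, and reduce shape. It runs in memory and is not MongoDB's map/reduce API; the class name MonthlyAverageSketch and the customerId field are assumptions, and LocalDate and double stand in for the question's Date and float.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

public class MonthlyAverageSketch {

    // Payment carries only a reference to its customer, which is all this job needs.
    static final class Payment {
        final String customerId;
        final LocalDate payDate;
        final double payAmount;

        Payment(String customerId, LocalDate payDate, double payAmount) {
            this.customerId = customerId;
            this.payDate = payDate;
            this.payAmount = payAmount;
        }
    }

    // Full pass over the payments: keep one month, map each payment to its amount,
    // then reduce the amounts to an average. Empty if no payment falls in that month.
    static OptionalDouble averagePaid(List<Payment> payments, YearMonth month) {
        return payments.stream()
                .filter(p -> YearMonth.from(p.payDate).equals(month))
                .mapToDouble(p -> p.payAmount)
                .average();
    }

    public static void main(String[] args) {
        List<Payment> payments = Arrays.asList(
                new Payment("c1", LocalDate.of(2024, 5, 3), 20.0),
                new Payment("c2", LocalDate.of(2024, 5, 17), 40.0),
                new Payment("c1", LocalDate.of(2024, 4, 28), 99.0));

        System.out.println(averagePaid(payments, YearMonth.of(2024, 5)));
    }
}
```

Every payment is visited once, so the cost grows with the size of the collection, which is why this kind of query is better run as a batch job than expected in real time.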
So to answer your question directly: Is MongoDB designed to prefer many, many small documents or fewer large documents?
MongoDB is designed to find indexed entries very quickly. MongoDB is very good at finding a few needles in a large haystack. MongoDB is not very good at finding most of the needles in the haystack. So build your data around your most common use cases and write map/reduce jobs for the rarer use cases.