Problem description
We need to create a compound index in the same order as the parameters are being queried. Does this order matter performance-wise at all?
Imagine we have a collection of all humans on earth with an index on sex (99.9% of the time "male" or "female", but a string nonetheless (not binary)) and an index on name.
If we want to be able to select all people of a certain sex with a certain name, e.g. all "male"s named "John", is it better to have a compound index with sex first or name first? Why (not)?
Redsandro,
You must consider Index Cardinality and Selectivity.
1. Index Cardinality
The index cardinality refers to how many possible values there are for a field. The field sex has only two possible values, so it has a very low cardinality. Other fields such as names, usernames, phone numbers, emails, etc. will have a nearly unique value for every document in the collection, which is considered high cardinality.
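To make the definition concrete, here is a minimal sketch in plain JavaScript that counts the distinct values of each field. The sample array and the cardinality helper are made up for illustration; they are not a MongoDB API.

```javascript
// Hypothetical sample documents; the field names follow the question.
const people = [
  { name: "John", sex: "male" },
  { name: "Rich", sex: "male" },
  { name: "Mose", sex: "male" },
  { name: "Sami", sex: "male" },
  { name: "Cari", sex: "female" },
  { name: "Mary", sex: "female" },
];

// Cardinality of a field = number of distinct values it takes.
// (Illustrative helper, not part of any MongoDB driver.)
function cardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

const sexCardinality = cardinality(people, "sex");
const nameCardinality = cardinality(people, "name");

console.log({ sexCardinality, nameCardinality });
// prints { sexCardinality: 2, nameCardinality: 6 }
```

On a real collection, something like db.people.distinct("sex").length in the shell should give the same kind of number, assuming a collection named people.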
Greater Cardinality
The greater the cardinality of a field, the more helpful an index will be, because indexes narrow the search space to a much smaller set.
If you have an index on sex and you are looking for men named John, you would only narrow down the result space by approximately 50% if you indexed by sex first. Conversely, if you indexed by name, you would immediately narrow down the result set to the minute fraction of users named John, then you would refer to those documents to check the gender.
Rule of Thumb
Try to create indexes on high-cardinality keys, or put high-cardinality keys first in the compound index. You can read more about it in the section on compound indexes in the book:
2. Selectivity
Also, you want to use indexes selectively and write queries that limit the number of possible documents with the indexed field. To keep it simple, consider the following collection. If your index is {name:1} and you run the query { name: "John", sex: "male"}, you will have to scan 1 document, because you allowed MongoDB to be selective.
{_id:ObjectId(),name:"John",sex:"male"}
{_id:ObjectId(),name:"Rich",sex:"male"}
{_id:ObjectId(),name:"Mose",sex:"male"}
{_id:ObjectId(),name:"Sami",sex:"male"}
{_id:ObjectId(),name:"Cari",sex:"female"}
{_id:ObjectId(),name:"Mary",sex:"female"}
Consider the following collection. If your index is {sex:1} and you run the query {sex: "male", name: "John"}, you will have to scan 4 documents.
{_id:ObjectId(),name:"John",sex:"male"}
{_id:ObjectId(),name:"Rich",sex:"male"}
{_id:ObjectId(),name:"Mose",sex:"male"}
{_id:ObjectId(),name:"Sami",sex:"male"}
{_id:ObjectId(),name:"Cari",sex:"female"}
{_id:ObjectId(),name:"Mary",sex:"female"}
Imagine the possible differences on a larger data set.
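The two scans above can be imitated with a toy single-field index in plain JavaScript. This is only a sketch under simplifying assumptions: buildSingleFieldIndex and findUsingIndex are invented helpers, and a Map stands in for MongoDB's real B-tree index.

```javascript
// Same six documents as in the listings above (without _id).
const collection = [
  { name: "John", sex: "male" },
  { name: "Rich", sex: "male" },
  { name: "Mose", sex: "male" },
  { name: "Sami", sex: "male" },
  { name: "Cari", sex: "female" },
  { name: "Mary", sex: "female" },
];

// Toy single-field index: field value -> positions of the documents holding it.
function buildSingleFieldIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    const positions = index.get(doc[field]) || [];
    positions.push(position);
    index.set(doc[field], positions);
  });
  return index;
}

// Look up candidates through the index on one field, then fetch each
// candidate document and check every field of the query against it.
function findUsingIndex(docs, index, indexedField, query) {
  const candidates = index.get(query[indexedField]) || [];
  let docsExamined = 0;
  const matches = [];
  for (const position of candidates) {
    docsExamined += 1;
    const doc = docs[position];
    const isMatch = Object.entries(query).every(
      ([field, value]) => doc[field] === value
    );
    if (isMatch) matches.push(doc);
  }
  return { matches, docsExamined };
}

const query = { name: "John", sex: "male" };
const viaName = findUsingIndex(
  collection, buildSingleFieldIndex(collection, "name"), "name", query);
const viaSex = findUsingIndex(
  collection, buildSingleFieldIndex(collection, "sex"), "sex", query);

console.log(viaName.docsExamined, viaSex.docsExamined); // prints 1 4
```

Both lookups return the same single document; only the number of documents fetched differs. On a real deployment, the corresponding figure is reported as totalDocsExamined when a query is run with explain("executionStats").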
A little explanation of Compound Indexes
It's easy to make the wrong assumption about Compound Indexes. According to the MongoDB docs on Compound Indexes, when you create a compound index, 1 index will hold multiple fields. So if we index a collection by {"sex" : 1, "name" : 1}, the index will look roughly like:
["male","Rick"] -> 0x0c965148
["male","John"] -> 0x0c965149
["male","Sean"] -> 0x0cdf7859
["male","Bro"] -> 0x0cdf7859
...
["female","Kate"] -> 0x0c965134
["female","Katy"] -> 0x0c965126
["female","Naji"] -> 0x0c965183
["female","Joan"] -> 0x0c965191
["female","Sara"] -> 0x0c965103
If we index a collection by {"name" : 1, "sex" : 1}, the index will look roughly like:
["John","male"] -> 0x0c965148
["John","female"] -> 0x0c965149
["John","male"] -> 0x0cdf7859
["Rick","male"] -> 0x0cdf7859
...
["Kate","female"] -> 0x0c965134
["Katy","female"] -> 0x0c965126
["Naji","female"] -> 0x0c965183
["Joan","female"] -> 0x0c965191
["Sara","female"] -> 0x0c965103
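The two layouts above can be reproduced with a small sketch in plain JavaScript. It assumes a compound index behaves like a single list of key tuples sorted by the first field and then the second; buildCompoundIndex, compareKeys and slotsMatching are hypothetical helpers, and array positions stand in for the record pointers. Unlike the rough listings, the sketch keeps the entries fully sorted.

```javascript
// Hypothetical sample documents, including a "John" of each sex.
const sample = [
  { name: "Rick", sex: "male" },
  { name: "John", sex: "male" },
  { name: "Sean", sex: "male" },
  { name: "John", sex: "female" },
  { name: "Kate", sex: "female" },
  { name: "Sara", sex: "female" },
];

// Compare two key tuples component by component.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i += 1) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Toy compound index: one sorted list of { key, position } entries,
// where key holds the indexed field values in index order.
function buildCompoundIndex(docs, fields) {
  return docs
    .map((doc, position) => ({ key: fields.map((f) => doc[f]), position }))
    .sort((a, b) => compareKeys(a.key, b.key));
}

// Slots (offsets in the sorted index) whose key component equals value.
function slotsMatching(index, component, value) {
  const slots = [];
  index.forEach((entry, slot) => {
    if (entry.key[component] === value) slots.push(slot);
  });
  return slots;
}

const bySexThenName = buildCompoundIndex(sample, ["sex", "name"]);
const byNameThenSex = buildCompoundIndex(sample, ["name", "sex"]);

console.log(bySexThenName.map((entry) => entry.key.join(",")));
console.log(byNameThenSex.map((entry) => entry.key.join(",")));

// Where do the "John" entries sit in each index?
const johnSlotsNameFirst = slotsMatching(byNameThenSex, 0, "John");
const johnSlotsSexFirst = slotsMatching(bySexThenName, 1, "John");
console.log(johnSlotsNameFirst, johnSlotsSexFirst); // prints [ 0, 1 ] [ 0, 3 ]
```

With name as the leading field, every "John" entry is adjacent, so a lookup by name maps to one contiguous range. With sex leading, the "John" entries are split across the "female" and "male" groups.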
Having {name:1} as the prefix will serve you much better when using compound indexes. There is much more that can be read on the topic; I hope this can offer some clarity.