This article walks through improving the performance of a MongoDB aggregation query; it should be a useful reference for anyone hitting the same problem. Read on to learn more!

Problem description

I recently started shifting data from Microsoft SQL Server to MongoDB to gain scalability. The migration itself went fine.

The document has two important fields: customer and timestamphash (year-month-day).

We imported only 75 million documents into MongoDB running on an Azure Linux VM. After adding a compound index on both fields, we have the following problem:

On 3 million documents (after filtering), an aggregate group-by counting per customerId takes 24 seconds to finish. SQL Server returns the result in under 1 second on the same data.

Do you think Cassandra would be a better solution? We need query performance on a large volume of data.

I tried allowing disk writes and giving the VM more RAM. Nothing worked.

The query:

aggregate([
{ "$match" : { "Customer" : 2 } },
{ "$match" : { "TimestampHash" : { "$gte" : 20160710 } } },
{ "$match" : { "TimestampHash" : { "$lte" : 20190909 } } },
{ "$group" : { "_id" : { "Device" : "$Device" }, "__agg0" : { "$sum" : 1 } } },
{ "$project" : { "Device" : "$_id.Device", "Count" : "$__agg0", "_id" : 0 } },
{ "$skip" : 0 },
{ "$limit" : 10 }])
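One thing worth trying (a sketch, not verified against the original data): the three $match stages can be collapsed into a single stage, which makes it easier for the planner to serve the whole filter from the compound index on Customer and TimestampHash. The collection name archiveData below is a placeholder:

```javascript
// Hypothetical merged pipeline; "archiveData" is a placeholder collection name.
const pipeline = [
  // One $match combining the customer filter and the date range, so a
  // single compound index on { Customer: 1, TimestampHash: 1 } can cover it.
  { "$match": { "Customer": 2, "TimestampHash": { "$gte": 20160710, "$lte": 20190909 } } },
  { "$group": { "_id": { "Device": "$Device" }, "__agg0": { "$sum": 1 } } },
  { "$project": { "Device": "$_id.Device", "Count": "$__agg0", "_id": 0 } },
  { "$skip": 0 },
  { "$limit": 10 },
];
// db.archiveData.aggregate(pipeline)  // run in mongosh against a live deployment
```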

Update: I used allowDiskUse: true and the problem was solved. The query dropped to 4 seconds on the 3M filtered documents.
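For reference, allowDiskUse is passed as an option in the second argument of aggregate. A minimal sketch (the collection name is a placeholder):

```javascript
// allowDiskUse lets blocking stages such as $group spill to disk
// instead of failing once they exceed the in-memory stage limit.
const options = { allowDiskUse: true };
// db.archiveData.aggregate(pipeline, options)  // run in mongosh against a live deployment
```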

Recommended answer

I ran into a similar problem before this question, and to be honest, I suspect Cassandra would be better in your particular case. But the question is about optimizing the Mongo aggregation query, right?

As of now, one of my collections holds more than 3M documents, and aggregation queries shouldn't take 24 seconds if you build the indexes correctly.

  1. First of all, check index usage via MongoDB Compass. Is Mongo actually using the index? If your app spams queries at the DB and the index shows 0 usage, then, as you already guessed, something is wrong with your index.
  2. Second, use the explain method (the documentation will help you here) to get more information about your query.
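A minimal sketch of checking the plan in mongosh (the collection name is assumed; this needs a live deployment to run):

```javascript
// Ask the server how it would execute the pipeline; "executionStats"
// also reports how many index keys and documents were examined.
db.archiveData
  .explain("executionStats")
  .aggregate([{ "$match": { "Customer": 2 } }]);
// In the output, look for an IXSCAN stage (index used) rather than
// COLLSCAN (full collection scan), and compare totalKeysExamined
// against nReturned.
```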

And third: the order of the fields in the index matters. For example, if your $match stage filters on 3 fields and you request documents by fields:

{ $match: {a_field:a, b_field:b, c_field:c} }

then you should build the compound index on fields a, b, c in exactly that order.
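As a sketch, with the hypothetical field names from the example above:

```javascript
// Field order in the spec is the index order: an index on { a, b, c }
// serves filters on a, on a+b, and on a+b+c, but not on b or c alone.
const indexSpec = { a_field: 1, b_field: 1, c_field: 1 };
// db.collection.createIndex(indexSpec)  // run in mongosh against a live deployment
```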

There is always some kind of DB architecture problem. I highly recommend that you not stockpile all the data inside one collection. Use {timestamps: true} on insertion (it creates two fields, createdAt: and updatedAt:)

        {
            timestamps: true
        }

in your schema, store old/outdated data in a different collection, and use the $lookup aggregation stage when you really need to operate on it.
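A sketch of what that $lookup might look like, with made-up collection and field names:

```javascript
// Join current documents with the archived ones on customer id.
// "recentData", "archiveData", and the field names are placeholders.
const lookupStage = {
  "$lookup": {
    from: "archiveData",   // the collection holding the outdated data
    localField: "Customer",
    foreignField: "Customer",
    as: "archived",        // matched archive docs land in this array field
  },
};
// db.recentData.aggregate([lookupStage])  // run in mongosh against a live deployment
```

Splitting hot and cold data this way keeps the working set (and its indexes) small, at the cost of an explicit join when old data is needed.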

Hope you'll find something useful in my answer.

That concludes this article on improving MongoDB aggregation query performance. We hope the recommended answer helps, and thank you for your support!
