

We use MongoDB 3.4.14 on a machine with 8 cores and 32 GB RAM. I am running a load test with JMeter; with 70 threads the results are acceptable. But as the load increases, response time (our SLA metric) grows exponentially and throughput drops drastically. I have tried increasing the ulimit, and sharding is the next step. Apart from that, is there any other performance optimization I can do?


@Jeet, here are my findings for each of your points:

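  1. Are there a lot of aggregation queries? What kind of collection structure do you have, i.e.


The load test runs a single aggregation query, and all documents have the same set of fields. Would fixing the document size help? If so, how can I do that?

  2. Are there a lot of nested arrays?


  3. Is it a single instance or a replica set? Try a replica set with reads and writes going to different nodes.


Currently we want to run on a single node only.

  4. Are the queries returning data from multiple collections?


  5. For what percentage of operations is your instance page-faulting?


With a load of 500 users I don't see many page faults, only two-digit numbers.

  6. Check your logs for operations with high nscanned or scanAndOrder during periods of high lock/queue, and index accordingly.


  7. Check your queries for CPU-intensive operators like $all, $push/$pop/$addToSet, as well as updates to large documents, and especially updates to documents with large arrays (or large subdocument arrays).


Yes, with the above load the CPU is fully utilized and responses are delayed. We are doing a group-by and then sorting with a limit.

  8. If your database is write-heavy, keep in mind that only one CPU per database can write at a time (owing to that thread holding the write lock). Consider moving part of that data into its own database.


Our database is mostly read-heavy; the collection is populated once a day.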


Apart from this, I tried a simple test by putting the code below in a for loop:

Document findQuery = new Document("userId", "Sham");
FindIterable<Document> find = collection.find(findQuery);
MongoCursor<Document> iterator = find.iterator();


I used an executor to start the process:

ExecutorService executorService = Executors.newFixedThreadPool(100);


Even with this, performance is slow; each request takes about 900 ms to return.


1 request = 150ms per request


100 request = 900ms per request
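
To separate per-request latency from overall throughput in a test like the one above, a small harness along the following lines may help. This is only a sketch under stated assumptions: `LatencyProbe`, `run`, and `fakeQuery` are hypothetical names, and the 20 ms sleep stands in for the real `collection.find(...)` call, which is not reproduced here. If the figures above are steady-state averages, 1 request at 150 ms is roughly 6.7 requests/s, while 100 concurrent requests at 900 ms each is roughly 111 requests/s, so latency and throughput are worth reporting side by side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LatencyProbe {

    /** Average per-request latency (ms) and overall throughput (requests/s). */
    public static final class Result {
        public final double avgLatencyMs;
        public final double throughputPerSec;

        Result(double avgLatencyMs, double throughputPerSec) {
            this.avgLatencyMs = avgLatencyMs;
            this.throughputPerSec = throughputPerSec;
        }
    }

    /** Runs `requests` copies of `request` on a fixed pool of `threads` workers. */
    public static Result run(int threads, int requests, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            long wallStart = System.nanoTime();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(() -> {
                    // Time only the request itself, not the wait in the executor queue.
                    long start = System.nanoTime();
                    request.run();
                    return System.nanoTime() - start;
                }));
            }
            long totalNanos = 0;
            for (Future<Long> f : futures) {
                totalNanos += f.get();
            }
            long wallNanos = System.nanoTime() - wallStart;
            double avgMs = totalNanos / 1e6 / requests;
            double throughput = requests / (wallNanos / 1e9);
            return new Result(avgMs, throughput);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("load run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real query; replace with the actual find call.
        Runnable fakeQuery = () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int threads : new int[] {1, 10, 100}) {
            Result r = run(threads, 200, fakeQuery);
            System.out.printf("threads=%d avg=%.1f ms throughput=%.1f req/s%n",
                    threads, r.avgLatencyMs, r.throughputPerSec);
        }
    }
}
```

With the real query plugged in, a latency that climbs while throughput stays flat as threads are added would suggest the server (for example its 8 cores) is saturated rather than the client.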


When I look at the mongostat output for 500 users, it is as below:

insert query update delete getmore command dirty used flushes vsize   res qrw arw net_in net_out conn                time
    *0    *0     *0     *0       0     1|0  0.0% 0.0%       0  317M 28.0M 0|0 0|0   156b   45.1k    3 Oct 12 15:31:19.644
    *0    *0     *0     *0       0     1|0  0.0% 0.0%       0  317M 28.0M 0|0 0|0   156b   45.1k    3 Oct 12 15:31:20.650
    *0    *0     *0     *0       0     3|0  0.0% 0.0%       0  317M 28.0M 0|0 0|0   218b   46.1k    3 Oct 12 15:31:21.638
    *0    *0     *0     *0       0     2|0  0.0% 0.0%       0  317M 28.0M 0|0 0|0   158b   45.4k    3 Oct 12 15:31:22.638
    *0    *0     *0     *0       0     1|0  0.0% 0.0%       0  317M 28.0M 0|0 0|0   157b   45.4k    3 Oct 12 15:31:23.638
    *0   376     *0     *0       0   112|0  0.0% 0.0%       0  340M 30.0M 0|0 0|0  64.9k   23.6m   26 Oct 12 15:31:24.724
    *0    98     *0     *0       0   531|0  0.0% 0.0%       0  317M 27.0M 0|0 0|0   109k   6.38m    3 Oct 12 15:31:25.646
    *0    *0     *0     *0       0     2|0  0.0% 0.0%       0  317M 27.0M 0|0 0|0   215b   45.6k    3 Oct 12 15:31:26.646
    *0    *0     *0     *0       0     1|0  0.0% 0.0%       0  317M 27.0M 0|0 0|0   157b   45.1k    3 Oct 12 15:31:27.651
    *0    *0     *0     *0       0     2|0  0.0% 0.0%       0  317M 27.0M 0|0 0|0   159b   45.8k    3 Oct 12 15:31:28.642



This also depends on the kind of queries you are firing. Please check whether any of the points below apply:

  • Are there a lot of aggregation queries? What kind of collection structure do you have, i.e.
  • Are there a lot of nested arrays?
  • Is it a single instance or a replica set? Try a replica set with reads and writes going to different nodes.
  • Are the queries returning data from multiple collections?
  • For what percentage of operations is your instance page-faulting?
  • Check your logs for operations with high nscanned or scanAndOrder during periods of high lock/queue, and index accordingly.
  • Check your queries for CPU-intensive operators like $all, $push/$pop/$addToSet, as well as updates to large documents, and especially updates to documents with large arrays (or large subdocument arrays).
  • If your database is write-heavy, keep in mind that only one CPU per database can write at a time (owing to that thread holding the write lock). This applies to the older MMAPv1 storage engine; WiredTiger, the default since MongoDB 3.2, uses document-level concurrency control. Consider moving part of that data into its own database.


These are a few things that degrade performance over time. I have covered the most common cases here; however, please check this post for more insights.


08-20 14:47