MongoDB分页查询优化

最近遇到mongo集群性能问题,主要体现在查询性能或者聚合性能慢(查询类似关系型数据库中select * from xx where a=’xx’,另外聚合类似group by+count、sum),nosql与关系型数据库存在很多类似，比如分页查询语句是比较常见问题,分页优化在数据库优化原理类似。常见分页场景需求(本次主要基于这2种场景进行优化介绍)：

1、取top N这种小结果集，想办法利用索引有序特性尽快返回结果集；

db.collection.find({query}).sort({name:1}).limit(50)

2、分页翻页，尤其是结果集特别多越往后翻页越慢。

db.collection.find({query}).sort({name:1}).skip(N).limit(50)

这里N越大，性能会越低。

【上篇】

分页top N案例以及优化思路

1、具体SQL逻辑：根据网点查询当天的签收明细并返回第一页2000条，所有sql都是查询当天签收，当天从00:00:00-23:59:59，查询时间越接近23:59:59，满足结果集的数据越多，直到数据没有变化.后面还有翻页的功能，暂时先不讨论.其中sort是根据单号来，所有单号都唯一的.signStatus只有0,1。

db.test.find({org:"10000",signT:{$gte:new Date(1590940800000), $lte: new Date(1591027199999) }, signStatus: { $in: [ 0, 1 ] } }).sort({no:1}).limit(2000);

2、慢日志中分析不同不同索引对应效率

通过mtools分析慢日志，平均执行时间300ms。

分析一个慢日志情况：

排序顺序与索引顺序一致则无需排序,执行时间是1084ms:索引【org_1_no_1_signT_1】

排序顺序与索引顺序不一致则需排序,执行时间是156ms: 索引【org_1_signT_1】

【org_1_no_1_signT_1】索引执行效率：

备注:返回6000，因为存在3个分片，需要mongos进一步过滤

     "executionStats": {
  "nReturned": 6000,
  "executionTimeMillis": 1084,
  "totalKeysExamined": 168130,
 "totalDocsExamined": 6000

【org_1_signT_1】索引执行效率

     "executionStats": {
  "nReturned": 6000,
 "executionTimeMillis": 156,
 "totalKeysExamined": 43744,
"totalDocsExamined": 43744

总结

1、排序与回表效率问题–针对当前小结果集下，ER索引效率要明显高于ESR索引效率。

第一个索引满足ESR理论,通过索引没有返回多余的行数，每个节点2000行,但是从16万索引key中过滤满足条件6000，解决排序问题，无排序回表少，索引是检索效率低且执行时间长；

第二个索引不满足ESR理论，只能满足ER理论，索引key与回表结果集一致，回表过滤到37744条。有排序回表多，索引效率高，执行时间短。

2、如果结果集呈现N倍数据级增长，比如百万级别，那么ER索引效率肯定低于ESR索引效率,虽然说ESR理论下最佳，但本次SQL写法ESR效率不高。

3、了解业务需求以及设计原因

db.test.find({org:"10000",signT:{$gte:new Date(1590940800000), $lte: new Date(1591027199999) }, signStatus: { $in: [ 0, 1 ] } }).sort({no:1}).limit(2000);

signT时间基本上都是一个时间点，存在少量不一样时间，所以说排序字段不能signT.所以采用no单号，后续沟通集合中存在一个staDate字段，同一天日期完全一致，后续将代码中signT使用staDate来替代，并修改索引为ES索引，完美解决排序与回表问题。

创建索引:MongoDB 4.2版本开始，background:1可以不用加，类似oracle或者mysql online ddl,只是在创建索引与结束加锁。4.2版本之前，后台创建索引比较慢,前台创建是db级别排他锁，导致整个db无法访问，谨慎操作。

db.test.createIndex({org:1,staDate:1,no:1},{background:1})

4、最终修业务SQL如下

db.test.find({org:"10000",staDate: new Date(1591027199999) }, signStatus: { $in: [ 0, 1 ] } }).sort({no:1}).limit(2000);

【org_1_no:1_staDate_1】索引执行效率

"executionStats" : {"executionSuccess" : true,"nReturned" : 6000,"executionTimeMillis" : 10,"totalKeysExamined" : 6000,"totalDocsExamined" : 6000

分页top N案例优化总结

1、性能提升。通过修改业务SQL逻辑，top 2000执行基本几十毫秒，相比之前最低都要100ms，最大要几秒，性能提升几倍到几十倍，如果数据量提升几个数理级别，提升至少100倍。

2、不管ESR最佳实践还是ES或者SR等相关索引规则，都是结合实际SQL以及结果集大小来具体问题具体分析，本案例中没有修改业务代码前ER比ESR效果好，即使ER有排序，这些都是建立结果集小的情况下，如果结果集很大，不管ESR还是ER都存在缺点，集合或者索引变成热点问题。

【下篇】

上篇我们分析分页TOP N如何进行创建索引以及不同索引对性能影响，随着数据量N级增长，不修改SQL业务逻辑，会存在不同集合或索引热点问题，经过修改业务逻辑，不管数据量如何增长，TOP N查询性能基本上保持在几十毫秒水平，使得在高并发下满足业务SLA要求。

这篇我们接着讲翻页性能优化。skip针对大结果下，通过改写可以获取相对稳定执行时间与效率，否则使用skip性能随着翻页越大，呈现性能瓶颈。

分页翻页案例以及执行效率

1、分页翻页，尤其是结果集特别多越往后翻页越慢，常规写法

db.collection.find({query}).sort({name:1}).skip(N).limit(50)

2、ESR索引下sql分页以及执行效率【ESR方式】

db.test.find({org:"10000", signT:{$gte:new Date(1590940800000), $lte: new Date(1591027199999)},signStatus:{$in:[0,1]} }).sort({no:1}).skip(50).limit(50).explain("executionStats")

ESR索引翻页执行效率

翻第二页(每页50条)

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 29,
"totalKeysExamined" : 876,
"totalDocsExamined" : 100,

翻第10页(每页50条)

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 1001,
"totalKeysExamined" : 10809,
"totalDocsExamined" : 500,

翻第100页(每页50条)

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 12830,
"totalKeysExamined" : 108725,
"totalDocsExamined" : 5000

索引:org:1,no:1,signT:1,翻页从第一页到100页，执行时间从29ms到12830ms。其实100页数据才5000条,但是totalKeysExamined检查是108725,此时返回5000条，相当于indexkey:doc=20:1,显然是低效索引的。正常是indexkey:returndoc=1:1是完美的索引。索引不能说ESR索引效率高，最好是ES效率(需要修改业务逻辑)。

3、ER索引后sql执行效率【ER】

ER索引翻页执行效率

翻第二页：

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 1037,
"totalKeysExamined" : 26973,
"totalDocsExamined" : 26973,

翻第10页：

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 85,
"totalKeysExamined" : 26973,
"totalDocsExamined" : 26973,

翻第100页：

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 106,
"totalKeysExamined" : 26973,
"totalDocsExamined" : 26973,

翻第500页：

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 157,
"totalKeysExamined" : 26973,
"totalDocsExamined" : 26973,
}).sort({no:1}).skip(50).limit(50).explain("executionStats")

索引:org:1,signT,翻页从第一页到100页，执行时间从1037ms到157ms.第一次慢的主要返回索引记录，因为索引顺序与排序顺序不一致，导致不管翻多少页，每次检查记录数与索引数是一致，但是返回只有50条，此时返回记录始终都是26973,排序后返回50条，,显然是索引效率与回表返回记录来说，效率也不高的。正常是indexkey:returndoc=1:1是完美的索引.如果用户想要快速获取第一页的记录或者前面几页，ESR效率要高于ER，最好是ES效率(需要修改业务逻辑)。

4、修改业务代码后高效索引下sql分页以及执行效率【ES方式】

db.test.find({org:"10000", staDate: ISODate("2020-07-17T00:00:00.000+08:00"),signStatus:{$in:[ 0, 1 ] } }).sort({no:1}).skip(50).limit(50).explain("executionStats")

ES索引翻页效率最高

翻第二页(每页50条)

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 20,
"totalKeysExamined" : 100,
"totalDocsExamined" : 100

翻第10页(每页50条)

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 82,
"totalKeysExamined" : 500,
"totalDocsExamined" : 500,

翻第100页(每页50条)

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 716,
"totalKeysExamined" : 5000,
"totalDocsExamined" : 5000

翻500页(每页50条)

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 1104,
"totalKeysExamined" : 25000,
"totalDocsExamined" : 25000

高效索引:org:1,no:1,staDate:1,翻页从第一页到500页，执行时间从20ms到1104ms.其实500页数据才25000条,如果单页显示100条，翻500页，执行时间更长,此时还是在高效索引以及代码修改后效果。此时ES索引。

4、以上除了ES索引下从第一页到500页，ESR执行时间最大是1s，ESR翻100页，执行时间已经接近13s，ER或索引，显然翻页N越大，呈现性能越差。小翻页下性能尚且能接受，大翻页下性能肯定是需要优化，如果代码不能ES索引，那么ESR或ER效率都不高的情况，需要分析结果集大小，如果结果集小，ESR效率相对好些，如果结果集非常大，ESR或者ER索引都存在瓶颈。

ESR:瓶颈索引是热点，索引需要过滤R条件部分,大部分记录不满足条件，性能在索引过滤上。

ER:瓶颈返回索引记录与回表记录是1:1,但需要结果集大部分丢弃.最终索引与集合都在瓶颈，尤其索引与集合太大，需要从磁盘加载。

有没有什么写法能够实现翻页呈现稳定性能，有的，但是有一定条件限制。

分页翻页案例改写以及性能

1、分页翻页，尤其是结果集特别多越往后翻页越慢，常规写法

db.collection.find({query}).sort({id:1}).skip(N).limit(50)

如果id是唯一或者想办法使用唯一列来排序，此时可以将翻页语句修改如下：

db.test.find({org:"10000",staDate:ISODate("2020-07-17T00:00:00.000+08:00"),
signStatus:{$in:[0,1]},no{$gt:N}，}).sort({no:1}).limit(50);

注意：no:{$gt:N}

这里N是上页最后一个值,取下一页之前需要获取上一页最大值且无需skip关键字。

2、代码逻辑(shell演示)

var firstpage=db.test.find({org:"10000", staDate: ISODate("2020-07-17T00:00:00.000+08:00"), signStatus:{$in:[ 0, 1 ] } }).sort({no:1}).limit(50)

var latest=null;
//获取第一页最后一条记录
while (firstpage.hasNext()){
 latest= firstpage.next();
print(tojson(latest))};

   第50条记录: no:XX2653808
//获取下一页
  db.test.find({org:"10000", staDate: ISODate("2020-07-17T00:00:00.000+08:00"), signStatus:{$in:[ 0, 1 ] } ,no:{$gt:latest.no}}).sort({no:1}).limit(50)

注意：no:{$gt:latest.no}

3、验证执行效率

翻第2页:

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 13,
"totalKeysExamined" : 50,
"totalDocsExamined" : 50

翻第10页:

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 14,
"totalKeysExamined" : 50,
"totalDocsExamined" : 50,

翻第100页：

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 16,
"totalKeysExamined" : 50,
"totalDocsExamined" : 50,

对应执行计划:

"executionStats" : {
"executionSuccess" : true,
"nReturned" : 50,
"executionTimeMillis" : 16,
"totalKeysExamined" : 50,
"totalDocsExamined" : 50,
"executionStages" : {
"stage" : "LIMIT",
"nReturned" : 50,"
executionTimeMillisEstimate" : 0,
"works" : 51,
"advanced" : 50,
"needTime" : 0,
"needYield" : 0,
"saveState" : 2,
"restoreState" : 2,
"isEOF" : 1,
"limitAmount" : 50,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"signStatus" : {
"$in" : [
0,
1
]
}
},
"nReturned" : 50,
"executionTimeMillisEstimate" : 0,
"works" : 50,
"advanced" : 50,
"needTime" : 0,
"needYield" : 0,
"saveState" : 2,
"restoreState" : 2,
"isEOF" : 0,
"docsExamined" : 50,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 50,
"executionTimeMillisEstimate" : 0,
"works" : 50,
"advanced" : 50,
"needTime" : 0,
"needYield" : 0,
"saveState" : 2,
"restoreState" : 2,
"isEOF" : 0,
"keyPattern" : {
"org" : 1,
"staDate" : 1,
"no" : 1},
"indexName" : "org_1_staDate_1_no_1"

4、取消skip方式,对排序列增加一个大于上一页最大值来快速获取分页，性能基本上在10-20ms之间。

分页与翻页总结

1、分页以及翻页需要配合最佳索引才能获取最佳SLA性能，否则分页与翻页随着结果集增长，性能会呈现瓶颈。

2、可以对skip进行改写来获取稳定SLA性能，针对排序列使用大于上一页的最大值来实现，此时排序列需要唯一特性才可以，否则会出现重复数据情况。

3、不管ESR、ER还是ES，都需要具体问题具体分析，例如索引key扫描、回表扫描记录数、返回记录数都分析他们之间比例，1:1:1是最好性能，随着数据增长，瓶颈在索引、还是在集合中。

4、并不是所有分页、翻页SQL都可以优化到最佳性能，主要取决于SQL写法以及对索引原理了解（能否创建出高效索引），最大问题在于SQL写法。

作者：徐靖

物流快递行业数据库运维与技术研究,主要是Oracle,MongoDB,Mysql技术。

xiaomangguo

相关推荐

评论 1

评论前必须登录！

强力推荐

中文社区公众号

MongoDB 同步/迁移工具

友情链接

回顶部

关注微信：mongoing-mongoing
复制微信号

提交工单咨询