
A Case of Uneven Update Distribution Across Shards in a MongoDB Cluster

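This article is a shortlisted entry from the "2021 MongoDB Technical Practice and Application Case Collection" (2021MongoDB技术实践与应用案例征集活动).

Author: Ren Kun (任坤)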

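1. Background

A production MongoDB cluster with 4 shards, running Percona Server for MongoDB 4.2. Real-time QPS monitoring showed a very high update rate on shard1, while the other 3 shards saw very few updates.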

[Figure: real-time QPS, shard1]

[Figure: real-time QPS, shard2]

Either the data of some sharded collection is unevenly distributed, or some collection was never sharded at all.

 

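2. Diagnosis

First, check the large collections. Log in to mongos, switch to the database, and run the following commands. Each collection produces one line of output: its namespace and its storage size in MB.

var collNames = db.getCollectionNames();
for (var i = 0; i < collNames.length; i++) {
    var coll = db.getCollection(collNames[i]);
    // scale factor 1024 * 1024: sizes are reported in MB
    var stats = coll.stats(1024 * 1024);
    print(stats.ns, stats.storageSize);
}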

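Pick out the 10 largest collections and run db.table.getShardDistribution() on each of them (replace table with the collection name). Every large collection turned out to be evenly distributed.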
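Picking the ten largest collections can be scripted instead of read off by eye. The following is a minimal sketch, not the author's actual commands: `largestCollections` is a hypothetical helper that sorts collStats-style documents by `storageSize` and keeps the top `n`. The commented lines show one way it might be fed from a mongos session.

```javascript
// Hypothetical helper (not from the original article): return the n largest
// collections by storageSize, given an array of collStats-like documents.
function largestCollections(statsList, n) {
  return statsList
    .slice() // copy, so the caller's array is not reordered
    .sort(function (a, b) { return b.storageSize - a.storageSize; })
    .slice(0, n);
}

// Possible usage in the mongo shell, connected to mongos:
//   var all = db.getCollectionNames().map(function (name) {
//     return db.getCollection(name).stats(1024 * 1024); // sizes in MB
//   });
//   largestCollections(all, 10).forEach(function (s) {
//     print(s.ns, s.storageSize);
//   });
```

Keeping the sorting logic in a plain function means it can be checked against sample data before it is pointed at a production cluster.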

Check mongod.log on shard1. No slow queries were found; the slow-query threshold is currently 100 ms.

Check the oplog on shard1:

use local
db.oplog.rs.find({ "op": "u" }).sort({ $natural: -1 }).limit(10)  // the 10 most recent entries

Update oplog entries:

{ "ts" : Timestamp(1634090487, 1781), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505237968" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1780), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505237974" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1779), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505237975" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1778), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505237976" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1777), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505238113" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1776), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505720693" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1775), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505726096" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1774), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505726320" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1773), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505750437" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1772), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505750438" }, "wall" : ISODate("2021-10-13T02:01:27.707Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }
{ "ts" : Timestamp(1634090487, 1771), "t" : NumberLong(4), "h" : NumberLong(0), "v" : 2, "op" : "u", "ns" : "prod.prod_XXX", "ui" : UUID("22154461-6305-491d-8f1d-9a1630753508"), "o2" : { "_id" : "2021-10-10 18:00:00#505750439" }, "wall" : ISODate("2021-10-13T02:01:27.706Z"), "o" : { "$v" : 1, "$set" : { "__system" : { "pull_time" : "2021-10-13 10:01:27" } } } }

All of these are updates against the prod_XXX collection, and that collection is not sharded.
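Ten entries happen to be enough here, but with a more mixed workload it may help to tally a larger oplog sample by namespace. The sketch below assumes that approach; `countByNamespace` is a hypothetical helper and the sample size of 1000 in the usage comment is an arbitrary choice, neither taken from the original diagnosis.

```javascript
// Hypothetical helper (not from the original article): count oplog entries
// per namespace, using each entry's "ns" field.
function countByNamespace(entries) {
  const counts = {};
  entries.forEach(function (entry) {
    counts[entry.ns] = (counts[entry.ns] || 0) + 1;
  });
  return counts;
}

// Possible usage in the mongo shell, connected to the shard's primary:
//   var recent = db.getSiblingDB("local").oplog.rs
//     .find({ op: "u" })
//     .sort({ $natural: -1 })
//     .limit(1000) // sample size is an assumption; adjust as needed
//     .toArray();
//   printjson(countByNamespace(recent));
```

A namespace that dominates the tally is the first candidate to check for missing or skewed sharding.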

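After confirming with the developers, we created a hashed index on its _id field and enabled sharding on the collection.

Log in to mongos, switch to the database, and run:

// build the hashed index that will back the shard key
db.prod_XXX.createIndex({ _id: "hashed" }, { background: true })
sh.shardCollection("prod.prod_XXX", { _id: "hashed" })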

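The update load evened out across the shards at a visibly fast pace, and the problem was solved.

[Figure: real-time QPS after the change, shard1]

[Figure: real-time QPS after the change, shard2]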
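Besides watching the QPS graphs, the data spread of the newly sharded collection can be checked numerically. The sketch below is an assumption about how one might do that: `shardDocShare` is a hypothetical helper that reads the per-shard `count` values from collStats output obtained through mongos and turns them into percentages. Document share is only a proxy; the update rate itself is still best read from monitoring.

```javascript
// Hypothetical helper (not from the original article): given collStats output
// obtained through mongos, return each shard's share of documents in percent.
function shardDocShare(stats) {
  const shards = stats.shards || {}; // guard in case the field is absent
  const names = Object.keys(shards);
  const total = names.reduce(function (sum, name) {
    return sum + shards[name].count;
  }, 0);
  const share = {};
  names.forEach(function (name) {
    share[name] = total === 0 ? 0 : (100 * shards[name].count) / total;
  });
  return share;
}

// Possible usage in the mongo shell, connected to mongos:
//   printjson(shardDocShare(db.prod_XXX.stats()));
```

With four shards and a hashed key, each share should drift toward roughly 25 as the balancer moves chunks.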

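3. Summary

This case is simple and also quite common. When TPS is unbalanced across MongoDB shards, the steps above can be used to locate and fix the cause quickly.

People used to MySQL often find MongoDB awkward at first, especially because much of the command syntax is hard to remember, such as the commands in this article for listing collection sizes and reading the oplog. It is best to keep notes and look them up when they are needed.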
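The same checks can be turned into a periodic inspection that flags large collections that were never sharded. The sketch below is one possible shape for it, under stated assumptions: `findLargeUnsharded` is a hypothetical helper, it relies on the `sharded` flag that collStats reports through mongos, and the 10 GB threshold in the usage comment is an arbitrary example value. Size is only a proxy, since a small collection can also be update-heavy.

```javascript
// Hypothetical helper (not from the original article): list the namespaces of
// collections that are not sharded and whose storageSize reaches thresholdMB.
// Expects collStats-like documents with sizes already scaled to MB.
function findLargeUnsharded(statsList, thresholdMB) {
  return statsList
    .filter(function (s) {
      return !s.sharded && s.storageSize >= thresholdMB;
    })
    .map(function (s) { return s.ns; });
}

// Possible usage in the mongo shell, connected to mongos:
//   var all = db.getCollectionNames().map(function (name) {
//     return db.getCollection(name).stats(1024 * 1024); // sizes in MB
//   });
//   findLargeUnsharded(all, 10 * 1024).forEach(function (ns) { // 10 GB, assumed
//     print("large unsharded collection:", ns);
//   });
```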

 

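About the author:

Ren Kun (任坤), based in Zhuhai, previously worked as a dedicated Oracle DBA and MySQL DBA and is now mainly responsible for maintaining MySQL, MongoDB, Redis, and ClickHouse.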

 

 

Source: MongoDB中文社区. Reproduction without permission is prohibited.

Comments (1)

  1. #1

    An inspection job could be added: on multi-shard clusters, report any collection above a given size in GB that has not been sharded.

    wuyanan, 8 months ago (02-17)