Merge Chunks in a Sharded Cluster

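Overview

The mergeChunks command lets you merge an empty chunk into a contiguous chunk on the same shard. A chunk is empty if it holds no documents within its shard key range.

Important

Empty chunks can prevent the balancer from correctly assessing how evenly data is distributed across shards; for example, the cluster can appear balanced when it is not.

Empty chunks can occur under several circumstances, including:

  • If a pre-split creates too many chunks, the distribution of data across chunks may be uneven.

  • If you delete many documents from a sharded collection, some chunks may no longer contain data.

This tutorial explains how to identify chunks that can be merged, and how to merge those chunks with neighboring chunks.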

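Procedure

Note

The examples in this procedure use the users collection in the test database, with the username field as the shard key.

Identify Chunk Ranges

In the mongo shell, identify the chunk ranges with the following operation: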

sh.status()

The output of sh.status() will resemble:

--- Sharding Status ---
sharding version: {
     "_id" : 1,
     "version" : 4,
     "minCompatibleVersion" : 4,
     "currentVersion" : 5,
     "clusterId" : ObjectId("5260032c901f6712dcd8f400")
}
shards:
     {  "_id" : "shard0000",  "host" : "localhost:30000" }
     {  "_id" : "shard0001",  "host" : "localhost:30001" }
  databases:
     {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
     {  "_id" : "test",  "partitioned" : true,  "primary" : "shard0001" }
             test.users
                     shard key: { "username" : 1 }
                     chunks:
                             shard0000       7
                             shard0001       7
                     { "username" : { "$minKey" : 1 } } -->> { "username" : "user16643" } on : shard0000 Timestamp(2, 0)
                     { "username" : "user16643" } -->> { "username" : "user2329" } on : shard0000 Timestamp(3, 0)
                     { "username" : "user2329" } -->> { "username" : "user29937" } on : shard0000 Timestamp(4, 0)
                     { "username" : "user29937" } -->> { "username" : "user36583" } on : shard0000 Timestamp(5, 0)
                     { "username" : "user36583" } -->> { "username" : "user43229" } on : shard0000 Timestamp(6, 0)
                     { "username" : "user43229" } -->> { "username" : "user49877" } on : shard0000 Timestamp(7, 0)
                     { "username" : "user49877" } -->> { "username" : "user56522" } on : shard0000 Timestamp(8, 0)
                     { "username" : "user56522" } -->> { "username" : "user63169" } on : shard0001 Timestamp(8, 1)
                     { "username" : "user63169" } -->> { "username" : "user69816" } on : shard0001 Timestamp(1, 8)
                     { "username" : "user69816" } -->> { "username" : "user76462" } on : shard0001 Timestamp(1, 9)
                     { "username" : "user76462" } -->> { "username" : "user83108" } on : shard0001 Timestamp(1, 10)
                     { "username" : "user83108" } -->> { "username" : "user89756" } on : shard0001 Timestamp(1, 11)
                     { "username" : "user89756" } -->> { "username" : "user96401" } on : shard0001 Timestamp(1, 12)
                     { "username" : "user96401" } -->> { "username" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 13)

The chunk ranges appear after the chunk counts for each shard, as in the following excerpts:

Chunk counts:

chunks:
        shard0000       7
        shard0001       7

Chunk range:

{ "username" : "user36583" } -->> { "username" : "user43229" } on : shard0000 Timestamp(6, 0)
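The relationship between the per-shard counts and the range lines can be sketched in plain JavaScript. This is a minimal sketch, not part of the original procedure: the sampleChunks array is hypothetical data shaped like documents in the config.chunks collection (only the min, max, and shard fields), and countChunksByShard and formatRange are illustrative helper names. In a live cluster you would read the documents from config.chunks instead; the filter to use there depends on the server version.

```javascript
// Hypothetical sample: three chunk documents with only the fields used below.
const sampleChunks = [
  { min: { username: "user36583" }, max: { username: "user43229" }, shard: "shard0000" },
  { min: { username: "user43229" }, max: { username: "user49877" }, shard: "shard0000" },
  { min: { username: "user56522" }, max: { username: "user63169" }, shard: "shard0001" },
];

// Count chunks per shard, mirroring the "chunks:" summary printed by sh.status().
function countChunksByShard(chunks) {
  const counts = {};
  for (const chunk of chunks) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  }
  return counts;
}

// Render one chunk roughly like a sh.status() range line (version timestamp omitted).
function formatRange(chunk) {
  return JSON.stringify(chunk.min) + " -->> " + JSON.stringify(chunk.max) + " on : " + chunk.shard;
}
```

Calling countChunksByShard(sampleChunks) groups the sample into two chunks on shard0000 and one on shard0001.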

Verify a Chunk is Empty

The mergeChunks command requires at least one empty input chunk. To check the size of a chunk, use the dataSize command in the sharded collection’s database. For example, the following checks the amount of data in the chunk for the users collection in the test database:

Important

You must use the use <db> helper to switch to the database containing the sharded collection before running the dataSize command.

use test
db.runCommand({
   "dataSize": "test.users",
   "keyPattern": { username: 1 },
   "min": { "username": "user36583" },
   "max": { "username": "user43229" }
})

If the input chunk to dataSize is empty, dataSize produces output similar to:

{ "size" : 0, "numObjects" : 0, "millis" : 0, "ok" : 1 }
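When several chunks need checking, the dataSize command document can be built from each chunk's bounds and the reply tested for emptiness. The following is a minimal sketch under stated assumptions: buildDataSizeCommand and isEmptyResult are hypothetical helper names, and the chunk argument is assumed to be an object with min and max fields. The reply fields (size, numObjects, ok) are the ones shown in the output above.

```javascript
// Build the dataSize command document for one chunk of a sharded collection.
function buildDataSizeCommand(namespace, keyPattern, chunk) {
  return {
    dataSize: namespace,
    keyPattern: keyPattern,
    min: chunk.min,
    max: chunk.max,
  };
}

// Treat a reply as "empty chunk" only if the command succeeded and found no data.
function isEmptyResult(result) {
  return result.ok === 1 && result.size === 0 && result.numObjects === 0;
}

// Possible use in the mongo shell (assumes a connection to the cluster):
//   use test
//   const cmd = buildDataSizeCommand("test.users", { username: 1 },
//     { min: { username: "user36583" }, max: { username: "user43229" } });
//   isEmptyResult(db.runCommand(cmd));
```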

Merge Chunks

Merge two contiguous chunks that reside on the same shard, at least one of which is empty, with the following operation:

// mergeChunks is an administrative command; issue it against the admin database through mongos.
db.adminCommand( { mergeChunks: "test.users",
                   bounds: [ { "username": "user68982" },
                             { "username": "user95197" } ]
               } )

On success, mergeChunks produces the following output:

{ "ok" : 1 }

On any failure condition, mergeChunks returns a document in which the value of the ok field is 0.
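The merge conditions described in this section (contiguous ranges, same shard, at least one empty chunk) can be sketched as a small selection routine. This is an illustration under assumptions, not the procedure's own method: the input array is assumed to be sorted by lower bound, the numObjects field on each chunk is a hypothetical annotation filled in from earlier dataSize results, and findMergeCandidates and buildMergeCommand are hypothetical helper names. Key comparison via JSON.stringify is a simplification that only works when both keys list their fields in the same order.

```javascript
// Compare two shard key bounds by their serialized form (simplified comparison).
function sameKey(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Return the bounds of every neighboring pair that satisfies the merge conditions.
function findMergeCandidates(chunks) {
  const candidates = [];
  for (let i = 0; i + 1 < chunks.length; i++) {
    const left = chunks[i];
    const right = chunks[i + 1];
    const contiguous = sameKey(left.max, right.min);
    const sameShard = left.shard === right.shard;
    const oneEmpty = left.numObjects === 0 || right.numObjects === 0;
    if (contiguous && sameShard && oneEmpty) {
      candidates.push({ shard: left.shard, bounds: [left.min, right.max] });
    }
  }
  return candidates;
}

// Build the mergeChunks command document for one candidate.
function buildMergeCommand(namespace, bounds) {
  return { mergeChunks: namespace, bounds: bounds };
}
```

Each candidate's bounds pair the lower bound of the first chunk with the upper bound of the second, which is the shape the bounds field of mergeChunks expects. A caller would still check the ok field of each reply before moving on.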

View Merged Chunk Ranges

After merging all empty chunks, confirm the new chunk ranges with the following operation:

sh.status()

The output of sh.status() should resemble:

--- Sharding Status ---
sharding version: {
     "_id" : 1,
     "version" : 4,
     "minCompatibleVersion" : 4,
     "currentVersion" : 5,
     "clusterId" : ObjectId("5260032c901f6712dcd8f400")
}
shards:
     {  "_id" : "shard0000",  "host" : "localhost:30000" }
     {  "_id" : "shard0001",  "host" : "localhost:30001" }
  databases:
     {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
     {  "_id" : "test",  "partitioned" : true,  "primary" : "shard0001" }
             test.users
                     shard key: { "username" : 1 }
                     chunks:
                             shard0000       2
                             shard0001       2
                     { "username" : { "$minKey" : 1 } } -->> { "username" : "user16643" } on : shard0000 Timestamp(2, 0)
                     { "username" : "user16643" } -->> { "username" : "user56522" } on : shard0000 Timestamp(3, 0)
                     { "username" : "user56522" } -->> { "username" : "user96401" } on : shard0001 Timestamp(8, 1)
                     { "username" : "user96401" } -->> { "username" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 13)
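One property worth checking in output like the above is that the merged ranges still cover the key space without gaps: each chunk's upper bound should equal the next chunk's lower bound. The following is a minimal sketch of that check, assuming the ranges have been transcribed into plain objects; mergedChunks copies the four ranges shown above, the $minKey and $maxKey objects are plain stand-ins for the printed form, and keysEqual and findRangeGaps are hypothetical helper names.

```javascript
// The four ranges from the sh.status() output above, as plain objects.
const mergedChunks = [
  { min: { username: { $minKey: 1 } }, max: { username: "user16643" }, shard: "shard0000" },
  { min: { username: "user16643" }, max: { username: "user56522" }, shard: "shard0000" },
  { min: { username: "user56522" }, max: { username: "user96401" }, shard: "shard0001" },
  { min: { username: "user96401" }, max: { username: { $maxKey: 1 } }, shard: "shard0001" },
];

// Compare two bounds by serialized form (simplified; assumes identical field order).
function keysEqual(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Return the index of every chunk whose upper bound does not meet the next lower bound.
function findRangeGaps(chunks) {
  const gaps = [];
  for (let i = 0; i + 1 < chunks.length; i++) {
    if (!keysEqual(chunks[i].max, chunks[i + 1].min)) {
      gaps.push(i);
    }
  }
  return gaps;
}
```

An empty array from findRangeGaps(mergedChunks) indicates that the listed ranges are contiguous.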