- 分片 >
- Sharded Cluster Administration >
- Data Partitioning with Chunks >
- 在集群中合并数据块
在集群中合并数据块¶
概述¶
mergeChunks 方法允许你将空的数据块合并到同分片相邻的数据块中.如果在设定的片键范围内没有数据,这个数据块就是空的.
空数据块在几种情况下会发生,包括:
如果 预分裂 创建了过多数据块,数据在数据块间的分布可能不均衡:
如果你删除了集群中很多数据,有些数据块可能不会包含数据.
这篇教程解释了怎样找到可以合并的数据块,怎样将数据块与相邻的数据块合并.
过程¶
注解
示例中使用 test 数据库的 users 集合,这个集合使用 username 作为片键.
确认数据块范围¶
在 program:mongo`终端中,使用以下操作确认 :term:`chunk 范围:
sh.status()
方法 sh.status() 的输出如下:
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("5260032c901f6712dcd8f400")
}
shards:
{ "_id" : "shard0000", "host" : "localhost:30000" }
{ "_id" : "shard0001", "host" : "localhost:30001" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
test.users
shard key: { "username" : 1 }
chunks:
shard0000 7
shard0001 7
{ "username" : { "$minKey" : 1 } } -->> { "username" : "user16643" } on : shard0000 Timestamp(2, 0)
{ "username" : "user16643" } -->> { "username" : "user2329" } on : shard0000 Timestamp(3, 0)
{ "username" : "user2329" } -->> { "username" : "user29937" } on : shard0000 Timestamp(4, 0)
{ "username" : "user29937" } -->> { "username" : "user36583" } on : shard0000 Timestamp(5, 0)
{ "username" : "user36583" } -->> { "username" : "user43229" } on : shard0000 Timestamp(6, 0)
{ "username" : "user43229" } -->> { "username" : "user49877" } on : shard0000 Timestamp(7, 0)
{ "username" : "user49877" } -->> { "username" : "user56522" } on : shard0000 Timestamp(8, 0)
{ "username" : "user56522" } -->> { "username" : "user63169" } on : shard0001 Timestamp(8, 1)
{ "username" : "user63169" } -->> { "username" : "user69816" } on : shard0001 Timestamp(1, 8)
{ "username" : "user69816" } -->> { "username" : "user76462" } on : shard0001 Timestamp(1, 9)
{ "username" : "user76462" } -->> { "username" : "user83108" } on : shard0001 Timestamp(1, 10)
{ "username" : "user83108" } -->> { "username" : "user89756" } on : shard0001 Timestamp(1, 11)
{ "username" : "user89756" } -->> { "username" : "user96401" } on : shard0001 Timestamp(1, 12)
{ "username" : "user96401" } -->> { "username" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 13)
如下所示,数据块范围显示在数据块在每个分片的数量之后:
** 数据块数量: **
chunks: shard0000 7 shard0001 7
** 数据块范围: **
{ "username" : "user36583" } -->> { "username" : "user43229" } on : shard0000 Timestamp(6, 0)
确认一个数据块是空的¶
The mergeChunks command requires at least one empty input chunk. To check the size of a chunk, use the dataSize command in the sharded collection’s database. For example, the following checks the amount of data in the chunk for the users collection in the test database:
重要
You must use the use <db> helper to switch to the database containing the sharded collection before running the dataSize command.
use test
db.runCommand({
"dataSize": "test.users",
"keyPattern": { username: 1 },
"min": { "username": "user36583" },
"max": { "username": "user43229" }
})
如果传递给 dataSize 的数据块是空的,这个命令的输出类似如下:
{ "size" : 0, "numObjects" : 0, "millis" : 0, "ok" : 1 }
合并数据块¶
使用以下命令,合并存在于同一个 shard 上且至少一个为空数据块的两个数据块:
db.runCommand( { mergeChunks: "test.users",
bounds: [ { "username": "user68982" },
{ "username": "user95197" } ]
} )
成功时, mergeChunks 返回如下输出:
{ "ok" : 1 }
因为任何情况失败, mergeChunks 返回的文档中, ok 子段都为 0 .
查看合并之后数据块的范围¶
在合并完所有空数据块之后,使用以下命令确认新数据块生效:
sh.status()
The output of sh.status() should resemble:
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("5260032c901f6712dcd8f400")
}
shards:
{ "_id" : "shard0000", "host" : "localhost:30000" }
{ "_id" : "shard0001", "host" : "localhost:30001" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
test.users
shard key: { "username" : 1 }
chunks:
shard0000 2
shard0001 2
{ "username" : { "$minKey" : 1 } } -->> { "username" : "user16643" } on : shard0000 Timestamp(2, 0)
{ "username" : "user16643" } -->> { "username" : "user56522" } on : shard0000 Timestamp(3, 0)
{ "username" : "user56522" } -->> { "username" : "user96401" } on : shard0001 Timestamp(8, 1)
{ "username" : "user96401" } -->> { "username" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 13)