Troubleshoot Sharded Clusters¶
This section describes common strategies for troubleshooting sharded cluster deployments.
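Cursor Fails Because of Stale Config Data¶
A query returns the following warning when one or more mongos instances have not yet refreshed their cached copy of the cluster metadata from the config database: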
could not initialize cursor across all shards because : stale config detected
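This warning should not propagate back to your application. It repeats until every mongos instance has refreshed its cache. To force an instance to refresh its cache, run the flushRouterConfig command.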
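As a rough illustration of forcing the refresh on several routers, the sketch below issues flushRouterConfig against the admin database of each mongos. It assumes client objects that expose `client.admin.command(name)`, as pymongo's `MongoClient` does; the function name and the labels are hypothetical.

```python
from typing import Any, Dict


def flush_router_configs(mongos_clients: Dict[str, Any]) -> Dict[str, bool]:
    """Run flushRouterConfig on each mongos and report which ones succeeded.

    `mongos_clients` maps a label (hypothetical, e.g. "router-a") to a
    client object exposing `client.admin.command(name)`, which is the
    interface pymongo.MongoClient provides.
    """
    results = {}
    for label, client in mongos_clients.items():
        try:
            # flushRouterConfig is issued against the admin database of a
            # mongos and clears that router's cached routing table.
            client.admin.command("flushRouterConfig")
            results[label] = True
        except Exception:
            # Record the failure and continue, so one unreachable router
            # does not block refreshing the others.
            results[label] = False
    return results
```

With pymongo installed, a call might look like `flush_router_configs({"router-a": MongoClient("mongodb://router-a.example.net:27017")})`, where the hostname is a placeholder.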
Shard Keys and Cluster Availability¶
The most important considerations when choosing a shard key are:
- to ensure that MongoDB will be able to distribute data evenly among shards, and
- to scale writes across the cluster, and
- to ensure that mongos can isolate most queries to a specific mongod.
Furthermore:
- Each shard should be a replica set. If a specific mongod instance fails, the remaining replica set members will elect another member to be primary and continue operation. However, if an entire shard is unreachable or fails for some reason, the data on that shard will be unavailable.
- If the shard key allows the mongos to isolate most operations to a single shard, then the failure of a single shard will only render some data unavailable.
- If your shard key distributes the data required for every operation throughout the cluster, then the failure of any one entire shard will render the entire cluster unavailable.
In essence, this concern for reliability simply underscores the importance of choosing a shard key that isolates query operations to a single shard.
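The effect of query isolation on availability can be sketched with a small model. This is not MongoDB code: each operation is represented only by the set of shards it needs, and the shard names and workloads are made up for illustration.

```python
from typing import FrozenSet, Iterable


def unavailable_fraction(operations: Iterable[FrozenSet[str]], failed_shard: str) -> float:
    """Fraction of operations that need the failed shard and so cannot complete."""
    ops = list(operations)
    if not ops:
        return 0.0
    blocked = sum(1 for shards_needed in ops if failed_shard in shards_needed)
    return blocked / len(ops)


# Hypothetical four-shard cluster.
shards = ["shard0", "shard1", "shard2", "shard3"]

# Targeted workload: the shard key lets mongos route each operation to one shard.
targeted = [frozenset([shards[i % len(shards)]]) for i in range(100)]

# Scatter-gather workload: every operation needs data from every shard.
broadcast = [frozenset(shards) for _ in range(100)]

print(unavailable_fraction(targeted, "shard2"))   # 0.25
print(unavailable_fraction(broadcast, "shard2"))  # 1.0
```

In this model, losing one of four shards blocks a quarter of the targeted operations but all of the scatter-gather operations.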
Config Database String Error¶
Changed in version 3.2.
Starting in MongoDB 3.2, config servers can be deployed as replica sets. The mongos instances for the sharded cluster must specify the same config server replica set name, but they can specify the hostnames and ports of different members of the replica set.
Starting in 3.4, the use of the deprecated mirrored mongod instances as config servers (SCCC) is no longer supported. Before you can upgrade your sharded clusters to 3.4, you must convert your config servers from SCCC to CSRS.
To convert your config servers from SCCC to CSRS, see Upgrade Config Servers to Replica Set.
In earlier versions of MongoDB, sharded clusters that use the topology of three mirrored mongod instances for config servers require every mongos instance in the cluster to specify an identical configDB string.
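A consistency check along these lines can be sketched as follows. It assumes the replica set form `<replSetName>/<host:port>,<host:port>,...` for CSRS and a bare comma-separated host list for the mirrored (SCCC) form; the function names and the mongos labels are hypothetical, and the settings are supplied by the caller rather than read from a running cluster.

```python
from typing import Dict, List, Optional, Tuple


def parse_configdb(value: str) -> Tuple[Optional[str], List[str]]:
    """Split a configDB string into (replica set name or None, host list)."""
    name, sep, hosts = value.partition("/")
    if not sep:
        # No "<name>/" prefix: legacy mirrored (SCCC) form, hosts only.
        name, hosts = None, value
    return name, [h.strip() for h in hosts.split(",") if h.strip()]


def configdb_mismatches(settings: Dict[str, str]) -> List[str]:
    """Return labels of mongos instances whose configDB disagrees with the first one."""
    items = list(settings.items())
    if not items:
        return []
    ref_name, ref_hosts = parse_configdb(items[0][1])
    mismatched = []
    for label, value in items[1:]:
        name, hosts = parse_configdb(value)
        if ref_name is not None:
            # CSRS: only the replica set name has to match.
            same = name == ref_name
        else:
            # SCCC: the host list must be identical, including order.
            same = name is None and hosts == ref_hosts
        if not same:
            mismatched.append(label)
    return mismatched
```

For example, two routers configured with `cfgRS/h1:27019` and `cfgRS/h2:27019,h3:27019` pass the check, while two SCCC routers that list the same three hosts in a different order do not.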
Avoid Downtime when Moving Config Servers¶
Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers without downtime.
moveChunk commit failed Error¶
At the end of a chunk migration, the shard must connect to the config database to update the chunk’s record in the cluster metadata. If the shard fails to connect to the config database, MongoDB reports the following error:
ERROR: moveChunk commit failed: version is at <n>|<nn> instead of <N>|<NN>
ERROR: TERMINATING
When this happens, the primary member of the shard’s replica set then terminates to protect data consistency. If a secondary member can access the config database, data on the shard becomes accessible again after an election.
You will need to resolve the chunk migration failure independently. If you encounter this issue, contact the MongoDB User Group or MongoDB Support.
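To spot this failure in collected logs, a scan like the one below may help. It is only a sketch: the pattern is modeled on the message template quoted in this section, and it assumes the message sits on a single log line with decimal integers in place of the version placeholders. Actual log formats vary between MongoDB versions, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Pattern modeled on the documented message template (an assumption about
# the real log format, which may differ between versions).
COMMIT_FAILED = re.compile(
    r"moveChunk commit failed: version is at (\d+)\|(\d+) instead of (\d+)\|(\d+)"
)


def find_commit_failures(log_lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, actual version, expected version) for each match."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = COMMIT_FAILED.search(line)
        if match:
            actual = f"{match.group(1)}|{match.group(2)}"
            expected = f"{match.group(3)}|{match.group(4)}"
            hits.append((number, actual, expected))
    return hits
```

Each hit identifies the log line and the two chunk versions involved, which is useful context to include when reporting the problem.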