Sharded Cluster High Availability
A production cluster
has no single point of failure. This section introduces the
availability concerns for MongoDB deployments in general and
highlights potential failure scenarios and available resolutions.
An Application Server or mongos Becomes Unavailable
If each application server has its own mongos instance, other
application servers can continue to access the database. Furthermore,
mongos instances do not maintain persistent state, and they
can restart and become unavailable without losing any state or data.
When a mongos instance starts, it retrieves a copy of the
config database and can begin routing queries.
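As an illustration, a mongos can be started on each application server and pointed at the config servers; the replica set name, hostnames, and ports below are placeholders, not values from this deployment:

```shell
# Start a mongos alongside the application, pointing it at the config
# server replica set (MongoDB 3.2+ CSRS syntax). On startup, the mongos
# retrieves a copy of the config database and begins routing queries.
mongos --configdb configReplSet/cfg1.example.net:27019,cfg2.example.net:27019,cfg3.example.net:27019 \
       --port 27017
```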
A Single mongod Becomes Unavailable in a Shard
Replica sets provide high availability for shards.
If the unavailable mongod is a primary, then the
replica set will elect a new primary. If
the unavailable mongod is a secondary, it disconnects, but the
primary and remaining secondaries continue to hold all data. In
a three member replica set, even if a single member of the set
experiences catastrophic failure, two other members have full copies of
the data.
Always investigate availability interruptions and failures. If a system
is unrecoverable, replace it and add a new member to the replica set
as soon as possible to restore the lost redundancy.
All Members of a Shard Become Unavailable
If all members of a replica set shard are unavailable, all data
held in that shard is unavailable. However, the data on all other shards
remains available, and it is possible to read and write data to the
other shards. Your application must be able to handle
partial results, and you should investigate the cause of the
interruption and attempt to recover the shard as soon as possible.
A Config Server Replica Set Member Becomes Unavailable
Changed in version 3.2: Starting in MongoDB 3.2, config servers for sharded clusters can be
deployed as a replica set. The
replica set config servers must run the WiredTiger storage engine. MongoDB 3.2 deprecates the use of three mirrored
mongod instances for config servers.
Replica sets provide high availability for the
config servers. If an unavailable config server is a primary,
then the replica set will elect a new
primary.
If the replica set config server loses its primary and cannot elect a
primary, the cluster’s metadata becomes read-only. You can still read
from and write to the shards, but no chunk migrations or chunk splits will occur until a primary
is available. If all config databases become unavailable, the cluster can
become inoperable.
Note
All config servers must be running and available when you first initiate a sharded cluster.
Renaming Mirrored Config Servers and Cluster Availability
If the sharded cluster is using mirrored config servers instead of a
replica set and the name or address that a sharded cluster uses to
connect to a config server changes, you must restart every
mongod and mongos instance in the sharded cluster.
To avoid downtime when renaming config servers, use DNS names
unrelated to physical or virtual hostnames (such as CNAMEs) to refer
to your config servers.
Generally, refer to each config server using the DNS alias (e.g. a
CNAME record). When specifying the config server connection string to
mongos, use these names. These records make it possible to
change the IP address or rename config servers without changing the
connection string and without having to restart the entire cluster.
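A hypothetical DNS zone sketch illustrates the approach; the alias and host names below are placeholders:

```
; Each stable alias points at the current physical host. Repointing a
; CNAME re-homes a config server without changing the connection
; string given to mongos, so the cluster need not be restarted.
cfg1.example.net.  IN  CNAME  host-a.dc1.example.net.
cfg2.example.net.  IN  CNAME  host-b.dc1.example.net.
cfg3.example.net.  IN  CNAME  host-c.dc2.example.net.
```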
Shard Keys and Cluster Availability
The most important considerations when choosing a shard key
are:
- to ensure that MongoDB will be able to distribute data evenly among
shards, and
- to scale writes well across the cluster, and
- to ensure that mongos can isolate most queries to a specific
mongod.
Furthermore:
- Each shard should be a replica set: if a specific
mongod instance fails, the replica set members will elect
another to be primary and continue operation. However, if an
entire shard is unreachable or fails for some reason, that data will
be unavailable.
- If the shard key allows the mongos to isolate most
operations to a single shard, then the failure of a single shard
will only render some data unavailable.
- If your shard key distributes data required for every operation
throughout the cluster, then the failure of an entire shard will
render the entire cluster unavailable.
In essence, this concern for reliability simply underscores the
importance of choosing a shard key that isolates query operations to a
single shard.
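As a minimal sketch of this principle (run against a live sharded cluster in the mongo shell; the database, collection, and field names are hypothetical):

```javascript
// Shard a hypothetical "records.users" collection on a field that
// appears in most queries.
sh.enableSharding("records")
sh.shardCollection("records.users", { userId: 1 })

// This query includes the shard key, so mongos can route it to the
// single shard holding the matching chunk; if a different shard is
// down, the query still succeeds.
db.getSiblingDB("records").users.find({ userId: 12345 })

// This query lacks the shard key, so mongos must broadcast it to all
// shards; if any shard is unavailable, the operation can fail or
// return partial results.
db.getSiblingDB("records").users.find({ email: "user@example.net" })
```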