Replica Set Elections

Replica sets use elections to determine which set member will become primary. Elections occur after initiating a replica set, and also any time the primary becomes unavailable. The primary is the only member in the set that can accept write operations. If a primary becomes unavailable, elections allow the set to recover normal operations without manual intervention. In the following three-member replica set, the primary is unavailable. One of the remaining secondaries holds an election to elect itself as a new primary.

Diagram of an election of a new primary. In a three member replica set with two secondaries, the primary becomes unreachable. The loss of a primary triggers an election where one of the secondaries becomes the new primary

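Elections are essential to the independent operation of a replica set, but they take time to complete. While an election is in progress, the replica set has no primary and cannot accept write operations. MongoDB avoids elections unless they are necessary.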

If a majority of the replica set is inaccessible or unavailable to the current primary, the primary will step down and become a secondary. The replica set cannot accept writes after this occurs, but remaining members can continue to serve read queries if such queries are configured to run on secondaries.

Factors and Conditions that Affect Elections

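Replica set members send heartbeats to the other members of the set every two seconds. A member that does not respond within 10 seconds is marked as unavailable.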

New in version 3.2: Members with a priority of 0 cannot become primary and do not seek election. See Priority 0 Replica Set Members for more information.

Heartbeats

Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
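As a rough illustration of the timing described above, the following sketch records when each member last answered a heartbeat and reports the members that have been silent for more than 10 seconds. The function name `unreachableMembers`, the constant `UNREACHABLE_AFTER_MS`, and the host names are hypothetical; this is a simplified model, not MongoDB's implementation.

```javascript
// Simplified model of heartbeat bookkeeping (illustrative only).
// Members ping each other every two seconds; a member that has not
// answered within 10 seconds is treated as inaccessible.
const UNREACHABLE_AFTER_MS = 10000;

// lastResponse maps a member host to the time (ms) of its last heartbeat reply.
function unreachableMembers(lastResponse, nowMs) {
  return Object.keys(lastResponse)
    .filter((host) => nowMs - lastResponse[host] > UNREACHABLE_AFTER_MS)
    .sort();
}

// Hypothetical hosts: db1 answered 2 seconds ago, db2 answered 11 seconds ago.
const lastResponse = {
  "db1.example.net:27017": 58000,
  "db2.example.net:27017": 49000,
};
console.log(unreachableMembers(lastResponse, 60000)); // only db2 is reported
```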

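A member that cannot connect to a majority of the other members of the replica set cannot become primary. For the purposes of elections, a majority refers to a majority of votes, not a majority of members.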

After a replica set has a stable primary, the election algorithm will make a “best-effort” attempt to have the secondary with the highest priority available call an election. Member priority affects both the timing and the outcome of elections; secondaries with higher priority call elections relatively sooner than secondaries with lower priority, and are also more likely to win. However, a lower priority instance can be elected as primary for brief periods, even if a higher priority secondary is available. Replica set members continue to call elections until the highest priority member available becomes primary.

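For example, if a replica set has three members and all three can vote, the set can elect a new primary as long as two of the members can communicate with each other. If two members are unavailable, the remaining member stays a secondary, because it cannot communicate with a majority of the set. If the two secondaries become unavailable, the remaining primary steps down and becomes a secondary.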
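The three-member case can be checked with a small calculation. The sketch below counts votes rather than members and tests whether a group of reachable members holds a strict majority. The helper names `totalVotes` and `hasMajority` are made up for this example; the `_id` and `votes` fields follow the member configuration shown later on this page.

```javascript
// Count votes, not members: a member with votes: 0 contributes nothing.
function totalVotes(members) {
  return members.reduce((sum, m) => sum + m.votes, 0);
}

// True if the reachable members together hold a strict majority of all votes.
function hasMajority(members, reachableIds) {
  const reachable = members.filter((m) => reachableIds.includes(m._id));
  return totalVotes(reachable) > totalVotes(members) / 2;
}

const threeMembers = [
  { _id: 0, votes: 1 },
  { _id: 1, votes: 1 },
  { _id: 2, votes: 1 },
];
console.log(hasMajority(threeMembers, [0, 1])); // true: two of three votes
console.log(hasMajority(threeMembers, [2]));    // false: one of three votes
```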

Loss of a Data Center

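A network partition affects how a majority of votes can be formed in an election. If the primary becomes unavailable and no partition contains a majority of the votes, the replica set will not elect a new primary and becomes read-only.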

If possible, distribute the replica set members across data centers to maximize the likelihood that even with a loss of a data center, one of the remaining replica set members can become the new primary.
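To check a proposed layout against this advice, one could compute, for each data center, whether the members outside it would still hold a majority of votes. The sketch below does this on plain objects. The `dataCenter` field and the function name `survivesLossOf` are assumptions made for illustration and are not part of the replica set configuration.

```javascript
// For each data center, report whether the members outside it
// would still hold a strict majority of the votes.
function survivesLossOf(members) {
  const total = members.reduce((sum, m) => sum + m.votes, 0);
  const result = {};
  for (const dc of new Set(members.map((m) => m.dataCenter))) {
    const remaining = members
      .filter((m) => m.dataCenter !== dc)
      .reduce((sum, m) => sum + m.votes, 0);
    result[dc] = remaining > total / 2;
  }
  return result;
}

// Two members in one site and one in another: losing the larger site
// leaves a single vote out of three, so no primary can be elected.
const twoSites = [
  { _id: 0, votes: 1, dataCenter: "east" },
  { _id: 1, votes: 1, dataCenter: "east" },
  { _id: 2, votes: 1, dataCenter: "west" },
];
console.log(survivesLossOf(twoSites)); // east: false, west: true
```

Under this model, spreading three voting members across three sites lets the set survive the loss of any single site.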

An election is triggered when the secondaries lose their connection to the primary and can no longer communicate with it.

A network partition may segregate a primary into a partition with a minority of nodes. When the primary detects that it can only see a minority of nodes in the replica set, the primary steps down as primary and becomes a secondary. Independently, a member in the partition that can communicate with a majority of the nodes (including itself) holds an election to become the new primary.

Vetoes in Elections

Changed in version 3.2: The protocolVersion: 1 obviates the need for vetoes. The following veto discussion applies to replica sets that use the older protocolVersion: 0.

For replica sets using protocolVersion: 0, all members of a replica set can veto an election, including non-voting members. A member will veto an election:

  • If the member seeking an election is not a member of the voter’s set.
  • If the current primary has more recent operations (i.e. a higher optime) than the member seeking election, from the perspective of another voting member.
  • If the member seeking an election has a lower priority than another member in the set that is also eligible for election.
[1]

In some cases, modifying a replica set’s configuration triggers an election and causes the primary to step down.

When a primary steps down, it closes all open client connections so that clients do not attempt to write to a secondary. This helps clients maintain an accurate view of the replica set and helps prevent rollbacks.
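The three protocolVersion: 0 veto conditions listed in this section can be expressed as a short decision function. This is only a sketch under simplifying assumptions: optimes are modeled as plain numbers, the `eligible` flag and the `voterView` structure are invented for the example, and the function `vetoReason` does not correspond to any MongoDB API.

```javascript
// Returns the reason a voter would veto the candidate, or null if no rule applies.
// voterView is the voter's own picture of the set (hypothetical structure).
function vetoReason(candidate, voterView) {
  const isKnown = voterView.members.some((m) => m.host === candidate.host);
  if (!isKnown) return "candidate is not a member of the voter's set";

  const primary = voterView.members.find((m) => m.state === "PRIMARY");
  if (primary && primary.optime > candidate.optime) {
    return "current primary has a higher optime";
  }

  const outranked = voterView.members.some(
    (m) => m.host !== candidate.host && m.eligible && m.priority > candidate.priority
  );
  if (outranked) return "another eligible member has a higher priority";

  return null; // no veto
}

const voterView = {
  members: [
    { host: "db1.example.net:27017", state: "SECONDARY", optime: 100, priority: 1, eligible: true },
    { host: "db2.example.net:27017", state: "SECONDARY", optime: 100, priority: 2, eligible: true },
  ],
};
console.log(vetoReason(voterView.members[0], voterView)); // vetoed: db2 has a higher priority
console.log(vetoReason(voterView.members[1], voterView)); // null
```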

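Every member of a replica set has a priority that helps determine its eligibility to become primary. In an election, the replica set elects the eligible member with the highest priority value as primary. By default, all members have a priority of 1 and have an equal chance of becoming primary. In the default configuration, any member can trigger an election.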

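You can set the priority value to weight an election in favor of a particular member or group of members. For example, if you have a geographically distributed replica set, you can adjust priorities so that only members in a specific data center can become primary.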
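One way to picture such an adjustment is as a transformation of the replica set configuration document. The sketch below works on a plain object shaped like the `members` array of that document and sets the priority of every member outside a chosen list to 0. The function `restrictPrimaryTo`, the host names, and the set name `rs0` are hypothetical. In the mongo shell, the document would typically be read with `rs.conf()` and applied with `rs.reconfig()`, which this example does not call.

```javascript
// Returns a copy of the config in which only the listed hosts keep a
// non-zero priority; all other members get priority 0.
function restrictPrimaryTo(config, preferredHosts) {
  const members = config.members.map((m) => {
    const preferred = preferredHosts.includes(m.host);
    // Keep an existing positive priority; otherwise fall back to the default of 1.
    const priority = preferred ? (m.priority > 0 ? m.priority : 1) : 0;
    return { ...m, priority };
  });
  return { ...config, members };
}

const geoConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "east1.example.net:27017", priority: 1 },
    { _id: 1, host: "east2.example.net:27017", priority: 1 },
    { _id: 2, host: "west1.example.net:27017", priority: 1 },
  ],
};
const updatedConfig = restrictPrimaryTo(geoConfig, [
  "east1.example.net:27017",
  "east2.example.net:27017",
]);
console.log(updatedConfig.members.map((m) => m.priority)); // priorities 1, 1, 0
```

The function returns a new object instead of mutating its input, so the original configuration remains available for comparison before anything is applied.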

The replica set member configuration setting members[n].votes and member state determine whether a member votes in an election.

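  • A member’s state also determines whether it can vote. Only voting members in the following states are eligible to vote:
    • PRIMARY
    • SECONDARY
    • RECOVERING
    • ARBITER
    • ROLLBACK

Every member of a replica set can veto an election, including non-voting members. A member vetoes an election in the following cases:

  • When the member seeking election has a lower priority than another member in the set that is also eligible.
  • When a priority 0 member is the most current member available in the set. In this case, another eligible member of the set catches up to the state of that member and then attempts to become primary.
  • When the current primary has more recent data (i.e. a higher optime) than the member seeking election.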
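The voting eligibility rule described in this section, a non-zero votes setting combined with one of the listed states, can be written as a single predicate. The sketch below assumes each member is represented by a plain object with hypothetical `votes` and `state` fields merged from configuration and status data; `canVote` is a name chosen for this example.

```javascript
// States in which a voting member is eligible to vote, as listed in this section.
const VOTING_STATES = ["PRIMARY", "SECONDARY", "RECOVERING", "ARBITER", "ROLLBACK"];

// A member votes only if it has a non-zero votes setting and is in an eligible state.
function canVote(member) {
  return member.votes > 0 && VOTING_STATES.includes(member.state);
}

const statusSnapshot = [
  { _id: 0, votes: 1, state: "SECONDARY" },
  { _id: 1, votes: 1, state: "STARTUP2" },  // not in an eligible state
  { _id: 2, votes: 0, state: "SECONDARY" }, // non-voting member
  { _id: 3, votes: 1, state: "ARBITER" },
];
console.log(statusSnapshot.map((m) => canVote(m))); // true, false, false, true
```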

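Non-voting members hold copies of the replica set’s data and can accept read operations from clients. Non-voting members do not vote in elections, but they can veto an election and can become primary.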

Because a replica set can have up to 12 members but only 7 voting members, non-voting members allow a replica set to have more than seven members.

Diagram of a 9 member replica set with the maximum of 7 voting members.

For example, a nine-member replica set like the one in the diagram has seven voting members and two non-voting members.

{
  "_id" : <num>,
  "host" : <hostname:port>,
  "votes" : 0
}

Important

A non-voting member has a votes setting equal to 0.
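A configuration can be checked against the limit of seven voting members with a simple count. The sketch below treats a missing `votes` field as 1, which is the default for that setting. The function `votingSummary`, the constant `MAX_VOTING_MEMBERS`, and the generated host names are hypothetical.

```javascript
const MAX_VOTING_MEMBERS = 7; // limit on voting members stated in this section

// Summarize how many members vote and whether the voting limit is respected.
function votingSummary(config) {
  const voting = config.members.filter(
    (m) => (m.votes === undefined ? 1 : m.votes) > 0
  ).length;
  const total = config.members.length;
  return {
    total,
    voting,
    nonVoting: total - voting,
    withinLimit: voting <= MAX_VOTING_MEMBERS,
  };
}

// Nine members: the last two are configured with votes: 0.
const nineMembers = {
  _id: "rs0",
  members: Array.from({ length: 9 }, (_, i) => ({
    _id: i,
    host: `db${i}.example.net:27017`,
    votes: i < 7 ? 1 : 0,
  })),
};
console.log(votingSummary(nineMembers)); // 9 total, 7 voting, 2 non-voting, within the limit
```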

To configure a non-voting member, see Configure a Non-Voting Member.