翻译或纠错本页面

复制集Oplog

The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations on the primary’s oplog. The secondary members then copy and apply these operations in an asynchronous process. All replica set members contain a copy of the oplog, in the local.oplog.rs collection, which allows them to maintain the current state of the database.

为了提高复制的效率,复制集中所有节点之间会互相进行心跳检测(通过ping)。每个节点都可以从任何其他节点上获取oplog。

Each operation in the oplog is idempotent. That is, oplog operations produce the same results whether applied once or multiple times to the target dataset.

Oplog大小

大多数情况下,默认的oplog大小是足够的。举个例子,如果Oplog是大小是可用空间的5%,且可以存储24小时内的操作,那么从节点就可以在停止复制24小时后仍能追赶上主节点,而不需要重新获取全部数据。然而,大多数复制集中的操作没有那么频繁,oplog可以存放远不止上述的时间的操作记录。

mongod 建立oplog之前,我们可以通过设置 oplogSizeMB 选项来设定其大小。但是,如果已经初始化过复制集,已经建立了Oplog了,我们需要通过 修改Oplog大小 中的方式来修改其大小。

The default oplog size depends on the storage engine:

如果我能够对我复制集的工作情况有一个很好地预估,如果可能会出现以下的情况,那么我们就可能需要创建一个比默认大小更大的oplog。相反的,如果我们的应用主要是读,而写操作很少,那么一个小一点的oplog就足够了。

Default Oplog Size Lower Bound

Oplog为了保证 幂等性 会将多项更新(multi-updates)转换为一条条单条的操作记录。这就会在数据没有那么多变动的情况下大量的占用oplog空间。

内存存储引擎

如果我们删除了与我们插入时同样多的数据,数据库将不会在硬盘使用情况上有显著提升,但是oplog的增长情况会显著提升。

50 MB

如果我们会有大量的in-place更新,数据库会记录下大量的操作记录,但此时硬盘中数据量不会有所变化。

WiredTiger Storage Engine

我们可以通过 rs.printReplicationInfo() 来查看oplog的状态,包括大小、存储的操作的时间范围。关于oplog的更多信息可以参考 Check the Size of the Oplog

在各类异常情况下, 从节点 oplog的更新可能落后于主节点一些时间。在从节点上通过 db.getReplicationInfo()db.getReplicationInfo 可以获得现在复制集的状态与,也可以知道是否有意外的复制延时。

如果我们会有大量的in-place更新,数据库会记录下大量的操作记录,但此时硬盘中数据量不会有所变化。

MMAPv1 Storage Engine

我们可以通过 rs.printReplicationInfo() 来查看oplog的状态,包括大小、存储的操作的时间范围。关于oplog的更多信息可以参考 Check the Size of the Oplog

在各类异常情况下, 从节点 oplog的更新可能落后于主节点一些时间。在从节点上通过 db.getReplicationInfo()db.getReplicationInfo 可以获得现在复制集的状态与,也可以知道是否有意外的复制延时。

如果我们会有大量的in-place更新,数据库会记录下大量的操作记录,但此时硬盘中数据量不会有所变化。

For 64-bit OS X systems

The default oplog size is 192 MB of either physical memory or free disk space depending on the storage engine:

如果我能够对我复制集的工作情况有一个很好地预估,如果可能会出现以下的情况,那么我们就可能需要创建一个比默认大小更大的oplog。相反的,如果我们的应用主要是读,而写操作很少,那么一个小一点的oplog就足够了。

Default Oplog Size
内存存储引擎 192 MB of physical memory
WiredTiger Storage Engine 192 MB of free disk space
MMAPv1 Storage Engine 192 MB of free disk space

In most cases, the default oplog size is sufficient. For example, if an oplog is 5% of free disk space and fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours without becoming too stale to continue replicating. However, most replica sets have much lower operation volumes, and their oplogs can hold much higher numbers of operations.

Before mongod creates an oplog, you can specify its size with the oplogSizeMB option. However, after you have started a replica set member for the first time, you can only change the size of the oplog using the 修改Oplog大小 procedure.

Workloads that Might Require a Larger Oplog Size

If you can predict your replica set’s workload to resemble one of the following patterns, then you might want to create an oplog that is larger than the default. Conversely, if your application predominantly performs reads with a minimal amount of write operations, a smaller oplog may be sufficient.

The following workloads might require a larger oplog size.

Updates to Multiple Documents at Once

The oplog must translate multi-updates into individual operations in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in data size or disk use.

Deletions Equal the Same Amount of Data as Inserts

If you delete roughly the same amount of data as you insert, the database will not grow significantly in disk use, but the size of the operation log can be quite large.

Significant Number of In-Place Updates

If a significant portion of the workload is updates that do not increase the size of the documents, the database records a large number of operations but does not change the quantity of data on disk.

Oplog Status

To view oplog status, including the size and the time range of operations, issue the rs.printReplicationInfo() method. For more information on oplog status, see Check the Size of the Oplog.

Under various exceptional situations, updates to a secondary’s oplog might lag behind the desired performance time. Use db.getReplicationInfo() from a secondary member and the replication status output to assess the current state of replication and determine if there is any unintended replication delay.

See Replication Lag for more information.