Journaling

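With journaling enabled, MongoDB creates a journal subdirectory under the configured dbPath, which defaults to /data/db. The journal directory holds the journal files, which contain the write-ahead redo logs. The directory also contains a last-sequence-number file. A clean shutdown removes all the files in the journal directory, whereas a dirty shutdown (for example, a crash) leaves them in place. When the mongod process restarts, these files are used to automatically recover the database to a consistent state.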

Journaling and the WiredTiger Storage Engine

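Important

Journal files are append-only files whose names begin with j._. When a journal file holds 1 GB of data, MongoDB creates a new journal file. Once all the write operations in a journal file have been flushed to the database data files, MongoDB deletes the file, because it is no longer needed for recovery. Unless you write a large amount of data per second, the journal directory should contain only two or three files.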

WiredTiger uses checkpoints to provide a consistent view of data on disk and allow MongoDB to recover from the last checkpoint. However, if MongoDB exits unexpectedly in between checkpoints, journaling is required to recover information that occurred after the last checkpoint.

With journaling, the recovery process:

  1. Looks in the data files to find the identifier of the last checkpoint.
  2. Searches in the journal files for the record that matches the identifier of the last checkpoint.
  3. Applies the operations in the journal files since the last checkpoint.
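The three recovery steps can be sketched in a few lines of Python. This is a toy model, not WiredTiger's implementation: the `JournalRecord` type, the `recover` function, and the dictionary standing in for the data files are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class JournalRecord:
    record_id: int   # unique identifier of the journal record
    key: str
    value: str


def recover(data_files: dict, last_checkpoint_id: int, journal: list) -> dict:
    """Replay the journal records written after the last checkpoint."""
    # Step 1: the checkpoint identifier is read from the data files
    # (here it is simply passed in as last_checkpoint_id).
    # Step 2: find the journal record that matches that identifier.
    ids = [record.record_id for record in journal]
    start = ids.index(last_checkpoint_id) + 1
    # Step 3: apply every later operation on top of the checkpointed state.
    recovered = dict(data_files)
    for record in journal[start:]:
        recovered[record.key] = record.value
    return recovered


checkpoint_state = {"a": "1"}   # state captured by the last checkpoint
journal = [
    JournalRecord(1, "a", "1"),
    JournalRecord(2, "b", "2"),
    JournalRecord(3, "a", "3"),
]
state = recover(checkpoint_state, last_checkpoint_id=1, journal=journal)
print(state)  # {'a': '3', 'b': '2'}
```

In this sketch a missing checkpoint identifier raises `ValueError`, which is one simple way to signal that the journal cannot be matched to the data files.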

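To speed the frequent sequential writes that occur to the current journal file, you can place the journal directory on a different filesystem from the database data files.

Changed in version 3.2.

If you place the journal on a different filesystem from your data files, you cannot use a filesystem snapshot alone to back up the files in the dbPath directory. In this case, use fsyncLock() to ensure that the database files are consistent before the snapshot, and fsyncUnlock() to release the lock once the snapshot is complete.

Depending on your filesystem, you might experience a preallocation lag the first time you start a mongod instance with journaling enabled.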

WiredTiger syncs the buffered journal records to disk according to the following intervals or conditions:

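  • New in version 3.2: MongoDB may preallocate journal files if the mongod process determines that it is more efficient to preallocate journal files than to create new journal files as needed. Preallocation might take several minutes; during this time, you will not be able to connect to the database. This is a one-time preallocation and does not occur with future invocations.

  • To avoid preallocation lag, see Avoid Preallocation Lag for MMAPv1.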

  • If the write operation includes a write concern of j: true, WiredTiger forces a sync of the WiredTiger journal files.

  • Because MongoDB uses a journal file size limit of 100 MB, WiredTiger creates a new journal file approximately every 100 MB of data. When WiredTiger creates a new journal file, WiredTiger syncs the previous journal file.

Important

In between write operations, while the journal records remain in the WiredTiger buffers, updates can be lost following a hard shutdown of mongod.
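The effect of buffering can be illustrated with a small simulation. It is only a sketch under stated assumptions: the `BufferedJournal` class and its method names are hypothetical, sizes are in arbitrary units standing in for megabytes, and interval-based syncing is left out. It models two of the conditions listed above (a write with j: true, and the creation of a new journal file) and shows which records survive a hard shutdown.

```python
class BufferedJournal:
    """Toy model: records stay in a memory buffer until a sync makes them durable."""

    def __init__(self, file_limit: int = 100):
        self.file_limit = file_limit    # stand-in for the ~100 MB file size limit
        self.buffer = []                # records not yet synced to disk
        self.durable = []               # records that have been synced
        self.current_file_size = 0

    def sync(self):
        self.durable.extend(self.buffer)
        self.buffer.clear()

    def write(self, record: str, size: int, j: bool = False):
        self.buffer.append(record)
        self.current_file_size += size
        if self.current_file_size >= self.file_limit:
            # A new journal file is created, so the previous file is synced.
            self.sync()
            self.current_file_size = 0
        elif j:
            # A write concern of j: true forces a sync.
            self.sync()

    def hard_shutdown(self):
        """Return what survives a crash: only the synced records."""
        self.buffer.clear()
        return list(self.durable)


journal = BufferedJournal()
journal.write("w1", size=10)
journal.write("w2", size=10, j=True)   # forces a sync of w1 and w2
journal.write("w3", size=10)           # still buffered when the crash happens
survivors = journal.hard_shutdown()
print(survivors)  # ['w1', 'w2']
```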

See also

The serverStatus command returns information on the WiredTiger journal statistics in the wiredTiger.log field.

Journal Files

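The journal is an on-disk view that stores write operations after MongoDB has applied them to the private view and before they are applied to the data files. The journal provides durability. If the mongod process crashes before the writes reach the data files, the journal can replay the write operations to the shared view for eventual write to the data files.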

Journal files contain a record for each write operation. Each record has a unique identifier.

MongoDB configures WiredTiger to use snappy compression for the journaling data.

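MongoDB copies write operations to the journal in batches called group commits. Group commits help minimize the performance impact of journaling, since each commit must block all writers while it runs. For the default commit interval, see commitIntervalMs.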

WiredTiger journal files for MongoDB have a maximum size limit of approximately 100 MB. Once the file exceeds that limit, WiredTiger creates a new journal file.

WiredTiger automatically removes old journal files to maintain only the files needed to recover from the last checkpoint.
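The size limit and the removal of old files can be pictured with the following sketch. It is illustrative only: `rotate_and_prune` is a hypothetical helper, records are represented by their index, and sizes use an arbitrary unit in place of megabytes.

```python
def rotate_and_prune(record_sizes, checkpoint_index, limit=100):
    """Assign records to journal files, then keep only the files needed for recovery.

    record_sizes: size of each journal record, in the same unit as limit.
    checkpoint_index: index of the record that matches the last checkpoint.
    Returns (all_files, needed_files); each file is a list of record indexes.
    """
    files, current, used = [], [], 0
    for index, size in enumerate(record_sizes):
        current.append(index)
        used += size
        if used > limit:            # the file exceeded the limit: start a new one
            files.append(current)
            current, used = [], 0
    if current:
        files.append(current)
    # A file is still needed if it holds the checkpoint record or any later record.
    needed = [f for f in files if f[-1] >= checkpoint_index]
    return files, needed


files, needed = rotate_and_prune([60, 60, 60, 60, 60], checkpoint_index=2)
print(files)   # [[0, 1], [2, 3], [4]]
print(needed)  # [[2, 3], [4]]
```

The first file holds only records that precede the checkpoint, so in this model it can be removed.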

WiredTiger will pre-allocate journal files.

Journaling and the MMAPv1 Storage Engine

With MMAPv1, when a write operation occurs, MongoDB updates the in-memory view. With journaling enabled, MongoDB writes the in-memory changes first to on-disk journal files. If MongoDB should terminate or encounter an error before committing the changes to the data files, MongoDB can use the journal files to apply the write operation to the data files and maintain a consistent state.

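When write operations occur, MongoDB writes the data to the private view in memory and then copies the write operations in batches to the journal. The journal stores the operations on disk to ensure durability. Each journal entry describes the exact addresses that the write operation changed in the data files.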

  1. MongoDB first applies write operations to the private view.
  2. MongoDB then applies the changes in the private view to the on-disk journal files in the journal directory roughly every 100 milliseconds. MongoDB records the write operations to the on-disk journal files in batches called group commits. Grouping the commits helps minimize the performance impact of journaling, since these commits must block all writers during the commit. Writes to the journal are atomic, ensuring the consistency of the on-disk journal files. For information on the frequency of the commit interval, see storage.journal.commitIntervalMs.
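  3. MongoDB then applies the write operations in the journal to the shared view. At this point, the shared view is inconsistent with the data files.
  4. At default intervals of 60 seconds, MongoDB asks the operating system to flush the shared view to disk. This brings the data files up to date with the latest write operations. The operating system may choose to flush the shared view more often than every 60 seconds, particularly if the system is low on free memory.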

If the mongod instance were to crash without having applied the writes to the data files, the journal could replay the writes to the shared view for eventual write to the data files.
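The four steps and the crash replay described above can be modeled with plain dictionaries. This is a deliberately simplified sketch: the `Mmapv1Model` class and its method names are hypothetical, timing (the 100 millisecond and 60 second intervals) is replaced by explicit method calls, and memory mapping is not modeled at all.

```python
class Mmapv1Model:
    """Toy model of the private view, journal, shared view, and data files."""

    def __init__(self):
        self.private_view = {}
        self.pending = []       # writes applied to the private view, not yet journaled
        self.journal = []       # on-disk journal (survives a crash)
        self.shared_view = {}
        self.data_files = {}    # on-disk data files (survive a crash)

    def write(self, key, value):
        # Step 1: apply the write to the private view.
        self.private_view[key] = value
        self.pending.append((key, value))

    def group_commit(self):
        # Step 2: copy the batch of pending writes to the journal.
        batch, self.pending = self.pending, []
        self.journal.extend(batch)
        # Step 3: apply the journaled writes to the shared view.
        for key, value in batch:
            self.shared_view[key] = value

    def flush(self):
        # Step 4: flush the shared view to the data files.
        self.data_files = dict(self.shared_view)
        self.journal.clear()    # flushed writes are no longer needed for recovery

    def crash_and_recover(self):
        # In-memory state is lost; only the journal and the data files remain.
        self.private_view, self.pending = {}, []
        self.shared_view = dict(self.data_files)
        for key, value in self.journal:   # replay journaled writes
            self.shared_view[key] = value
        self.flush()


db = Mmapv1Model()
db.write("x", 1)
db.group_commit()
db.flush()
db.write("y", 2)
db.group_commit()       # journaled but not yet flushed
db.write("z", 3)        # not yet journaled
db.crash_and_recover()
print(db.data_files)    # {'x': 1, 'y': 2}
```

The write to "z" is lost because it never reached the journal, while "y" is recovered from the journal even though it had not been flushed to the data files.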

When MongoDB flushes write operations to the data files, MongoDB notes which journal writes have been flushed. Once a journal file contains only flushed writes, it is no longer needed for recovery and MongoDB can recycle it for a new journal file.

Once the journal operations have been applied to the shared view and flushed to disk (i.e. pages in the shared view and private view are in sync), MongoDB asks the operating system to remap the shared view to the private view in order to save physical RAM. MongoDB does this routinely. Upon a new remapping, the operating system knows that physical memory pages can be shared between the shared view and the private view mappings.

Note

The interaction between the shared view and the on-disk data files is similar to how MongoDB works without journaling. Without journaling, MongoDB asks the operating system to flush in-memory changes to the data files every 60 seconds.

Journal Files

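When MongoDB flushes the write operations in the journal to the data files, it notes which journal writes have been flushed. Once a journal file contains only flushed writes, it is no longer needed for recovery, and MongoDB either deletes it or recycles it for a new journal file.

As part of journaling, MongoDB routinely asks the operating system to remap the shared view to the private view in order to save physical RAM. Upon a new remapping, the operating system knows that physical memory pages can be shared between the shared view and the private view mappings.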

The lsn file contains the last time MongoDB flushed the changes to the data files.

Once MongoDB applies all the write operations in a particular journal file to the data files, MongoDB can recycle it for a new journal file.

Unless you write many bytes of data per second, the journal directory should contain only two or three journal files.

A clean shutdown removes all the files in the journal directory. A dirty shutdown (crash) leaves files in the journal directory; these are used to automatically recover the database to a consistent state when the mongod process is restarted.

Journal Directory

To speed the frequent sequential writes that occur to the current journal file, you can ensure that the journal directory is on a different filesystem from the database data files.

Important

If you place the journal on a different filesystem from your data files, you cannot use a filesystem snapshot alone to capture valid backups of a dbPath directory. In this case, use fsyncLock() to ensure that database files are consistent before the snapshot and fsyncUnlock() once the snapshot is complete.
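One way to make sure the lock is always released is to wrap the snapshot in a context manager. The sketch below is an assumption-laden illustration: `writes_locked` is a hypothetical helper, and `client` is assumed to be a `pymongo.MongoClient` (or any object exposing `client.admin.command`). It issues the `fsync` command with `lock=True` and the `fsyncUnlock` command, which are the server commands behind fsyncLock() and fsyncUnlock().

```python
from contextlib import contextmanager


@contextmanager
def writes_locked(client):
    """Hold the fsync lock for the duration of the with-block.

    client is assumed to be a pymongo.MongoClient; it is not created here.
    """
    client.admin.command("fsync", lock=True)    # counterpart of fsyncLock()
    try:
        yield
    finally:
        # Runs even if the snapshot step raises, so the lock is not left held.
        client.admin.command("fsyncUnlock")     # counterpart of fsyncUnlock()


# Intended use (take_snapshot is a hypothetical function you would supply):
#
#     with writes_locked(client):
#         take_snapshot()
```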

Preallocation Lag

MongoDB may preallocate journal files if the mongod process determines that it is more efficient to preallocate journal files than create new journal files as needed.

Depending on your filesystem, you might experience a preallocation lag the first time you start a mongod instance with journaling enabled. The amount of time required to pre-allocate files might last several minutes; during this time, you will not be able to connect to the database. This is a one-time preallocation and does not occur with future invocations.

To avoid preallocation lag, see Avoid Preallocation Lag for MMAPv1.

Journaling and the In-Memory Storage Engine

Starting in MongoDB Enterprise version 3.2.6, the In-Memory Storage Engine is part of general availability (GA). Because its data is kept in memory, there is no separate journal. Write operations with a write concern of j: true are immediately acknowledged.

If any voting member of a replica set runs without journaling (i.e. either runs an in-memory storage engine or runs with journaling disabled), you must set writeConcernMajorityJournalDefault to false.

With writeConcernMajorityJournalDefault set to false, MongoDB will not wait for w: "majority" writes to be durable before acknowledging the writes. As such, "majority" write operations could possibly roll back in the event of a loss of a replica set member.
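As a rough sketch of how the setting might be changed, the helper below builds an updated replica set configuration document. The function name and the sample configuration are hypothetical, and bumping the `version` field is an assumption about what a manual reconfiguration typically requires.

```python
import copy


def disable_majority_journal_default(config: dict) -> dict:
    """Return a new config with writeConcernMajorityJournalDefault set to false."""
    new_config = copy.deepcopy(config)           # leave the caller's document untouched
    new_config["writeConcernMajorityJournalDefault"] = False
    new_config["version"] = config["version"] + 1   # assumed: reconfig needs a higher version
    return new_config


current = {"_id": "rs0", "version": 3, "members": []}   # hypothetical config document
updated = disable_majority_journal_default(current)
print(updated["version"], updated["writeConcernMajorityJournalDefault"])  # 4 False
```

With a driver such as pymongo, the current document could be read with the `replSetGetConfig` command and the result applied with `replSetReconfig`; those calls are omitted here so that the sketch runs without a server.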