MongoDB, like any other database, may fail while executing a write operation. In that case, we need a strategy that records the operation somewhere durable, so that the database can resume from a consistent state once it is back in operation.
MongoDB uses journaling for this purpose. Write operations are recorded through write-ahead logging to on-disk journal files, so the data survives a failure. The WiredTiger storage engine uses checkpoints to provide a consistent view of data on disk, which lets MongoDB recover from the last checkpoint. However, if MongoDB exits unexpectedly between checkpoints, journaling must be enabled to recover the writes that occurred after the last checkpoint.
The recovery process works as follows:
- The database looks in the data files to find the identifier of the last checkpoint.
- It searches the journal files for the record that matches this identifier.
- It applies the operations in the journal files since the last checkpoint.
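The three recovery steps can be sketched as a toy Python model. This is not WiredTiger's on-disk format. The `JournalRecord` class, the `last_checkpoint_id` key, and the `("set", key, value)` operations are hypothetical stand-ins, chosen only to show the order of the steps.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class JournalRecord:
    checkpoint_id: Optional[int] = None        # set only on checkpoint marker records
    op: Optional[Tuple[str, str, int]] = None  # hypothetical ("set", key, value) write


def recover(data_files, journal):
    """Toy replay: find the last checkpoint, locate it in the journal, apply what follows."""
    # Step 1: the data files record which checkpoint they reflect.
    last_checkpoint = data_files["last_checkpoint_id"]
    state = dict(data_files["data"])  # copy, so the "data files" stay untouched
    # Step 2: find the journal record that matches that checkpoint identifier.
    start = next(i for i, rec in enumerate(journal) if rec.checkpoint_id == last_checkpoint)
    # Step 3: re-apply every operation journaled after the checkpoint.
    for rec in journal[start + 1:]:
        if rec.op is not None:
            _, key, value = rec.op
            state[key] = value
    return state
```

Operations logged before the matching checkpoint are skipped, because their effects are already in the data files.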
How Journaling Works in the WiredTiger Storage Engine
For each write operation a client initiates, WiredTiger creates a single journal record. That record includes the internal write operations triggered by the initial write. For example, if updating a document in a collection also modifies its index, WiredTiger creates one journal record that incorporates both the update and the corresponding index modification.
This record is stored in an in-memory buffer whose maximum capacity is 128 kB. WiredTiger syncs these buffered journal records to disk when any of the following conditions is met:
- A write operation includes or implies a write concern of j: true.
- WiredTiger creates a new journal file, which happens roughly every 100 MB of data. At that point, WiredTiger syncs the previous journal file.
- At a regular interval: every 100 milliseconds by default, configurable through storage.journal.commitIntervalMs.
- For replica set members:
  - When operations are waiting for oplog entries, i.e. read operations performed as part of causally consistent sessions and forward-scanning queries against the oplog.
  - On secondary members, after every batch application of oplog entries.
Between syncs, journal records that are still in the WiredTiger buffers have not reached disk. If mongod suffers a hard shutdown during that window, the corresponding updates can be lost.
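The following toy Python model makes the sync rules and the hard-shutdown risk concrete. It is a sketch under simplifying assumptions, not WiredTiger's implementation:
- The `ToyJournal` class and its methods are hypothetical.
- The 100 ms timer is modeled by an explicit `tick()` call.
- The 128 kB buffer cap is not modeled.

```python
from dataclasses import dataclass, field

FILE_LIMIT = 100 * 1024 * 1024   # a new journal file roughly every 100 MB
COMMIT_INTERVAL_MS = 100         # default storage.journal.commitIntervalMs


@dataclass
class ToyJournal:
    """Toy model of when buffered journal records reach disk (not WiredTiger code)."""
    buffered: list = field(default_factory=list)  # records only in memory
    on_disk: list = field(default_factory=list)   # records synced to disk
    file_bytes: int = 0                            # bytes in the current journal file
    last_sync_ms: int = 0

    def sync(self, now_ms):
        # Move everything buffered so far to durable storage.
        self.on_disk.extend(self.buffered)
        self.buffered.clear()
        self.last_sync_ms = now_ms

    def write(self, record, size, now_ms, j=False):
        self.buffered.append(record)
        self.file_bytes += size
        if self.file_bytes >= FILE_LIMIT:
            # Starting a new journal file syncs the previous one.
            self.file_bytes = 0
            self.sync(now_ms)
        elif j:
            # A write concern of j: true forces a sync before acknowledging.
            self.sync(now_ms)

    def tick(self, now_ms):
        # Periodic sync driven by the commit interval.
        if now_ms - self.last_sync_ms >= COMMIT_INTERVAL_MS:
            self.sync(now_ms)

    def oplog_waiters_or_batch_applied(self, now_ms):
        # Replica set triggers: readers waiting on the oplog, or a secondary batch.
        self.sync(now_ms)

    def hard_shutdown(self):
        # Anything still in the buffer was never synced and is lost.
        lost = list(self.buffered)
        self.buffered.clear()
        return lost
```

In this model, a record written without j: true is lost if `hard_shutdown()` runs before the next sync. This mirrors the risk described above.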
Journal Data Compression
By default, MongoDB configures WiredTiger to use snappy compression for journal data. You can choose a different compression algorithm, or none, with the storage.wiredTiger.engineConfig.journalCompressor setting. A log record is compressed only if its size is greater than 128 bytes, which is the minimum log record size for WiredTiger.
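The 128-byte threshold can be illustrated with a short sketch. Snappy is not in the Python standard library, so this example uses zlib as a stand-in; zlib is also one of the values journalCompressor accepts. The `maybe_compress` helper is hypothetical and only mirrors the size rule, not WiredTiger's record format.

```python
import zlib

MIN_LOG_RECORD_SIZE = 128  # bytes; records at or below this size stay uncompressed


def maybe_compress(record: bytes):
    """Return (payload, compressed_flag) following the 128-byte rule described above."""
    if len(record) <= MIN_LOG_RECORD_SIZE:
        return record, False
    # zlib stands in for the configured journal compressor (snappy by default).
    return zlib.compress(record), True
```

Skipping tiny records avoids spending CPU on payloads where compression overhead would outweigh any savings.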
Limiting the Size of a Journal File
A journal file has a maximum size of about 100 MB. Once a file exceeds this limit, WiredTiger creates a new one.
WiredTiger automatically removes old journal files, keeping only those needed to recover from the last checkpoint.
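File rotation and cleanup can be sketched as follows. This is a toy model: the `ToyJournalFiles` class is hypothetical, journal files are reduced to integer numbers, and "needed for recovery" is simplified to "the file holding the last checkpoint and anything newer".

```python
FILE_LIMIT = 100 * 1024 * 1024  # ~100 MB per journal file


class ToyJournalFiles:
    """Toy model of journal file rotation and pruning (not WiredTiger code)."""

    def __init__(self):
        self.files = [0]       # journal file numbers, oldest first
        self.current_size = 0  # bytes written to the newest file

    def append(self, size):
        # Roll over to a new file when the record would push past the limit.
        if self.current_size + size > FILE_LIMIT:
            self.files.append(self.files[-1] + 1)
            self.current_size = 0
        self.current_size += size
        return self.files[-1]  # number of the file the record landed in

    def prune(self, checkpoint_file):
        # Drop files older than the one needed to recover from the last checkpoint.
        self.files = [f for f in self.files if f >= checkpoint_file]
```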
Pre-Allocation
Journal files can be pre-allocated with the WiredTiger storage engine if the mongod process determines that it is more efficient to preallocate journal files than create new ones.
How Journaling Works in the In-Memory Storage Engine
The In-Memory Storage Engine became generally available (GA) in MongoDB Enterprise 3.2.6. Because this engine keeps data in memory, there is no separate journal. Write operations with a write concern of j: true are acknowledged immediately.
If any voting member of a replica set uses the in-memory storage engine, you must set writeConcernMajorityJournalDefault to false. If it is left set to true, a startup warning is logged.
With this option set to false, MongoDB does not wait for w: "majority" writes to be written to the on-disk journal before acknowledging them. The drawback is that majority write operations can roll back if a majority of nodes in the replica set suffer a transient loss, such as a restart or crash.
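The configuration rule for in-memory voting members can be expressed as a small check. This is a hypothetical sketch. The member dictionaries with `votes` and `storage_engine` keys are invented for illustration and do not match a real replica set configuration document. The check follows the rule as stated here, covering voting members only.

```python
def needs_startup_warning(members, write_concern_majority_journal_default):
    """Toy check: a voting in-memory member plus a true journal default triggers a warning."""
    has_voting_in_memory = any(
        m["votes"] > 0 and m["storage_engine"] == "inMemory"  # hypothetical member shape
        for m in members
    )
    return write_concern_majority_journal_default and has_voting_in_memory
```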
If you use the MMAPv1 storage engine, you can disable journal pre-allocation by starting mongod with the --nopreallocj option.
Starting in MongoDB 4.0, you cannot specify the --nojournal option or storage.journal.enabled: false for replica set members that use the WiredTiger storage engine.
Managing Journaling
Disabling Journaling
Journaling can only be disabled for standalone deployments, and doing so is not recommended for production systems. Starting in MongoDB 4.0, you can specify neither the --nojournal option nor storage.journal.enabled: false for replica set members that use the WiredTiger storage engine.
To disable journaling, start mongod with the --nojournal command-line option.
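The version restriction can be encoded as a tiny predicate, for example in a deployment script that validates settings before starting mongod. This is a hypothetical helper, not part of any MongoDB tool. It encodes only the 4.0 rule for WiredTiger replica set members; the advice to disable journaling only on standalone, non-production deployments still applies on top of it.

```python
def can_disable_journaling(version, storage_engine, replica_set_member):
    """Hypothetical check for the MongoDB 4.0+ rule on WiredTiger replica set members."""
    # version is a (major, minor) tuple, e.g. (4, 0); tuples compare element-wise.
    if replica_set_member and storage_engine == "wiredTiger" and version >= (4, 0):
        return False
    return True
```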
Monitor the Journal Status
To get statistics on the journal, run db.serverStatus(). The wiredTiger.log field of its output reports the journal statistics.
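Scripted monitoring usually only needs to pull that section out of the serverStatus document. The `journal_stats` helper below is a hypothetical sketch that works on the result as a plain dictionary, so it can be tested without a server.

```python
def journal_stats(server_status, keys=None):
    """Return the wiredTiger.log section of a serverStatus result, optionally filtered."""
    log = server_status.get("wiredTiger", {}).get("log", {})
    if keys is None:
        return dict(log)
    # Keep only the requested counters that are actually present.
    return {k: log[k] for k in keys if k in log}
```

Against a live server, the input could come from PyMongo, for example `MongoClient().admin.command("serverStatus")`. The exact counter names under wiredTiger.log (such as "log bytes written") can differ between versions, so check them in your own output.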
Get Commit Acknowledgement
To get a commit acknowledgement, use a write concern with the j option, {j: true}. This requests acknowledgement that the write has been written to the on-disk journal. Journaling must be enabled in this case; otherwise, the mongod instance may produce an error.
If journaling is enabled, w: "majority" may imply j: true. The writeConcernMajorityJournalDefault replica set setting determines whether it does.
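For a replica set, starting in MongoDB 3.4, j: true means MongoDB returns only after the requested number of members, including the primary, have written to the journal. In earlier versions, j: true required only the primary to write to the journal, regardless of the w: <value> write concern.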
However, even with j: true configured for a replica set, writes may roll back due to a replica set primary failover.
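The interaction between an explicit j option, w: "majority", and writeConcernMajorityJournalDefault can be summarized as a small decision function. This is a toy model that assumes journaling is enabled. The function name and the dictionary form of the write concern are illustrative; this is not a driver API.

```python
def journal_ack_required(write_concern, write_concern_majority_journal_default=True):
    """Toy model: does this write concern wait for the on-disk journal?"""
    if "j" in write_concern:
        # An explicit j option always takes precedence.
        return bool(write_concern["j"])
    if write_concern.get("w") == "majority":
        # w: "majority" implies j: true only if the replica set default says so.
        return write_concern_majority_journal_default
    return False
```

With PyMongo, the write concern itself is requested through `pymongo.write_concern.WriteConcern`, for example `collection.with_options(write_concern=WriteConcern(w="majority", j=True))`.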
Unexpected Shutdown Data Recovery
When MongoDB restarts after a crash, it replays all journal files in the journal directory before the server becomes available. MongoDB notes this replay in its log output, and there is no need to run --repair.
Changing the WiredTiger Journal Compressor
Snappy is the default compression algorithm for the journal. You can change it, and the procedure depends on whether the mongod instance is standalone or a replica set member.
For a standalone mongod instance:
- Set storage.wiredTiger.engineConfig.journalCompressor to the new value. The most appropriate way to do this is through the configuration file. If you use command-line options instead, you must update the --wiredTigerJournalCompressor option during the restart.
- Shut down the mongod instance by connecting a mongo shell to it and issuing db.shutdownServer() or db.getSiblingDB('admin').shutdownServer()
- Restart the mongod instance:
  - If using the configuration file: mongod -f <path to file.conf>
  - If using command-line options, pass the updated --wiredTigerJournalCompressor option:
    mongod --wiredTigerJournalCompressor <differentCompressor|none>
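For scripted restarts, a hypothetical helper can validate the compressor name before building the command line. The allowed set reflects the values journalCompressor accepts: none, snappy, zlib, and zstd, where zstd requires MongoDB 4.2 or later. The function name and the `config_path` parameter are illustrative only.

```python
# Values accepted by journalCompressor; zstd requires MongoDB 4.2 or later.
JOURNAL_COMPRESSORS = {"none", "snappy", "zlib", "zstd"}


def standalone_restart_command(compressor, config_path=None):
    """Build the mongod argument list for the restart step (hypothetical helper)."""
    if compressor not in JOURNAL_COMPRESSORS:
        raise ValueError(f"unknown journal compressor: {compressor!r}")
    if config_path is not None:
        # The new value is read from storage.wiredTiger.engineConfig.journalCompressor.
        return ["mongod", "-f", config_path]
    return ["mongod", "--wiredTigerJournalCompressor", compressor]
```

Rejecting unknown names early avoids a failed restart with the instance already shut down.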
For a Replica Set Member:
- Shut down the mongod instance: db.shutdownServer() or db.getSiblingDB('admin').shutdownServer()
- Make the following changes to the configuration file to prepare a restart as a standalone:
  - Set storage.journal.enabled to false.
  - Comment out the replication settings.
  - Set the parameter disableLogicalSessionCacheRefresh to true.
  For example:
    storage:
      journal:
        enabled: false
    #replication:
    #  replSetName: replA
    setParameter:
      disableLogicalSessionCacheRefresh: true
- Restart the mongod instance:
  - If using the configuration file: mongod -f <path to file.conf>
  - If using command-line options: include the --nojournal option, remove any replication options such as --replSet, and set the parameter disableLogicalSessionCacheRefresh to true:
    mongod --nojournal --setParameter disableLogicalSessionCacheRefresh=true
- Shut down the mongod instance: db.shutdownServer() or db.getSiblingDB('admin').shutdownServer()
- Update the configuration file to prepare to restart the replica set member with the new journal compressor: remove the storage.journal.enabled setting, uncomment the replication settings for the deployment, remove the disableLogicalSessionCacheRefresh option, and set storage.wiredTiger.engineConfig.journalCompressor to the new value:
    storage:
      wiredTiger:
        engineConfig:
          journalCompressor: <newValue>
    replication:
      replSetName: replA
- Restart the mongod instance as a replica set member:
  - If using the configuration file: mongod -f <path to file.conf>
  - If using command-line options: remove the --nojournal option and the disableLogicalSessionCacheRefresh parameter, and include the replication options and the --wiredTigerJournalCompressor option:
    mongod --wiredTigerJournalCompressor <differentCompressor|none> --replSet ...
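The two configuration edits in the replica set procedure can be sketched as transformations on a parsed configuration. This is a hypothetical helper, not a MongoDB tool. The configuration is modeled as nested Python dictionaries rather than YAML, because a YAML parser is not in the standard library. Removing the `replication` key stands in for commenting it out.

```python
import copy


def standalone_phase(config):
    """Config for the temporary standalone restart: journal off, no replication."""
    cfg = copy.deepcopy(config)
    cfg.pop("replication", None)  # stands in for commenting the section out
    cfg.setdefault("storage", {}).setdefault("journal", {})["enabled"] = False
    cfg.setdefault("setParameter", {})["disableLogicalSessionCacheRefresh"] = True
    return cfg


def final_phase(config, new_compressor):
    """Config for rejoining the replica set with the new journal compressor."""
    cfg = copy.deepcopy(config)
    storage = cfg.setdefault("storage", {})
    # Remove storage.journal.enabled, dropping the journal section if it is now empty.
    journal = storage.get("journal", {})
    journal.pop("enabled", None)
    if not journal:
        storage.pop("journal", None)
    # Remove the temporary session-refresh parameter.
    params = cfg.get("setParameter", {})
    params.pop("disableLogicalSessionCacheRefresh", None)
    if not params:
        cfg.pop("setParameter", None)
    engine = storage.setdefault("wiredTiger", {}).setdefault("engineConfig", {})
    engine["journalCompressor"] = new_compressor
    return cfg
```

Both functions start from the original member configuration and return copies, so the original settings are preserved for the final restart.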
Conclusion
To guarantee the durability of write operations, MongoDB uses journaling: data is written to on-disk journal files through write-ahead logging. The WiredTiger storage engine (the default storage engine) can recover data from its last checkpoint. However, if MongoDB exits unexpectedly and journaling was not enabled, the writes made since that checkpoint cannot be recovered. If journaling is enabled, MongoDB re-applies those write operations on restart and returns to a consistent state.