Database security is a key factor to consider for any application that involves highly sensitive data such as financial and health reports. Data protection can be achieved through encryption at different levels starting from the application itself to files holding the data.
Since MongoDB is a non-relational database, one does not need to define columns before inserting data; and therefore documents in the same collection could have different fields from one another.
On the other hand, for SQL DBMS, one has to define columns for the data, hence all the rows have the same columns. One can decide to encrypt individual columns, entire database file or data from the application before being involved in the database process.
Encryption of individual columns is most preferred since it is cheaper and less data is encrypted thus improving on latency. In general, overall performance impacts as a result of the encryption.
For NoSQL DBMS, this approach however will not be the best. Considering that not all documents may have all the fields you want to use in your encryption, column-level encryption cannot be performed.
Encrypting data at application level is quite costly and difficult to implement. We therefore remain with an option of encrypting data at database level.
MongoDB provides native encryption which does not require one to pay an extra cost for securing your sensitive data.
Encrypting Data in MongoDB
Any database operation involves either of these two data forms, data at rest or data in motion.
Data in motion is a stream of data moving through any kind of network whereas data at rest is static hence not moving anywhere.
Both of these wo data types are prone to external interference by anonymous users not unless encryption is involved. The encryption process involves:
- Generating a master key for the entire database
- Generating unique keys for each database
- Encrypting your data with the database keys you generated
- Encrypting the entire database with the master key
Encrypting Data in Transit
Data is transacted between MongoDB and the server application in two ways that is, through Transport Layer Security (TLS) and Secure Socket Layer (SSL).
These two are the most used encryption protocols for securing sent and received data between two systems. Basically, the concept is to encrypt connections to the mongod and mongos instances such that the network traffic is only readable by the intended client.
TLS/SSL are used in MongoDB with some certificates as PEM files which are issued by the certificate authority or can be a self-signed certificate. The latter has a limitation in that however the communication channel is encrypted, there is always no validation against the server identity hence vulnerable to external attacks midway. It is thus advisable to use trusted authority certificates which permit MongoDB drivers to also verify the server identity.
Besides encryption, TLS/SSL can be used in the authentication of the client and internal auths of members of replica sets and sharded clusters through the certificates.
TLS/SSL Configuration for Clients
There are various TLS/SSL option settings that can be used in the configuration of these protocols.
For example, if you want to connect to a Mongod instance using encryption, you would start your instance like,
mongo --ssl --host example.com --sslCAFile /etc/ssl/ca.pem
--ssl enables the TLS/SSL connection.
--sslCAFile specifies the certificate authority (CA) pem file for verification of the certificate presented by the mongod or the mongos. The Mongo shell will therefore verify the certificate issued by the mongod instance against the specified CA file and the hostname.
You may also want to connect MongoDB instance that requires a client certificate. We use the code sample below
mongo --ssl --host hostname.example.com --sslPEMKeyFile /etc/ssl/client.pem --sslCAFile /etc/ssl/ca.pem
The option --sslPEMKeyFile specifies the .pem file that contains the mongo shell certificate and a key to present to the mongod or mongos instance. During the connection process:
The mongo shell will verify if the certificate is from the specified Certificate Authority that is the (--sslCAFile) and if not, the shell will fail to connect.
Secondly, the shell will also verify if the hostname specified in the --host option matches the SAN/CN in the certificate presented by the mongod or mongos. If this hostname does not match either of the two, then the connection will fail.
If you don’t want to use self-signed certificates, you must ensure the network of connection is trusted.
Besides, you need to reduce the exposure of private key, especially where replica sets/ sharded cluster is involved. This can be achieved by using different certificates on different servers.
Additional options that can be used in the connections are:
requireSSL: this will restrict each server to use only TLS/SSL encrypted connections.
--sslAllowConnectionsWithoutCertificates: This allows validation if only the client presents a certificate otherwise if there is no certificate, the client will still be connected in an encrypted mode. For example:
mongod --sslMode requireSSL --sslAllowConnectionsWithoutCertificates --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem
sslDisabledProtocols: this option prevents servers from accepting incoming connections that use specific protocols. This can be done with:
mongod --sslMode requireSSL --sslDisabledProtocols TLS1_0,TLS1_1 --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem
Encrypting Data at Rest
From version 3.2, MongoDB introduced a native encryption option for the WiredTiger storage engine. Access to data in this storage by a third party can only be achieved through a decryption key for decoding the data into a readable format.
The commonly used encryption cipher algorithm in MongoDB is the AES256-GCM. It uses the same secret key to encrypt and decrypt data. Encryption can is turned on using the FIPS mode thus ensuring the encryption meets the highest standard and compliance.
The whole database files are encrypted using the Transparent data encryption (TDE) at the storage level.
Whenever a file is encrypted, a unique private encryption key is generated and is good to understand how these keys are managed and stored. All the database keys generated are thereafter encrypted with a master key.
The difference between the database keys and the master key is that the database keys can be stored alongside the encrypted data itself but for the master key, MongoDB advises it to be stored in a different server from the encrypted data such as third-party enterprise key management solutions.
With replicated data, the encryption criteria are not shared to the other nodes since the data is not natively encrypted over the wire. One can reuse the same key for the nodes but the best practice is to use unique individual keys for every node.
Rotating Encryption Keys
Managed key used for decrypting sensitive data should be rotated or replaced at least once a year. There are two options in MongoDB for achieving the rotation.
KMIP Master Rotation
In this case, only the master key is changed since it is externally managed. The process for rotating the key is as described below.
The master key for the secondary members in the replica set is rotated one at a time. I.e
mongod --enableEncryption --kmipRotateMasterKey \ --kmipServerName <KMIP Server HostName> \--kmipServerCAFile ca.pem --kmipClientCertificateFile client.pem
After the process is completed, mongod will exit and you will need to restart the secondary without the kmipRotateMasterKey parameter
mongod --enableEncryption --kmipServerName <KMIP Server HostName> \ --kmipServerCAFile ca.pem --kmipClientCertificateFile client.pem
The replica set primary is stepped down:
Using the rs.stepDown()method, the primary is deactivated hence forcing an election of a new primary.Check the status of the nodes using the rs.status() method and if the primary indicates to have been stepped down the rotate its master key. Restart the stepped down member including the kmipRotateMasterKey option.
mongod --enableEncryption --kmipRotateMasterKey \ --kmipServerName <KMIP Server HostName> \ --kmipServerCAFile ca.pem --kmipClientCertificateFile client.pem
Logging
MongoDB always works with a log file for recording some status or specified information at different intervals.
However, the log file is not encrypted as part of the storage engine. This poses a risk in that a mongod instance running with logging may output potentially sensitive data to the log files just as part of the normal logging.
From the MongoDB version 3.4, there is the security.redactClientLogData setting which prevents potentially sensitive data from being logged in the mongod process log. However, this option can complicate the log diagnostics.
Encryption Performance in MongoDB
Encryption at some point results in increased latency hence degrading the performance of a database. This is usually the case when a large volume of documents is involved.
Encrypting and decrypting require more resources hence is important to understand this relationship in order to adjust capacity planning accordingly.
Regarding the MongoDB tests, an encrypted storage engine will experience a latency of between 10% to 20% at the highest load. This often the case when a user writes a large amount of data to the database hence resulting in reduced performance. For read operations, the performance degradation is negligible, about 5 - 10%.
For a better encryption practice in MongoDB, the WiredTiger storage engine is most preferred due to its high performance, security, and scalability. Further, it optimizes encryption of database files to page level which has great merit in that, if a user reads or writes data to the encrypted database, the throughput operation will only be applied to the page on which the data is stored rather than the entire database.
This will reduce the amount of data that will need to be encrypted and decrypted for processing a single piece of data.
Summary
Data security for sensitive information is a must and there is need to protect it without degrading the performance of the database system.
MongoDB provides a robust native encryption procedures that can help us secure our data both one at rest and that in motion. Besides, the encryption procedures should comply with the set standards by different organizations.
The advanced WiredTiger storage engine provides a better option due to its associated merits such as high performance, scalability, and security. When encrypting data in replica sets, it is a good practice to use different master keys for each besides changing them at least once a year.
However the availability of third-party encryption options, there is no guarantee that your deployment will match alongside them in terms of scalability. It is thus quite considerate to employ database level encryption.