Deploying a clustered database is one thing, but how you maintain your DBM while in cluster can be a large undertaking for a consistent serving of your applications. One should have an often update on the status of the database more especially the most crucial metrics in order to get a clue of what to upgrade or rather alter as a way of preventing any bottlenecks that may emerge.
There are a lot of considerations regarding MongoDB one should take into account especially the fact that it’s installation and running are quite easy chances of neglecting basic database management practices are high.
Many at times, developers fail to put into account future growth and increased usage of the database which consequently results in crashing of application or data with some integrity issues besides being inconsistent.
In this article we are going to discuss some of the best practices one should employ for MongoDB cluster for an efficient performance of your applications. Some of the factors one should consider include...
- Upgrading to latest version
- Appropriate storage engine
- Hardware resources allocation
- Replication and sharding
- Never change server configuration file
- Good Security Strategy
Upgrading to Latest Version
I have worked with MongoDB from versions before 3.2 and to be honest, things were not easy at that time. With great developments, fixed bugs and newly introduced features, I will advise you to always upgrade your database to the latest version. For instance, the introduction of the aggregation framework had a better performance impact rather than relying on the Map-Reduce concept that was already in existence. With the latest version 4.0, one has now the capability to utilize the multi document transactions feature which generally improves on throughput operations. The latest version also has some additional new type conversion operators such as $toInt, $toString, $trim and $toBool. This operators will greatly help in the validation of data hence create some sense of data consistency. When upgrading please refer to the docs so that you may avoid making slight mistakes that may escalate to be erroneous.
Choose an Appropriate Storage Engine
MongoDB supports 3 storage engines as per now that is: WiredTiger, In-Memory and MMAPv1 storage engines. Each of these storage engines has got merits and limitations over the other but your choice will depend on your application specification and the core functionality of the engine. However, I personally prefer the WiredTiger storage engine and I would recommend this for one who is not sure which one to use. The WiredTiger storage engine is well suited for most workloads, provides a document-level concurrency model, checkpointing and compression.
Some of the considerations regarding selections of storage engine are dependent on this aspects:
- Transactions and atomicity: provision of data during an insert or update which is committed only when all conditions and stages in application have been executed successfully. Operations are therefore bundled together in an immutable unit. With this in place multi-document transaction can be supported as seen in the latest version of MongoDB for the WiredTiger storage engine.
- Locking type: it is a control strategy on access or update of information. During the lock duration no other operation can change the data of selected object until the current operation has been executed. Consequently, queries get affected at this time hence important to monitor them and reduce the bulk of locking mechanism by ensuring you select the most appropriate storage engine for your data.
- Indexing: Storage engines in MongoDB provide different indexing strategies depending on the data types you are storing. Efficiency of that data structure should be quite friendly with your workload and one can determine this by considering every extra index as having some performance overhead. Write optimized data structures have lower overhead for every index in a high-insert application environment than non-write optimized data structures. This will be a major setback especially where a large number of indexes is involved and selection of an inappropriate storage engine. Therefore, choosing an appropriate storage engine can have a dramatic impact.
Hardware Resources Allocation
As new users sign into your application, the database grows with time and new shards will be introduced. However, you cannot rely on the hardware resources you had established during the deployment stage. There will be a correspondent increase on the workload and hence require more processing resources provision such as CPU and RAM to support your large data clusters. This is often referred to capacity planning in MongoDB. The best practices around capacity planning include:
- Monitor your database constantly and adjust in accordance to expectations. As mentioned before, an increase in number of users will trigger more queries henceforth with an increased workload set especially if you employ indexes. You may start experiencing this impacts on the application end when it starts recording a change in the percentage of writes versus reads with time. You will therefore need to re-configure your hardware configurations in order to address this issue. Use mongoperf and MMS tool to detect changes in system performance parameters.
- Document all you performance requirement upfronts. When you encounter same problem you will at least have a point of reference which will save you some time. Your recording should involve size of data you want to store, analysis of queries in terms of latency and how much data you would like to access at a given time. In production environment you need to determine how many requests are you going to handle per second and lastly how much latency you are going to tolerate.
- Stage a Proof of Concept. Performa schema/index design and comprehend the query patterns and then refine your estimate of the working set size. Record this configuration as a point of reference for testing with successive revisions of the application.
- Do your tests with real workload. After carrying out stage of proof concept, deploy only after carrying a substantial testing with real world data and performance requirements.
Replication and Sharding
These are the two major concepts of ensuring High Availability of data and increased horizontal scalability respectively in MongoDB cluster.
Sharding basically partitions data across servers into small portions known as shards. Balancing of data across shards is automatic, shards can be added or removed without necessarily taking the database offline.
Replication on the other end maintains a multiple redundant copies of the data for high availability. It is an in-built feature in MongoDB and works across a wide area networks without the need for specialized networks. For a cluster setup, i recommend you have at least have 2+ mongos, 3 config servers, 1 shard an ensure connectivity between machines involved in the sharded cluster. Use a DNS name rather than IPs in the configuration.
For production environments use a replica set with at least 3 members and remember to populate more configuration variables like oplog size.
When starting your mongod instances for your members use the same keyfile.
Some of the considerations of your shardkey should include:
- Key and value are immutable
- Always consider using indexes in a sharded collection
- Update driver command should contain a shard key
- Unique constraints to be maintained by the shard key.
- A shard key cannot contain special index types and must not exceed 512 bytes.
Never Change Server Configuration File
After doing your first deployment, it is advisable not to change a lot of parameters in the configuration file otherwise you may land into trouble especially with shards. The weakest link with sharding is the config servers. This is to say all the mongod instances have to be running in order for sharding to work.
Good Security Strategy
MongoDB has been vulnerable to external attacks in the past years hence an important undertaking for your database to have some security protocols. Besides running the processes in different ports, one should at least employ one of the 5 different ways of securing MongoDB databases. You can consider platforms such as MongoDB Atlas which secure databases by default through encryption of the data both in-transit and at-rest. You can use strategies like TLS/SSL for all incoming and outgoing connections.
Conclusion
MongoDB cluster control is not an easy task and it involves a lot of workarounds. Databases grow as a result of more users hence increased workload set. On has therefore a mandate to ensure the performance of the DBM is in line with this increased number of users. The best practices go beyond increasing hardware resources and applying some MongoDB concepts such as sharding, replication and indexing. However, many of the inconveniences that may arise are well addressed by upgrading your MongoDB version. More often the latest versions have bugs fixed, new feature requests integrated and almost no negative impact to upgrading even with major revision numbers.