Automating and Managing MongoDB in the Cloud

Database management has traditionally been complex and time-consuming. Deployment, with its attendant challenges of security, complex networking, backup planning and implementation, and monitoring, has been a headache. Scaling out your database cluster has been a major undertaking. And in a world where 24/7 availability and rapid disaster recovery are expected, managing even a single database cluster can be a full-time job.

Severalnines’ ClusterControl is a database deployment and management system that addresses the above, facilitating rapid deployment of redundant, secure database clusters or nodes, including advanced backup and monitoring functionality - whether on premise or in the cloud. With plugins supporting Nagios, PagerDuty, and Zabbix, among others, ClusterControl integrates well with existing infrastructure and tools to help you manage your database servers with confidence.

MongoDB is the leading NoSQL database server in the world today. With ClusterControl you can deploy and manage either official MongoDB or Percona Server for MongoDB, Percona’s competing offering that incorporates MongoDB Enterprise features. Using it, we are going to walk through deploying a MongoDB Replica Set with three data nodes, and look at some of the features of the ClusterControl application.

We’re going to run through some key features of ClusterControl, especially as they pertain to MongoDB, using Amazon Web Services. Amazon Web Services (or AWS) is the largest Infrastructure as a Service cloud provider globally, hosting millions of users all over the world. It comprises many services for all use cases, from virtually unlimited object storage with S3 and highly scalable virtual machine infrastructure using EC2, all the way to enterprise data warehousing with Redshift and even machine learning.

Once you’ve read this blog, you may also wish to read our DIY Cloud Database on Amazon Web Services Whitepaper, which discusses configuration and performance considerations for database servers in the AWS Cloud in more detail. In addition, we have Become a MongoDB DBA, a whitepaper with more in-depth, MongoDB-specific detail.

To begin, you will first need to deploy four AWS instances. For a production platform, the instance type should be carefully chosen based on the guidelines we have previously discussed, but for our purposes, instances with 2 virtual CPUs and 4GB of RAM will be sufficient. One of these nodes will host ClusterControl; the others will be used to deploy the three database nodes.

Begin by creating your database nodes’ security group, allowing inbound traffic on port 27017. There is no need to restrict outbound traffic, but should you wish to do so, allow outbound traffic on ports 1024-65535 to facilitate outbound communication from the database servers.

Next, create the security group for your ClusterControl node. Allow inbound traffic on ports 22 and 80. Add this security group’s ID to your database nodes’ security group, and allow unrestricted TCP communication from it. This will facilitate communication between the two security groups, without allowing ssh access to the database nodes from external clients.
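
If you prefer the AWS CLI over the web console, a minimal sketch of the two security groups might look like the following. It assumes the default VPC (where group names can be used directly); the group names and CIDR ranges are placeholders to adjust to your own environment.

$ # Security group for the database nodes, allowing MongoDB traffic on 27017 from within the VPC
$ aws ec2 create-security-group --group-name mongodb-nodes --description "MongoDB replica set nodes"
$ aws ec2 authorize-security-group-ingress --group-name mongodb-nodes --protocol tcp --port 27017 --cidr 10.0.0.0/16

$ # Security group for the ClusterControl node, allowing ssh and HTTP
$ aws ec2 create-security-group --group-name clustercontrol --description "ClusterControl node"
$ aws ec2 authorize-security-group-ingress --group-name clustercontrol --protocol tcp --port 22 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-name clustercontrol --protocol tcp --port 80 --cidr 0.0.0.0/0

$ # Allow unrestricted TCP from the ClusterControl group into the database group
$ aws ec2 authorize-security-group-ingress --group-name mongodb-nodes --protocol tcp --port 0-65535 --source-group clustercontrol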

Launch the instances into their respective security groups, choosing for each instance a KeyPair for which you have the ssh key. For the purposes of this task, use the same KeyPair for all instances. If you have lost the ssh key for your KeyPair, you will have to create a new KeyPair. When launching the instances, do not choose the default Amazon Linux image; instead, choose an AMI based on a supported operating system listed here. As I am using AWS region EU-CENTRAL-1, I will use community AMI ami-fa2df395, a CentOS 7.3 image, for this purpose.
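
Launching from the CLI is also possible. The sketch below assumes the default VPC and the security group names from the previous sketch; the t2.medium instance type (2 vCPUs, 4GB of RAM) and the KeyPair name s9s are examples, so substitute your own values.

$ # Three database nodes
$ aws ec2 run-instances --image-id ami-fa2df395 --instance-type t2.medium \
    --key-name s9s --security-groups mongodb-nodes --count 3
$ # One ClusterControl node
$ aws ec2 run-instances --image-id ami-fa2df395 --instance-type t2.medium \
    --key-name s9s --security-groups clustercontrol --count 1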

If you have the AWS command line tools installed, use the aws ec2 describe-instances command detailed previously to confirm that your instances are running--otherwise view your instances in the AWS web console--and when confirmed, log in to the ClusterControl instance via ssh.
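
For example, the following invocation lists the state and public IP address of each instance in the current region:

$ aws ec2 describe-instances \
    --query 'Reservations[].Instances[].[InstanceId,State.Name,PublicIpAddress]' \
    --output table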

Copy the private key file you downloaded when creating your KeyPair to the ClusterControl instance. You can use the scp command for this purpose. For now, let’s leave it in the default /home/centos directory, the home directory of the centos user. I have called mine s9s.pem. You will need the wget tool installed; install it using the following command:

$ sudo yum -y install wget
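
If you have not yet copied the key file over, a minimal scp invocation run from your local machine might look like the following; the public address is a placeholder and the local key path is an assumption.

$ scp -i ~/.ssh/s9s.pem ~/.ssh/s9s.pem centos@203.0.113.10:/home/centos/s9s.pem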

To install ClusterControl, run the following commands:

$ wget http://www.severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc
$ ./install-cc # as root or sudo user

The installation will walk you through some initial questions, after which it will take a few minutes to retrieve and install dependencies using your operating system’s package manager.

When installation is complete, point your web browser to http://<address of your ClusterControl instance>. You can find the external-facing address of the instance using the describe-instances command, or via the AWS web console.

Once you have successfully logged in, you will see the following screen, and can continue to deploy your MongoDB Replica Set.

Figure 1: Welcome to ClusterControl!

As you can see, ClusterControl can also import existing database clusters, allowing it to manage your existing infrastructure as easily as new deployments.

For our purposes, you are going to click Deploy Database Cluster. On the next screen you will see the selection of database servers and cluster types that ClusterControl supports. Click the tab labelled MongoDB ReplicaSet. Here the values with which you are concerned are SSH User, SSH Key Path, and Cluster Name. The port should already be 22, the default ssh port, and the AMI we are using does not require a Sudo Password.

Figure 2: Deploying a MongoDB Replica Set

The ssh user for the CentOS 7 AMI is centos, and the SSH Key Path is /home/centos/s9s.pem, or the appropriate path depending on your own Key file name. Let’s use MongoDB-RS0 as the Cluster Name. Accepting the default options, we click Continue.

Figure 3: Configuring your deployment

Here we can choose between the MongoDB official build and a Percona build. Select whichever you prefer, and supply an admin user and password with which to configure MongoDB securely. Note that ClusterControl will not let you proceed unless you provide these details. Make a note of the credentials you have supplied; you will need them to log in to the deployed MongoDB database later, should you wish to use it. Now choose a Replica Set name, or accept the default. We are going to use the vendor repositories, but be aware that you can configure ClusterControl to use your own repositories or those of a third party, if you prefer.

Add your database nodes, one at a time. You can choose to use the external IP address, but if you provide the hostname, which is generally recommended, ClusterControl will record all network interfaces in the hosts, and you will be able to choose the interface on which you would like to deploy. Once you have added your three database nodes, click Deploy. ClusterControl will now deploy your MongoDB Replica Set. Click Full Job Details to observe as it carries out the configuration of your cluster. When the job is complete, go to the Database Clusters screen and see your cluster.
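
As an aside, the same Replica Set deployment can in principle be driven from the ClusterControl CLI. The sketch below is hedged: the node addresses and admin password are placeholders, and the exact option names may vary between ClusterControl versions.

$ s9s cluster --create \
    --cluster-type=mongodb \
    --nodes='10.0.0.11;10.0.0.12;10.0.0.13' \
    --vendor=percona \
    --provider-version=3.4 \
    --db-admin=admin \
    --db-admin-passwd='YourAdminPassword' \
    --os-user=centos \
    --os-key-file=/home/centos/s9s.pem \
    --cluster-name=MongoDB-RS0 \
    --wait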

Figure 4: Auto Recovery

Taking a closer look, you can see that Auto Recovery is enabled at both a cluster and a node level; in the case of failures, ClusterControl will attempt to recover your cluster or the individual node having an issue. The green tick beside each node also displays the cluster’s health status at a glance.

Figure 5: Scheduling Backups

The last feature we will cover here is Backups. ClusterControl provides a backup feature that allows a fully cluster-consistent backup, or simply a standard mongodump backup if you prefer. It also lets you create scheduled backups that run periodically on a schedule of your choosing. Backup retention is also handled, with the option to retain backups for a limited period, avoiding storage issues.
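
To put the mongodump option in context, a plain mongodump of the Replica Set we deployed above might look roughly like this; the host addresses, credentials and output path are placeholders.

$ mongodump --host "MongoDB-RS0/10.0.0.11:27017,10.0.0.12:27017,10.0.0.13:27017" \
    --username admin --password 'YourAdminPassword' --authenticationDatabase admin \
    --gzip --archive=/backups/mongodb-rs0.archive.gz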

In this blog I’ve attempted to give you a brief overview of using ClusterControl with MongoDB, but there are many more features supported by ClusterControl. Deployment of Sharded Clusters, with hidden and/or delayed slaves, arbiters and other features, is also available. More information is available on our website, where you can also find webinars, whitepapers, tutorials, and training, and try out ClusterControl free.


View the replay: how to manage MongoDB & Percona Server for MongoDB

Many thanks to everyone who participated in this week’s webinar on how to manage MongoDB and Percona Server for MongoDB! The replay is now available for viewing online.

For this webinar we’d teamed up with Percona’s Tyler Duzan, Product Manager, who talked to us about some of the key features and aspects of Percona Server for MongoDB.

And our colleague Ruairi Newman compared the MongoDB-relevant functionality of MongoDB’s Ops Manager and ClusterControl. Participants learned about the differences between these systems, and how they help automate and manage MongoDB operations.

View the replay to find out more

Percona Server for MongoDB is a fully compatible, open source, drop-in replacement for the MongoDB® Community Server and provides MongoDB® Enterprise Edition features at no licensing cost, with additional storage engines and performance improvements.

ClusterControl is the all-inclusive management system for open source databases.

With Percona Server for MongoDB and Severalnines ClusterControl together, users benefit from a cost-efficient solution with additional features and capabilities.

View the replay to learn more

Agenda

  • Introduction to Percona Server for MongoDB
  • How to automate and manage MongoDB
    • Installation and maintenance
    • Complexity of architecture
    • Options for redundancy
    • Comparative functionality
    • Monitoring, Dashboard, Alerting
    • Backing up your deployments
    • Automated deployment of advanced configurations
    • Upgrading existing deployments

Speakers

Ruairí Newman is passionate about all things cloud and automation and has worked for MongoDB, VMware and Amazon Web Services among others. He has a background in Operational Support Systems and Professional Services.

Prior to joining Severalnines, Ruairí worked for Huawei Ireland as Senior Cloud Solutions Architect on their Web Services project, where he advised on commodity cloud architecture and Monitoring technologies, and deployed and administered a Research & Development Openstack lab.

Prior to joining Percona as a Product Manager, Tyler Duzan spent almost 13 years as an operations and security engineer in a variety of different industries. Deciding to take his analytical mindset and strategic focus into new territory, Tyler is applying his knowledge to solving business problems for Percona customers with inventive solutions combining technology and services.

New NinesControl Features Simplify Running MySQL & MongoDB in the Cloud

We were thrilled to announce the latest version of NinesControl early this week. Let’s have a closer look at some of the new features that were introduced.

Web console for accessing SSH, SQL and MongoDB command line

It is now possible to have SSH access from the NinesControl interface to any of your database servers, via a web based SSH proxy.

WebSSH and WebSQL - you can easily access your database’s shell and MySQL CLI directly from the NinesControl web page:

Just check the drop-down menu for any of the database nodes, and you’ll see both WebSSH and WebSQL. If you are running MongoDB, you will have a MongoDB shell instead. Clicking on one of them opens a new window with a connection to your database. This can be access to the command line:

NinesControl WebSSH interface

or access to the MySQL CLI:

NinesControl SQL CLI

Or access to the MongoDB CLI:

NinesControl MongoDB shell

We use dedicated users to make this access possible: for the shell it is the ‘ninescontrol’ user, and for the MySQL CLI it is ‘ninescontroldb@localhost’. For MongoDB, it is the admin user you defined when creating the MongoDB cluster. Make sure you don’t make changes to those users if you want to keep this method of access available.

Add node to the cluster

With the new release of NinesControl, you have the tool to scale your cluster. If you ever find yourself in a position where you need one more database node to handle the load, you can easily add it through the “Add Node” action:

Adding a node to a cluster in NinesControl

You will be presented with a screen where you need to pick the size of the new node:

Adding a node to a cluster in NinesControl

After you click “Add Node”, the deployment process begins:

Adding a node to a cluster in NinesControl

After a while, the node should be up and running.

Nodes running in NinesControl

Of course, if you find that you are not utilizing all of your nodes, you can remove some of them:

Removing a node in NinesControl

While scaling your cluster up and down, please keep in mind that it is recommended to have an odd number of nodes in both Galera and MongoDB clusters. You also should not reduce the number of nodes below three - this is a requirement if you want your cluster to be fault-tolerant.

Disable autorecovery for the cluster

NinesControl works in the background to make sure your cluster is up and running and your application can reach it and issue queries. Failed nodes are automatically recovered and restarted. Still, it may happen that you don’t want NinesControl to bring a node back up. It could be that you are performing some maintenance which requires the database instance to stay down. Maybe you want to restore an external binary backup and then bootstrap the rest of the cluster from that node? Right now it is extremely easy to disable automated recovery - all you need to do is to click on the Autorecovery switch in the UI:

Disabling Auto Recovery in NinesControl

It will change to:

Auto Recovery Disabled in NinesControl

From now on, NinesControl will not attempt to recover failed nodes.

Google Compute Engine Support

Last but definitely not least, NinesControl now also supports Google Compute Engine. You can learn more about how to set up access credentials and deploy MySQL or MongoDB on this new cloud provider.

We hope that this blog post helped you better understand the new features available in NinesControl. Please give them a try and let us know what you think.

MongoDB Webinar - How to Secure MongoDB with ClusterControl

Join us for our new webinar on “How to secure MongoDB with ClusterControl” on Tuesday, March 14th!

In this webinar we will walk you through the essential steps necessary to secure MongoDB and how to verify if your MongoDB instance is safe.

How to secure MongoDB with ClusterControl

The recent MongoDB ransom hijack caused a lot of damage and outages, which could have been prevented with perhaps two or three simple configuration changes. MongoDB offers a lot of security features out of the box; however, they are disabled by default.

In this webinar, we will explain which configuration changes are necessary to enable MongoDB’s security features, and how to test whether your setup is secure after enabling them. We will demonstrate how ClusterControl enables security on default installations. And we will discuss how to leverage the ClusterControl advisors and the MongoDB Audit Log to continuously scan your environment and harden your security even further.
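
To give a feel for what ClusterControl automates here, enabling authentication on a plain MongoDB installation by hand looks roughly like the steps below. This is a hedged sketch: the user name, password and configuration path are placeholders, and it assumes there is no existing security section in mongod.conf.

$ # Create an administrative user while the localhost exception still applies
$ mongo admin --eval 'db.createUser({user: "admin", pwd: "YourAdminPassword", roles: [{role: "root", db: "admin"}]})'
$ # Switch authorization on and restart the service
$ echo -e "security:\n  authorization: enabled" | sudo tee -a /etc/mongod.conf
$ sudo systemctl restart mongod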

Date, Time & Registration

Europe/MEA/APAC

Tuesday, March 14th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, March 14th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

Agenda

  • What is the MongoDB ransom hack?
  • What other security threats are valid for MongoDB?
  • How to enable authentication / authorisation
  • How to secure MongoDB from ransomware
  • How to scan your system
  • ClusterControl MongoDB security advisors
  • Live Demo

Speaker

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic database expert with over 16 years experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad vision upon the whole database environment: from MySQL to MongoDB, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, MongoDB Open House, FOSDEM) and related meetups.

We look forward to “seeing” you there!

This session is based upon the experience we have securing MongoDB and implementing it for our database infrastructure management solution, ClusterControl. For more details, read through our ‘Become a MongoDB DBA’ blog series.

Free Open Source Database Deployment & Monitoring with ClusterControl Community Edition

The ClusterControl Community Edition is a free-to-use, all-in-one database management system that allows you to easily deploy and monitor the top open source database technologies like MySQL, MariaDB, Percona, MongoDB, PostgreSQL, Galera Cluster and more. It also allows you to import and monitor your existing database stack.

Free Database Deployment

The ClusterControl Community Edition ensures your team can easily and securely deploy production-ready open source database stacks that are built using battle-tested, proven methodologies. You don’t have to be a database expert to utilize the ClusterControl Community Edition - deploying the most popular open source databases is easy with our point-and-click interface. Even if you are a master of deploying databases, ClusterControl’s point-and-click deployments will save you time and ensure your databases are deployed correctly, removing the chance of human error. There is also a CLI for those who prefer the command line, or need to integrate with automation scripts.

The ClusterControl Community Edition is not restricted to a single database technology and supports the major flavors and versions. With it you’re able to apply point-and-click deployments of MySQL standalone, MySQL replication, MySQL Cluster, Galera Cluster, MariaDB, MariaDB Cluster, Percona XtraDB and Percona Server for MongoDB, MongoDB itself and PostgreSQL!

Free Database Monitoring

The ClusterControl Community Edition makes monitoring easy by providing you the ability to look at all your database instances across multiple data centers or drill into individual nodes and queries to pinpoint issues. Offering a high-level, multi-dc view as well as a deep-dive view, ClusterControl lets you keep track of your databases so you can keep them running at peak performance.

In addition to monitoring the overall stack and node performance you can also monitor the specific queries to identify potential errors that could affect performance and uptime.

Why pay for a monitoring tool when the ClusterControl Community Edition gives you a great one for free?

Free Database Developer Studio

The Developer Studio provides you with a set of monitoring and performance advisors to use, and lets you create custom advisors to add security and stability to your database infrastructures. It lets you extend the functionality of ClusterControl, which helps you detect and solve unique problems in your environments.

We even encourage our users to share the advisors they have created on GitHub by forking our current advisor bundle. If we like them and think that they might be useful for other users, we’ll include them in future ClusterControl releases.

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Why Should I Use the ClusterControl Community Edition?

These are just a few of the reasons why you should use ClusterControl as your system to deploy and monitor your open source database environments…

  • You can deploy knowing you are using proven methodologies and industry best practices.
  • If you are just getting started with open source database technology, ClusterControl makes it easy for the beginner to deploy and monitor your stacks, removing human error and saving you time.
  • Not familiar with orchestration programs like Puppet and Chef? Don’t worry! The ClusterControl Community Edition uses a point-and-click GUI to make it easy to get your environment production-ready.
  • The ClusterControl Community Edition gives you deployment and monitoring in one battle-tested all-in-one system. Why use one tool for scripting only to use a different tool for monitoring?
  • Not sure which database technology is right for your application? The ClusterControl Community Edition supports nearly two dozen database versions that you can try.
  • Have a load balancer running on an existing stack? With the ClusterControl Community Edition you can import and deploy your existing and already configured load balancer to run alongside your database instances.

If you are ready to give it a try click here to download and install the latest version of ClusterControl. Each install comes with the option to activate a 30-day enterprise trial as well.

Upgrading to the ClusterControl Enterprise Edition

The ClusterControl Enterprise Edition provides you with a full suite of management and scaling features in addition to the deployment and monitoring functions offered as part of the free Community Edition. You also have the ability to deploy, configure and manage the top open source load balancing and caching technologies to drive peak performance for your mission-critical applications.

Whether you have been benefiting from the free resources included in the Community Edition or have evaluated the product through the Enterprise Trial, we’ll walk you through how our licensing works and explain how to get you up-and-running with all the automation and scaling that ClusterControl Enterprise has to offer.

“With quick installation, ease of use, great support, stable deployments and a scalable architecture, ClusterControl is just the solution we were looking for to provide a strong MySQL HA platform to our customers.”

Xavi Morrus, CMO, MediaCloud

How to Upgrade from Community to Enterprise

While using the ClusterControl Community Edition you may have clicked on a feature and got a pop-up indicating that it was not included in the version you are using. When this happens you have two options. You can activate (or extend) your Enterprise Trial OR you can contact sales to purchase an enterprise license.

“Our back-end is reliant on different databases to tackle different tasks. Using several different tools, rather than a one-stop shop, was detrimental to our productivity. Severalnines is that ‘shop’ and we haven’t looked back. ClusterControl is an awesome solution like no other.”

Zeger Knops, Head of Business Technology, vidaXL

Enterprise Trial

The ClusterControl Enterprise trial provides you with free access to our full suite of features for 30 days. The purpose of this trial is to allow you to “kick the tires” using your environments and applications to make sure that ClusterControl meets your needs.

With the trial you have access to all our Community features plus: Custom Dashboards, Load Balancers, Configuration Management, Backup and Restore, Automatic Node and Cluster Recovery, Role Based Access Control, Key Management, LDAP, SSL Encryption, Scaling, and more!

The trial also grants you Enterprise Level access to our support teams 24/7. We want to make sure that you have the best experience during your trial and also introduce you to our amazing support that you can count on when you become a customer of Severalnines.

At the end of your trial, you will have the option to meet with our sales team to continue with ClusterControl Enterprise on a paid license. Or you may also continue with our ClusterControl Community Edition, which you can use for free - forever.

Extending Your Trial

Sometimes thirty days isn’t enough time to evaluate a product as extensive as ClusterControl. In these situations we can sometimes grant an extension to allow you some more time to evaluate the product. This extension can be requested from the product itself and you will be contacted by an account manager to arrange for the extension.

“ClusterControl is phenomenal software…I’m usually not impressed with vendors or the software we buy, because usually it’s over promised and under delivered. ClusterControl is a nice handy system that makes me feel confident that we can run this in a production environment.”

Jordan Marshall, Manager of Database Administration, Black Hills Corporation

Purchasing a Commercial License

ClusterControl offers three separate plans and different support options. Our account managers are available to assist, and recommend the best plan. We also offer volume discounts for larger orders. In short, we will work very hard to make sure our price meets your needs and budget. Once we’ve all signed on the dotted line, you will then be provided with Commercial License keys that you can put into your already deployed environment (or into a new one) which will then immediately grant you full access to the entire suite of ClusterControl features that you have contracted.

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Benefits of Upgrading

While the free ClusterControl community version provides rich features that allow you to easily and securely deploy and monitor your open source databases, the Enterprise Edition provides much much more!

These are just some of the features awaiting you in the Enterprise Edition...

  • Advanced Backup & Restoration: With ClusterControl you can schedule logical or physical backups with failover handling and easily restore backups to bootstrap nodes or systems.
  • Automated Failover: ClusterControl includes advanced support for failure detection and handling; it also allows you to deploy different proxies to integrate them with your HA stack.
  • Topology Changes: Making topology changes with ClusterControl is easy; it does all the background work to elect a new master, deploy fail-over slave servers, rebuild slaves in case of data corruption, and maintain load balancer configurations to reflect all the changes.
  • Load Balancing: Load balancers are an essential component of database high availability, especially when making topology changes transparent to applications and implementing read-write split functionality. ClusterControl provides support for ProxySQL, HAProxy, and MaxScale.
  • Advanced Security: ClusterControl removes human error and provides access to a suite of security features that automatically protect your databases from hacks and other threats. Operational Reports also come in handy, whether you need to show you are meeting your SLAs or wish to keep track of your cluster’s historical data.
  • Scaling: Easily add and remove nodes, resize instances, and clone your production clusters with ClusterControl.

In short, ClusterControl is an all-inclusive database management system that removes the need for your team to have to cobble together multiple tools, saving you time and money.

If you ever have any issues during this process you can always consult the documentation or contact us. If you need support you can contact us here.

Deploying MySQL, MariaDB, Percona Server, MongoDB or PostgreSQL - Made Easy with ClusterControl

Helping users securely automate and manage their open source databases has been at the core of our efforts from the inception of Severalnines.

And ever since the first release of our flagship product, ClusterControl, it’s always been about making it as easy and secure as possible for users to deploy complex, open source database cluster technologies in any environment.

Since our first steps with deployment, automation and management we’ve perfected the art of securely deploying highly available open source database infrastructures by developing ClusterControl from a deployment and monitoring tool to a full-blown automation and management system adopted by thousands of users worldwide.

As a result, ClusterControl can be used today to deploy, monitor, and manage over a dozen versions of the most popular open source database technologies - on premise or in the cloud.

Whether you’re looking to deploy MySQL standalone, MySQL replication, MySQL Cluster, Galera Cluster, MariaDB, MariaDB Cluster, Percona XtraDB and Percona Server for MongoDB, MongoDB itself and PostgreSQL - ClusterControl has you covered.

In addition to the database stores, users can also deploy and manage load balancing technologies such as HAProxy, ProxySQL, MaxScale and Keepalived.

“Very easy to deploy a cluster, also it facilitates administration and monitoring.”

Michel Berger, IT Applications Manager, European Broadcasting Union (EBU)

Using ClusterControl, database clusters can either be deployed new or imported if they already exist.

A deployment wizard makes it easy and secure to deploy production-ready database clusters with a point and click interface that walks the users through the deployment process step by step.

Select Deploy or Import Cluster

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Walk through of the Deploy Wizard

View your cluster list

“ClusterControl is great for deploying and managing a high availability infrastructure. Also find the interface very easy to manage.”

Paul Masterson, Infrastructure Architect, Dunnes

Deploying with the ClusterControl CLI

Users can also choose to work with our CLI, which allows for easy integration with infrastructure orchestration tools such as Ansible.

s9s cluster \
  --create \
  --cluster-type=galera \
  --nodes='10.10.10.26;10.10.10.27;10.10.10.28' \
  --vendor=percona \
  --cluster-name=PXC_CENTOS7 \
  --provider-version=5.7 \
  --os-user=vagrant \
  --wait

The ClusterControl deployment supports multiple NICs and templated configurations.

In short, ClusterControl provides:

  • Topology-aware deployment jobs for MySQL, MariaDB, Percona, MongoDB and PostgreSQL
  • Self-service and on-demand
  • From standalone nodes to load-balanced clusters
  • Your choice of barebone servers, private/public cloud and containers

To see for yourself, download ClusterControl today and give us your feedback.

ClusterControl - All the Feature Highlights & Improvements from 2017

With four major releases in 2017, ClusterControl is better than ever at supporting your MySQL, MariaDB, MongoDB and PostgreSQL environments.

When thinking about the features and functions released in 2017 three main themes emerge…

Delivering High Availability

2017 meant the introduction of ProxySQL, a lightweight yet complex protocol-aware proxy that sits between the MySQL clients and server. It also meant improved support for HAProxy and Keepalived and making sure that MySQL and MariaDB can fully utilize them.

Making You More Efficient

From the introduction of the new ClusterControl CLI, to dozens of improvements to our UI, to the new integration system for alarms and chatops, ClusterControl now makes it even easier to manage your database environments.

Mixed Environment Support

ClusterControl has always been the system to manage multiple technologies from a single console and have them work together seamlessly. 2017 meant adding support for the latest versions of MariaDB, MongoDB, MySQL, PostgreSQL, Percona Server, and Galera Cluster.

ClusterControl 1.4.0 - January 2017

Announced in January 2017, ClusterControl version 1.4.0 brought several improvements for MySQL Replication and MongoDB. It was also the first version to introduce features for ProxySQL.

With the new version you are now able to deploy a multi-master replication setup in active - standby mode. One master will actively take writes, while the other one is ready to take over writes should the active master fail. From the UI, you can also easily add slaves under each master and reconfigure the topology by promoting new masters and failing over slaves.

Topology reconfigurations and master failovers are not always possible in case of replication problems, for instance errant transactions. In this version ClusterControl checks for issues before any failover or switchover happens. The admin can define whitelists and blacklists of which slaves to promote to master (and vice versa). This makes it easier for admins to customize failover automation in their replication setups.

For MongoDB we extended support, bringing sharded clusters in addition to replica sets. Coupled with this is the ability to retrieve more metrics for monitoring, adding new advisors and providing consistent backups for sharding. With this release, you could now convert a ReplicaSet cluster to a sharded cluster, add or remove shards from a sharded cluster as well as add Mongos/routers to a sharded cluster.

Lastly, we added our initial support for ProxySQL allowing for its deployment onto MySQL Replication setups.

ClusterControl 1.4.1 - April 2017

April was ProxySQL month at Severalnines. ClusterControl 1.4.1 focused almost exclusively on adding additional features and support for this exciting new load balancing technology.

In this version you could now easily configure and manage your ProxySQL deployments with a comprehensive UI. You could create servers, reorientate your setup, create users, set rules, manage query routing, and enable variable configurations. It was now possible to view query analytics for all queries going through the proxy, and e.g. cache any frequent queries in just a click.

ClusterControl 1.4.2 - June 2017

Coined “The DevOps Edition”, version 1.4.2 brought improved support and new features like automated failover for PostgreSQL and MongoDB, and included even more features for ProxySQL.

One of the main highlights of this release is the ClusterControl CLI, which caters to users who prefer to manage their databases through the command line. All actions performed via the CLI, such as deploying a cluster, are visible in the UI and vice versa.

Also included in this release is the new integration system for alarm notifications and chatops systems. This new integration with popular incident management and chat services lets you customise the alarms and get alerted in the ops tools you are already using - e.g., PagerDuty, VictorOps, Telegram, OpsGenie and Slack.

ClusterControl 1.5.0 - November 2017

ClusterControl 1.5 provided an array of exciting new backup functionalities to ensure that your data is secure and available whenever disaster strikes. The release also provides expanded PostgreSQL, MariaDB, MySQL NDB Cluster, and ProxySQL support.

This version introduced a new Backup Wizard with new support for AWS & Google Cloud backups, backup verification, Single Database backups and restores, and the ability to create and restore slaves from a backup rather than doing it from the master. Automatic restore testing was an awaited feature, as it is a time consuming task that is often neglected by database administrators.

PostgreSQL got a number of new features in this version including version 10 support, load balancing and virtual IP support with HAProxy and Keepalived, a new backup method, and support for synchronous replication failover.

The version also included support for MariaDB 10.2 and MySQL NDB Cluster 7.5.

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

If any of these features appeal to you make sure to upgrade or download the latest version of ClusterControl to take advantage of them.

We look forward to providing you even more features to help you deploy, monitor, manage and scale your open source databases further in 2018!


Open Source Databases in 2017 and Trends for 2018

With 2017 quickly coming to a close and 2018 looming on the horizon we wanted to take a minute and reflect on what’s been happening in our industry in the past year and what we are excited about for the future.

Johan Andersson, Severalnines CTO and Co-Founder took a few minutes from working on the newest releases of ClusterControl to talk with us about his thoughts on 2017.

2018 Database Trends and Predictions

As technology moves fast the open source world moves even faster. Here are some predictions from around the web for 2018…

FORBES

  • “In 2017, DevOps suffered from under-budgeting and a perception from management that things that were inexpensive as tools were mostly open source. However, non-standardized adoption and expensive DevOps resources skewed budget. With the realization that open source doesn’t equal free, especially as enterprise-grade support is required, there will be increased awareness of the budget needed for skilled DevOps resources. This barrier should get lowered -- organizations will need a budget for experimentation and failure.”

GITHUB

  • “Data will rule all. Over the last several years, Cloud 1.0 has been about computing in big clouds, while Cloud 2.0 is all about data. This includes data movement and the tools and services that support it, like analytics and machine learning systems. Today all companies are data companies, whether they know it or not. In 2018, so long as teams know how to use it, data will become their greatest asset.”
  • “A decade ago, Linux was a big deal. Now it’s standard. Back in the day, companies like Amazon, Google, and Microsoft were forced to build their own, proprietary tools because no other software existed to meet their needs. Many of these frameworks have since been open sourced—and other open source technologies, like Kubernetes, are becoming integral to developers' workflows. This shift is changing what companies are investing in, making open source software traditional software's biggest competitor.”

OPENSOURCE.COM

  • “Containers gain even more acceptance. Container technology is the approach of packaging pieces of code in a standardized way so they can be "plugged and run" quickly in any environment. Container technology allows enterprises to cut costs and implementation times. While the potential of containers to revolutionize IT infrastructure has been evident for a while, actual container use has remained complex. Container technology is still evolving, and the complexities associated with the technology decrease with every advancement. The latest developments make containers quite intuitive and as easy as using a smartphone, not to mention tuned for today's needs, where speed and agility can make or break a business.”

DBTA.COM

  • “Rapid Kubernetes adoption forms the foundation for multi-cloud deployments: We predict runaway success of Kubernetes, but it is running away with the prize of adoption so fast that this may quickly be more of an observation than a prediction in 2018. So far, however, almost everybody is thinking of Kubernetes as a way of organizing and orchestrating computation in a cloud. Over the next year, we expect Kubernetes to more and more be the way that leading-edge companies organize and orchestrate computation across multiple clouds, both public and private. On premises computation is moving to containers and orchestration style at light speed, but when you can interchangeably schedule work anywhere that it makes sense to do so, you will see the real revolution.”
  • “Fear of cloud lock-in will result in cloud sprawl: As CIOs try to diversify investment in their compute providers, inclusive of their own on-premise capabilities, the diversification will result in data, services and algorithms spreading across multiple clouds. Finding information or code within a single cloud is tough enough. The data silos built from multiple clouds will be deep and far apart, pushing the cost of management onto humans that need to understand the infrastructure.”

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Severalnines 2018 Predictions

Several members of our team took a moment to share their thoughts on 2018 and the future of the open source database world…

Vinay Joosery, CEO & Co-Founder

  • Databases in Containers: I think a lot of people have gone from discussing the pros and cons of Docker for databases, and whether it is a good idea at all when it comes to running databases on Docker, to trying it not only in test/dev, but in actual live production!

Alex Yu, VP of Products

  • Cloud and containerized services as the new norm - More companies will start new projects in the cloud with multiple providers and applications will be written with cloud-native architectures in mind.
  • Traditional monolithic applications will continue to give way to more loosely coupled services which are easier to build, deploy, update, scale or move to other cloud and container service providers. Cloud-native services are resilient by nature and will facilitate faster development cycles and feedback loops.
  • Container technologies such as Kubernetes and Docker are already the de-facto standard for typical stateless applications/services using "application containers"; databases, however, are intrinsically stateful, and we will hopefully see more use of "system containers" such as LXD coming into the mainstream and gaining adoption.
  • This "new world" has an impact on database management and monitoring applications where containerized services come and go frequently. The lifetime of a host is many times longer than that of a container, whose uptime can be measured in only minutes or hours. Host-centric management and monitoring will give way to a cloud-native, service-oriented model where the transient nature is the norm.

Jean-Jérôme Schmidt, VP of Marketing

  • The European Union (EU) is the world’s 2nd largest economy after China and before the US. It’s about to enact a new piece of legislation with far-reaching consequences for anyone doing business involving its residents … and yet it seems to be going almost unnoticed. The European General Data Protection Regulation (GDPR) is due to come into effect in May 2018 and will impact anyone and any business or organisation that deals with and stores EU residents’ personal data in some form or shape. And it will make the organisations that are handling that data responsible for any breaches or misuse. It will therefore be of the highest importance how data is collected, processed and secured. In other words, databases and their infrastructures will be in the spotlight more than ever before, and how these databases are automated and managed will be crucial for anyone doing business with or within the EU. The GDPR is probably not getting the attention it should because the perception that’s being maintained globally is that the US (and maybe China) are the only large-scale economies worth being concerned with, but the reality is that the EU is the one to be focussed on, particularly next year. So if you’re not sure whether you have your databases and their EU residents’ data 99.999% under control … contact us ;-)

Ashraf Sharif, Senior Support Engineer

  • We are expecting higher adoption of MySQL 8.0 once it becomes GA. It introduces many notable enhancements like a transactional data dictionary, atomic DDL, invisible indexes, common table expressions (CTE), window functions and MySQL roles, to name a few. More details are at the MySQL Documentation page. We are also forecasting growth in MyRocks storage engine adoption, which is already included in Percona Server 5.7 and MariaDB 10.2.
  • MySQL on Docker will be getting much more attention in the coming years, after a number of success stories like Uber and BlaBlaCar. We've seen many people trying to adopt this technology as a reliable backend data service with the help of in-house automation scripts and Docker orchestration tools. Besides, Docker has announced support for Kubernetes, allowing developers and operators to build apps with Docker and seamlessly test and deploy them using both Docker Swarm and Kubernetes.

Krzysztof Książek, Senior Support Engineer

  • The main new trend that I see today is a move towards column store, analytics databases. MariaDB has it as part of their offering and ClickHouse seems to get traction as a go-to analytics Database engine that works alongside MySQL. ProxySQL's support for ClickHouse also makes it easier for the application to connect to either MySQL or ClickHouse, whatever is needed at that moment. If your dataset is small, you can do analytics in MySQL but there are other tools which do it better - faster and use less disk to store the data.

Our Most Popular Database Blog Posts in 2017

As we wrap up our last blog of 2017 we wanted to reflect on what content we have been creating that’s been resonating and generating the most interest with our readers. We will continue to deliver the best technical content we can for MySQL, Galera Cluster, PostgreSQL, MariaDB, and MongoDB in 2018.

Here is some of our most popular content from 2017…

Top Database Blogs for 2017

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Top Blogs by Technology

While MySQL and MySQL Galera Cluster dominate our most popular content we blog about a lot of different technologies and methodologies on the Severalnines blog. Here are some of the most popular blogs in 2017 for non-MySQL topics.

If there are some blog topics you would like us to cover in 2018 please list them in the comments below.

Announcing ClusterControl 1.5.1 - Featuring Backup Encryption for MySQL, MongoDB & PostgreSQL

What better way to start a new year than with a new product release?

Today we are excited to announce the 1.5.1 release of ClusterControl - the all-inclusive database management system that lets you easily deploy, monitor, manage and scale highly available open source databases - and load balancers - in any environment: on-premise or in the cloud.

ClusterControl 1.5.1 features encryption of backups for MySQL, MongoDB and PostgreSQL, a new topology viewer, support for MongoDB 3.4, several user experience improvements and more!

Feature Highlights

Full Backup and Restore Encryption for these supported backup methods

  • mysqldump, xtrabackup (MySQL)
  • pg_dump, pg_basebackup (PostgreSQL)
  • mongodump (MongoDB)

New Topology View (BETA) shows your replication topology (including load balancers) for your entire cluster to help you visualize your setup.

  • MySQL Replication Topology
  • MySQL Galera Topology

Improved MongoDB Support

  • Support for MongoDB v3.4
  • Fix to add back restore from backup
  • Multiple NICs support. Management/public IPs for monitoring connections and data/private IPs for replication traffic

Misc

Improved user experience featuring a new left-side navigation that includes:

  • Global settings breakout to make it easier to find settings related to a specific feature
  • Quick node actions that allow you to quickly perform actions on your node

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

View Release Details and Resources

Improving Database Security: Backup & Restore Encryption

ClusterControl 1.5 introduces another step to ensuring your databases are kept secure and protected.

Backup & restore encryption means that backups are encrypted at rest using AES-256 CBC algorithm. An auto generated key will be stored in the cluster's configuration file under /etc/cmon.d. The backup files are transferred in encrypted format. Users can now secure their backups for offsite or cloud storage with the flip of a checkbox. This feature is available for select backup methods for MySQL, MongoDB & PostgreSQL.

New Topology View (beta)

This exciting new feature provides an “overhead” topology view of your entire cluster, including load balancers. While in beta, this feature currently supports MySQL Replication and Galera topologies. With this new feature, you can drag and drop to perform node actions. For example, you can drag a replication slave on top of a master node - which will prompt you to either rebuild the slave or change the replication master.

Improved User Experience

The new Left Side Navigation and the new quick actions and settings that accompany it mark the first major redesign of the ClusterControl interface in some time. ClusterControl offers a vast array of functionality, so much so that it can sometimes be overwhelming to the novice. The new navigation gives the user quick access to what they need on a regular basis, and the new node quick actions let users quickly run common commands and requests right from the navigation.

Download the new ClusterControl or request a demo.

MongoDB Security - Resources to Keep NoSQL DBs Secure

We’ve almost become desensitized to the news. It seems that every other day there is a data breach at a major enterprise resulting in confidential customer information being stolen and sold to the highest bidder.

Data breaches rose by 40% in 2016, and once all the numbers are calculated, 2017 is expected to blow that number out of the water. Yahoo announced the largest breach in history in 2017, and other companies like Xbox, Verizon, Equifax, and more also announced major breaches.

Because of the 2017 MongoDB Ransomware Hack, security for MongoDB is hot on everyone's minds.

We decided to pull together some of our top resources that you can use to ensure your MongoDB instances remain secure.

Here are our most popular and relevant resources on the topic of MongoDB Security…

ClusterControl & MongoDB Security

Data is the lifeblood of your business. Whether it’s protecting confidential client data or securing your own IP, your business could be doomed should critical data get into the wrong hands. ClusterControl provides many advanced deployment, monitoring and management features to ensure your databases and their data are secure. Learn how!

How to Secure MongoDB with ClusterControl - The Webinar

In March of 2017, at the height of the MongoDB ransomware crisis, we hosted a webinar to talk about how you can keep MongoDB secure using ClusterControl. With authentication disabled by default in MongoDB, learning how to secure MongoDB becomes essential. In this webinar we explain how you can improve your MongoDB security and demonstrate how this is automatically done by ClusterControl.

Using the ClusterControl Developer Studio to Stay Secure

In our blog “MongoDB Tutorial: Monitoring and Securing MongoDB with ClusterControl Advisors” we demonstrated nine of the advisors from our repository for MongoDB that can assist with MongoDB security.

Audit Logging for MongoDB

In our blog “Preemptive Security with Audit Logging for MongoDB” we show that having access to an audit log would have given those affected by the ransom hack the ability to perform pre-emptive measures. The audit log is one of the most underrated features of MongoDB Enterprise and Percona Server for MongoDB. We will uncover its secrets in this blog post.

The 2017 MongoDB Ransom Hack

In January of 2017 thousands of MongoDB servers were held for ransom simply because they were deployed without basic authentication in place. In our first blog on the ransom hack, “Secure MongoDB and Protect Yourself from the Ransom Hack”, we explain what happened and some simple steps to keep your data safe. In the second blog, “How to Secure MongoDB from Ransomware - Ten Tips”, we went further, showing even more things you can do to make sure your MongoDB instances are secure.

The Importance of Automation for MongoDB Security

Severalnines CEO Vinay Joosery shares with us the blog “How MongoDB Database Automation Improves Security” and discusses how the growing number of cyberattacks on open source database deployments highlights the industry’s poor administrative and operational practices. This blog explores how database automation is the key to keeping your MongoDB database secure.

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

ClusterControl for MongoDB

Users of MongoDB often have to work with a variety of tools to achieve their requirements; ClusterControl provides an all-inclusive system where you don’t have to cobble together different tools.

ClusterControl offers users a single interface to securely manage their MongoDB infrastructures and mixed open source database environments, while preventing vendor lock-in; whether on premise or in the cloud. ClusterControl offers an alternative to other companies who employ aggressive pricing increases, helping you to avoid vendor lock-in and control your costs.

ClusterControl provides the following features to deploy and manage your MongoDB stacks...

  • Easy Deployment: You can now automatically and securely deploy sharded MongoDB clusters or Replica Sets with ClusterControl’s free community version; as well as automatically convert a Replica Set into a sharded cluster if that’s required.
  • Single Interface: ClusterControl provides one single interface to automate your mixed MongoDB, MySQL, and PostgreSQL database environments.
  • Advanced Security: ClusterControl removes human error and provides access to a suite of security features automatically protecting your databases from hacks and other threats.
  • Monitoring: ClusterControl provides a unified view of all sharded environments across your data centers and lets you drill down into individual nodes.
  • Scaling: Easily add and remove nodes, resize instances, and clone your production clusters with ClusterControl.
  • Management: ClusterControl provides management features that automatically repair and recover broken nodes, and test and automate upgrades.
  • Advisors: ClusterControl’s library of Advisors allows you to extend the features of ClusterControl to add even more MongoDB management functionality.
  • Developer Studio: The ClusterControl Developer Studio lets you customize your own MongoDB deployment to enable you to solve your unique problems.

To learn more about the exciting features we offer for MongoDB click here or watch this video.

How to Secure Your Open Source Databases with ClusterControl

Security is one of the most important aspects of running a database. Whether you are a developer or a DBA, if you are managing the database, it is your responsibility to safeguard your data and protect it from any kind of unauthorized access. The unfortunate fact is that many organizations do not protect their data, as we’ve seen from the new wave of MongoDB ransomware attacks in September 2017. We had earlier published a blog on how to secure MongoDB databases.

In this blog post, we’ll have a look into how to secure your databases using ClusterControl. All of the features described here are available in version 1.5.1 of ClusterControl (released on December 23, 2017). Please note that some features are only available for certain database types.

Backup Encryption

ClusterControl 1.5.1 introduced a new feature called backup encryption. All encrypted backups are marked with a lock icon next to them:

You can use this feature on all backup methods (mysqldump, xtrabackup, mongodump, pg_dump) supported by ClusterControl. To enable encryption, simply toggle on the "Enable Encryption" switch when scheduling or creating the backup. ClusterControl automatically generates a key to encrypt the backup. It uses AES-256 (CBC) encryption algorithm and performs the encryption on-the-fly on the target server. The following command shows an example of how ClusterControl performs a mysqldump backup:

$ mysqldump --defaults-file=/etc/my.cnf --flush-privileges --hex-blob --opt --no-create-info --no-data --triggers --routines --events --single-transaction --skip-comments --skip-lock-tables --skip-add-locks --databases db1 | gzip -6 -c | openssl enc -aes-256-cbc -pass file:/var/tmp/cmon-094508-e0bc6ad658e88d93.tmp | socat - TCP4:192.168.55.170:9999

You would see the following error if you tried to decompress an encrypted backup without decrypting it first with the proper key:

$ gunzip mysqldump_2018-01-03_175727_data.sql.gz
gzip: mysqldump_2018-01-03_175727_data.sql.gz: not in gzip format

The key is stored inside the ClusterControl database, and can be retrieved from the cmon_backup.metadata file for a particular backup set. It will be used by ClusterControl when performing restoration. Encrypting backups is highly recommended, especially when you want to secure your backups offsite like archiving them in the cloud.
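
Should you ever need to restore such a backup by hand, decrypting and decompressing it looks roughly like the following. This is a hedged sketch: it assumes you have extracted the key into a local file, and the openssl options must match those used when the backup was created.

$ openssl enc -d -aes-256-cbc -pass file:/path/to/backup.key \
    -in mysqldump_2018-01-03_175727_data.sql.gz | gunzip > mysqldump_2018-01-03_175727_data.sql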

MySQL/PostgreSQL Client-Server Encryption

Apart from following the recommended security steps during deployment, you can increase the security of your database service by using client-server SSL encryption. Using ClusterControl, you can perform this operation with a simple point and click:

You can then retrieve the generated keys and certificates directly from the ClusterControl host under the /var/lib/cmon/ca path to establish secure connections with the database clients. All the keys and certificates can be managed directly under Key Management, as described further down.
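As an illustration, once those files have been copied to a client host, a MySQL connection over SSL might look like the following (the paths and host address are assumptions for this sketch):

$ mysql -h 192.168.55.171 -u appuser -p --ssl-ca=/etc/mysql/certs/ca.pem --ssl-cert=/etc/mysql/certs/client-cert.pem --ssl-key=/etc/mysql/certs/client-key.pem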

Database Replication Encryption

Encryption of replication traffic within a Galera Cluster can be enabled with just one click. ClusterControl uses a 2048-bit default key and certificate generated on the ClusterControl node, which is transferred to all the Galera nodes:

A cluster restart is necessary. ClusterControl will perform a rolling restart operation, taking one node at a time. You will see a green lock icon next to the database server (Galera indicates Galera Replication encryption, while SSL indicates client-server encryption) in the Hosts grid of the Overview page once encryption is enabled:

All the keys and certificates can be managed directly under Key Management, as described further down.


Key Management

All the generated keys and certificates can be managed directly from the ClusterControl UI. Key Management allows you to manage SSL certificates and keys that can be provisioned on your clusters:

If the certificate has expired, you can simply use the UI to generate a new certificate with the proper key and Certificate Authority (CA), or import an existing key and certificate into the ClusterControl host.

Security Advisors

Advisors are mini-programs that run in ClusterControl. They perform specific tasks and provide advice on how to address issues in areas such as performance, security, log management, configuration, storage space and others. Each advisor can be scheduled like a cron job, and run as a standalone executable within the ClusterControl UI. It can also be run via the ClusterControl 's9s' command line client.

ClusterControl enables two security advisors for MySQL-based systems:

  • Access from any host ('%') - Identifies all users that use a wildcard host from the mysql system table, and lets you have more control over which hosts are able to connect to the servers.
  • Check number of accounts without a password - Identifies all users who do not have a password in the mysql system table.

For MongoDB, we have the following advisors:

  • MongoDB authentication enabled - Checks whether the MongoDB instance is running with authentication enabled.
  • Authorization check - Checks whether MongoDB users have been granted overly permissive roles for access control.

For more details on how ClusterControl performs the security checks, you can look at the advisors' JavaScript-like source code under Manage -> Developer Studio. You can see the execution results on the Advisors page:
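Independently of the advisors, you can verify the same settings manually from the mongo shell; a rough sketch of what these checks look at (the advisors' actual logic lives in Developer Studio):

db.adminCommand({ getCmdLineOpts: 1 }).parsed.security
// { "authorization" : "enabled" } when authentication/authorization is on
db.getSiblingDB("admin").getUsers()
// review the roles granted to each user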

Multiple Network Interfaces

Having multiple NICs on the database hosts allows you to separate database traffic from management traffic. One network is used by the database nodes to communicate with each other, and this network is not exposed to any public network. The other network is used by ClusterControl for management purposes. ClusterControl is able to deploy such a multi-network setup. Consider the following architecture diagram:

To import the above database cluster into ClusterControl, one would specify the primary IP address of the database hosts. Then, it is possible to choose the management network as well as the data network:

ClusterControl can also work in an environment without Internet access, with the databases being totally isolated from the public network. The majority of the features will work just fine. If the ClusterControl host is configured with Internet access, it can also mirror the database vendor's repository for the Internet-less database servers. Just go to Settings (top menu) -> Repositories -> Create New Repository and set the options to fit the target database server environment:

The mirroring may take about 10 to 20 minutes depending on the Internet connection, and the new item will then appear in the list. You can then pick this repository when scaling or deploying a new cluster, without the database hosts needing any Internet connection (note that the operating system’s offline repository should be in place as well).

MySQL Users Management

The MySQL privilege system ensures that all users can perform only the operations they are allowed to. Granting is critical as you don't want to give all users complete access to your database, but you need users to have the necessary permissions to run queries and perform daily tasks.

ClusterControl provides an interactive user interface to manage the database schemas and privileges. It unifies the accounts on all MySQL servers in the cluster and simplifies the granting process. You can easily visualize the database users, so you avoid making mistakes.

As you can see in the above screenshot, ClusterControl greys out unnecessary privileges if you only want to grant a user access to a specific database (shopdb). "Require SSL?" is only enabled if client-server SSL encryption is enabled, while the administration privilege checkboxes are disabled entirely if a specific database is defined. You can also inspect the generated GRANT statement at the bottom of the wizard, to see the statement that ClusterControl will execute to create this user. This helper looks pretty simple, but creating users and granting privileges can be error-prone.

ClusterControl also provides a list of inactive users for all database nodes in the cluster, showing the accounts that have not been used since the last server restart:

This alerts the administrator to unnecessary accounts that exist and could potentially harm the server. The next step is to verify whether the accounts are truly no longer active, and you can then simply use the "Drop Selected User" option to remove them. Make sure you have enough database activity to ensure the list generated by ClusterControl is accurate. The longer the server uptime, the better.

Always Keep Up-to-date

For production use, it’s highly recommended for you to install the database-related packages from the vendor’s repository. Don’t rely on the default operating system repository, where the packages are usually outdated. If you are running in a cluster environment like Galera Cluster, or even MySQL Replication, you always have the choice to patch the system with minimal downtime.

ClusterControl supports automatic minor version rolling upgrade for MySQL/MariaDB with a single click. Just go to Manage -> Upgrades -> Upgrade and choose the appropriate major version for your running cluster. ClusterControl will then perform the upgrade, on one node at a time. The node will be stopped, then software will be updated, and then the node will be started again. If a node fails to upgrade, the upgrade process is aborted and the admin is notified. Upgrades should only be performed when there is as little traffic as possible on the cluster.

Major version upgrades (e.g., from MySQL 5.6 to MySQL 5.7) are intentionally not automated. Major upgrades usually require uninstallation of the existing packages, which is risky to automate. Careful planning and testing are necessary for this kind of upgrade.

Database security is an important aspect of running your database in production. From all the incidents we frequently read about in the news (and there are probably many others that are not publicized), it is clear that there are groups busy out there with bad intentions. So, make sure your databases are well protected.

New Whitepaper: How to Automate and Manage MongoDB with ClusterControl


At the time of writing this blog, MongoDB is the world’s leading NoSQL database server, and (per DB-Engines ranking, the most widely-known ranking in the database industry) the 5th database server overall in terms of popularity.

As you may have seen before, we’ve published a ‘Become a MongoDB DBA’ blog series, which covers all the need-to-know information for getting started with MongoDB (for example, if you come from a MySQL DBA background), and we have now taken the next logical step in our work on MongoDB by producing this new white paper: MongoDB Management and Automation with ClusterControl.

This white paper builds on the Become a MongoDB DBA series by focusing further on how to manage and automate MongoDB with the help of ClusterControl, our all-inclusive management system for open source databases.

While MongoDB does have great features for developers, some key questions arise:

What of the operational management of a production environment?

How easy is it to deploy a distributed environment, and then manage it?

In this whitepaper, we cover some of the fundamentals of MongoDB, and show you how a clustered environment can be automated with ClusterControl.

Download the white paper

To summarise, in this white paper, we have reviewed the challenges involved in managing MongoDB at scale and have introduced mitigating features of ClusterControl. As a best of breed database management solution, ClusterControl brings consistency and reliability to your database environment, and simplifies your database operations at scale.

The main topics covered include...

Considerations for administering MongoDB

  • Built-in Redundancy
  • Scalability
  • Arbiters
  • Delayed Replica Set Members
  • Backups
  • Monitoring

Automation with ClusterControl

  • Deployment
  • Backup & Restore
  • Monitoring
  • MongoDB Advisors
  • Integrations
  • Command-Line Access

Download the white paper


ClusterControl is the all-inclusive open source database management system for users with mixed environments that removes the need for multiple management tools. It provides advanced deployment, management, monitoring, and scaling functionality to get your MySQL, MongoDB, and PostgreSQL databases up and running using proven methodologies that you can depend on to work. At the core of ClusterControl is its automation functionality that lets you automate many of the database tasks you have to perform regularly, like deploying new databases, adding and scaling new nodes, running backups and upgrades, and more.

Download ClusterControl

Updated: Become a ClusterControl DBA - Deploying your Databases and Clusters


We receive some nice feedback regarding our product ClusterControl, especially about how easy it is to install and get going. Installing new software is one thing, but using it properly is another.

It is not uncommon to be impatient to test new software; one would rather toy around with an exciting new application than read the documentation before getting started. That is a bit unfortunate, as you may miss important features or misunderstand how to use them.

This blog series covers all the basic operations of ClusterControl for MySQL, MongoDB & PostgreSQL with examples on how to make the most of your setup. It provides you with a deep dive on different topics to save you time.

These are the topics covered in this series:

  • Deploying the first clusters
  • Adding your existing infrastructure
  • Performance and health monitoring
  • Making your components HA
  • Workflow management
  • Safeguarding your data
  • Protecting your data
  • In depth use case

In today’s post, we’ll cover installing ClusterControl and deploying your first clusters.

Preparations

In this series, we will make use of a set of Vagrant boxes but you can use your own infrastructure if you like. In case you do want to test it with Vagrant, we made an example setup available from the following Github repository: https://github.com/severalnines/vagrant

Clone the repo to your own machine:

$ git clone git@github.com:severalnines/vagrant.git

The topology of the vagrant nodes is as follows:

  • vm1: clustercontrol
  • vm2: database node1
  • vm3: database node2
  • vm4: database node3

You can easily add additional nodes if you like by changing the following line:

4.times do |n|

The Vagrant file is configured to automatically install ClusterControl on the first node and forward the user interface of ClusterControl to port 8080 on your host that runs Vagrant. So if your host’s ip address is 192.168.1.10, you will find the ClusterControl UI here: http://192.168.1.10:8080/clustercontrol/

Installing ClusterControl

You can skip this if you chose to use the Vagrant file, and get the automatic installation. But installation of ClusterControl is straightforward and will take less than five minutes.

With the package installation, all you have to do is to issue the following three commands on the ClusterControl node to get it installed:

$ wget http://www.severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc
$ ./install-cc   # as root or sudo user

That’s it: it can’t get easier than this. If the installation script has not encountered any issues, then ClusterControl should be installed and up and running. You can now log into ClusterControl on the following URL: http://192.168.1.210/clustercontrol

After creating an administrator account and logging in, you will be prompted to add your first cluster.

Deploy a Galera cluster

You will be prompted to create a new database server/cluster or import an existing (i.e., already deployed) server or cluster:

We are going to deploy a Galera cluster. There are two sections that need to be filled in. The first tab is related to SSH and general settings:

To allow ClusterControl to install the Galera nodes, we use the root user that was granted SSH access by the Vagrant bootstrap scripts. If you chose to use your own infrastructure, you must enter a user here that is allowed to do passwordless SSH to the nodes that ClusterControl will control. Keep in mind that you have to set up passwordless SSH from ClusterControl to all database nodes yourself beforehand.
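A minimal sketch of setting this up from the ClusterControl host, assuming root SSH access and example IP addresses:

$ ssh-keygen -t rsa        # accept the defaults
$ ssh-copy-id root@10.0.0.11
$ ssh-copy-id root@10.0.0.12
$ ssh-copy-id root@10.0.0.13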

Also make sure you disable AppArmor/SELinux. See here why.

Then, proceed to the second stage and specify the database related information and the target hosts:

ClusterControl will immediately perform some sanity checks each time you press Enter when adding a node. You can see the host summary by hovering over each defined node. Once everything is green, meaning ClusterControl has connectivity to all nodes, you can click Deploy. A job will be spawned to build the new cluster. The nice thing is that you can keep track of the progress of this job by clicking on Activity -> Jobs -> Create Cluster -> Full Job Details:

Once the job has finished, you have just created your first cluster. The cluster overview should look like this:

In the nodes tab, you can do about any operation you normally would do on a cluster. The query monitor gives you a good overview of both running and top queries. The performance tab will help you keep a close eye on the performance of your cluster and also features the advisors that help you act proactively on trends in data. The backup tab enables you to easily schedule backups and store them on local or cloud storage. The manage tab enables you to expand your cluster or make it highly available for your applications through a load balancer.

All this functionality will be covered in later blog posts in this series.

Deploy a MySQL Replication Cluster

Deploying a MySQL Replication setup is similar to Galera database deployment, except that it has an additional tab in the deployment dialog where you can define the replication topology:

You can set up standard master-slave replication, as well as master-master replication. In the case of the latter, only one master will remain writable at a time. Keep in mind that master-master replication doesn't come with conflict resolution and guaranteed data consistency, as Galera does. Use this setup with caution, or look into Galera Cluster instead. Once everything is green and you have clicked Deploy, a job will be spawned to build the new cluster.

Again, the deployment progress is available under Activity -> Jobs.

To scale out the slave (read copy), simply use the “Add Node” option in the cluster list:

After adding the slave node, ClusterControl will provision the slave with a copy of the data from its master using Xtrabackup or from any existing PITR compatible backups for that cluster.

Deploy PostgreSQL Replication

ClusterControl supports the deployment of PostgreSQL version 9.x and higher. The steps are similar to the MySQL Replication deployment, where at the end of the deployment step, you can define the database topology when adding the nodes:

Similar to MySQL Replication, once the deployment completes, you can scale out by adding replication slaves to the cluster. The step is as simple as selecting the master and filling in the FQDN of the new slave:

ClusterControl will then perform the necessary data staging from the chosen master using pg_basebackup, configure the replication user and enable the streaming replication. The PostgreSQL cluster overview gives you some insight into your setup:

Just like with the Galera and MySQL cluster overviews, you can find all the necessary tabs and functions here: the query monitor, performance, backup tabs all enable you to do the necessary operations.

Deploy a MongoDB Replica Set

Deploying a new MongoDB Replica Set is similar to the other clusters. From the Deploy Database Cluster dialog, pick MongoDB ReplicaSet, define the preferred database options and add the database nodes:

You can choose to install either Percona Server for MongoDB from Percona or MongoDB Server from MongoDB, Inc. (formerly 10gen). You also need to specify the MongoDB admin user and password, since ClusterControl by default deploys a MongoDB cluster with authentication enabled.
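Once the replica set is up, you can confirm that authentication is enforced by connecting with the admin credentials you specified; for example (the host name is a placeholder):

$ mongo --host mongo1.example.com --port 27017 -u admin -p --authenticationDatabase admin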

After installing the cluster, you can add an additional slave or arbiter node into the replica set using the "Add Node" menu under the same dropdown from the cluster overview:

After adding the slave or arbiter to the replica set, a job will be spawned. Once this job has finished it will take a short while before MongoDB adds it to the cluster and it becomes visible in the cluster overview:

Final thoughts

With these three examples, we have shown you how easy it is to set up different clusters from scratch in only a couple of minutes. The beauty of this Vagrant setup is that, as easily as you spawned this environment, you can also take it down and spawn it again. Impress your colleagues by showing how quickly you can set up a working environment.

Of course it would be equally interesting to add existing hosts and already-deployed clusters into ClusterControl, and that’s what we'll cover next time.


New Webinar on How to Design Open Source Databases for High Availability


Join us March 27th for this webinar on how to design open source databases for high availability with Ashraf Sharif, Senior Support Engineer at Severalnines. From discussing high availability concepts through to failover or switch over mechanisms, Ashraf will cover all the need-to-know information when it comes to building highly available database infrastructures.

It’s been said that not designing for failure leads to failure; but what is the best way to design a database system from the ground up to withstand failure?

Designing open source databases for high availability can be a challenge as failures happen in many different ways, which sometimes go beyond imagination. This is one of the consequences of the complexity of today’s open source database environments.

At Severalnines we’re big fans of high availability databases and have seen our fair share of failure scenarios across the thousands of database deployment attempts that we come across every year.

In this webinar, we’ll look at the different types of failures you might encounter and what mechanisms can be used to address them. We will also look at some of the popular high availability solutions used today, and how they can help you achieve different levels of availability.

Sign up for the webinar

Date, Time & Registration

Europe/MEA/APAC

Tuesday, March 27th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, March 27th at 09:00 PDT (US) / 12:00 EDT (US)

Register Now

Agenda

  • Why design for High Availability?
  • High availability concepts
    • CAP theorem
    • PACELC theorem
  • Trade offs
    • Deployment and operational cost
    • System complexity
    • Performance issues
    • Lock management
  • Architecting databases for failures
    • Capacity planning
    • Redundancy
    • Load balancing
    • Failover and switchover
    • Quorum and split brain
    • Fencing
    • Multi datacenter and multi-cloud setups
    • Recovery policy
  • High availability solutions
    • Database architecture determines Availability
    • Active-Standby failover solution with shared storage or DRBD
    • Master-slave replication
    • Master-master cluster
  • Failover and switchover mechanisms
    • Reverse proxy
    • Caching
    • Virtual IP address
    • Application connector

Sign up for the webinar

Speaker

Ashraf Sharif is a System Support Engineer at Severalnines. He was previously involved in the hosting world and the LAMP stack, where he worked as a principal consultant and head of a support team, delivering clustering solutions for large websites in the South East Asia region. His professional interests are system scalability and high availability.

Updated: Become a ClusterControl DBA: Managing your Database Configurations


In the past five posts of the blog series, we covered deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale and in the last post, how to prepare yourself for disasters by scheduling backups.

Since ClusterControl 1.2.11, we made major enhancements to the database configuration manager. The new version allows changing of parameters on multiple database hosts at the same time and, if possible, changing their values at runtime.

We featured the new MySQL Configuration Management in a Tips & Tricks blog post, but this blog post will go more in depth and cover Configuration Management within ClusterControl for MySQL, PostgreSQL and MongoDB.

ClusterControl Configuration management

The configuration management interface can be found under Manage > Configurations. From here, you can view or change the configurations of your database nodes and other tools that ClusterControl manages. ClusterControl will import the latest configuration from all nodes and overwrite previous copies made. Currently there is no historical data kept.

If you’d rather edit the config files manually, directly on the nodes, you can re-import the altered configuration by pressing the Import button.

And last but not least: you can create or edit configuration templates. These templates are used whenever you deploy new nodes in your cluster. Of course, any changes made to the templates will not be retroactively applied to the already deployed nodes that were created using these templates.

MySQL Configuration Management

As previously mentioned, the MySQL configuration management got a complete overhaul in ClusterControl 1.2.11. The interface is now more intuitive. When changing a parameter, ClusterControl checks whether the parameter actually exists. This ensures your configuration will not prevent MySQL from starting up because of parameters that don’t exist.

From Manage -> Configurations, you will find an overview of all config files used within the selected cluster, including load balancer nodes.

We use a tree structure to easily view hosts and their respective configuration files. At the bottom of the tree, you will find the configuration templates available for this cluster.

Changing parameters

Suppose we need to change a simple parameter like the maximum number of allowed connections (max_connections), we can simply change this parameter at runtime.

First select the hosts to apply this change to.

Then select the section you want to change. In most cases, you will want to change the MYSQLD section. If you would like to change the default character set for MySQL, you will have to change that in both MYSQLD and client sections.

If necessary you can also create a new section by simply typing the new section name. This will create a new section in the my.cnf.

Once we change a parameter and set its new value by pressing “Proceed”, ClusterControl will check if the parameter exists for this version of MySQL. This is to prevent any non-existent parameters from blocking the initialization of MySQL on the next restart.

When we press “proceed” for the max_connections change, we will receive a confirmation that it has been applied to the configuration and set at runtime using SET GLOBAL. A restart is not required as max_connections is a parameter we can change at runtime.

Now suppose we want to change the buffer pool size; this would require a restart of MySQL before it takes effect:

And as expected the value has been changed in the configuration file, but a restart is required. You can do this by logging into the host manually and restarting the MySQL process. Another way to do this from ClusterControl is by using the Nodes dashboard.

Restarting nodes in a Galera cluster

You can perform a restart per node by selecting “Restart Node” and pressing the “Proceed” button.

When you select “Initial Start” on a Galera node, ClusterControl will empty the MySQL data directory and force a full copy this way. This is, obviously, unnecessary for a configuration change. Make sure you leave the “initial” checkbox unchecked in the confirmation dialog. This will stop and start MySQL on the host but depending on your workload and bufferpool size this could take a while as MySQL will start flushing the dirty pages from the InnoDB bufferpool to disk. These are the pages that have been modified in memory but not on disk.

Restarting nodes in MySQL master-slave topologies

For MySQL master-slave topologies you can’t just restart node by node. Unless downtime of the master is acceptable, you will have to apply the configuration changes to the slaves first and then promote a slave to become the new master.

You can go through the slaves one by one and execute a “Restart Node” on them.

After applying the changes to all slaves, promote a slave to become the new master:

After the slave has become the new master, you can shutdown and restart the old master node to apply the change.

Importing configurations

Now that we have applied the change directly on the database, as well as the configuration file, it will take until the next configuration import to see the change reflected in the configuration stored in ClusterControl. If you are less patient, you can schedule an immediate configuration import by pressing the “Import” button.

PostgreSQL Configuration Management

For PostgreSQL, the Configuration Management works a bit different from the MySQL Configuration Management. In general, you have the same functionality here: change the configuration, import configurations for all nodes and define/alter templates.

The difference here is that you can immediately change the whole configuration file and write this configuration back to the database node.

If the changes made requires a restart, a “Restart” button will appear that allows you to restart the node to apply the changes.

MongoDB Configuration Management

The MongoDB Configuration Management works similar to the MySQL Configuration Management: you can change the configuration, import configurations for all nodes, change parameters and alter templates.

Changing the configuration is pretty straightforward, using the Change Parameter dialog (as described in the "Changing Parameters" section):

Once changed, you can see the post-modification action proposed by ClusterControl in the "Config Change Log" dialog:

You can then proceed to restart the respective MongoDB nodes, one node at a time, to load the changes.

Final thoughts

In this blog post we learned about how to manage, alter and template your configurations in ClusterControl. Changing the templates can save you a lot of time when you have deployed only one node in your topology. As the template will be used for new nodes, this will save you from altering all configurations afterwards. However for MySQL and MongoDB based nodes, changing the configuration on all nodes has become trivial due to the new Configuration Management interface.

As a reminder, we recently covered in the same series deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale and in the last post, how to prepare yourself for disasters by scheduling backups.

New Webinar: How to Measure Database Availability


Join us on April 24th for Part 2 of our database high availability webinar special!

In this session we will focus on how to measure database availability. It is notoriously hard to measure and report on, although it is an important KPI in any SLA between you and your customer. With that in mind, we will discuss the different factors that affect database availability and see how you can measure your database availability in a realistic way.

It is common enough to define availability in terms of 9s (e.g. 99.9% or 99.999%) - especially here at Severalnines - although there are often different opinions as to what these numbers actually mean, or how they are measured.

Is the database available if an instance is up and running, but it is unable to serve any requests? Or if response times are excessively long, so that users consider the service unusable? Is the impact of one longer outage the same as multiple shorter outages? How do partial outages affect database availability, where some users are unable to use the service while others are completely unaffected?

Not agreeing on precise definitions with your customers might lead to dissatisfaction. The database team might be reporting that they have met their availability goals, while the customer is dissatisfied with the service.

Join us for this webinar during which we will discuss the different factors that affect database availability and see how to measure database availability in a realistic way.

Register for the webinar

Date, Time & Registration

Europe/MEA/APAC

Tuesday, April 24th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, April 24th at 09:00 PDT (US) / 12:00 EDT (US)

Register Now

Agenda

  • Defining availability targets
    • Critical business functions
    • Customer needs
    • Duration and frequency of downtime
    • Planned vs unplanned downtime
    • SLA
  • Measuring the database availability
    • Failover/Switchover time
    • Recovery time
    • Upgrade time
    • Queries latency
    • Restoration time from backup
    • Service outage time
  • Instrumentation and tools to measure database availability:
    • Free & open-source tools
    • CC's Operational Report
    • Paid tools

Register for the webinar

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA, with over 15 years experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

Key Things to Monitor in MongoDB


Enhancing system performance, especially for computer systems, requires getting a good overview of how they perform. This process is generally called monitoring. Monitoring is an essential part of database management, and detailed performance information about your MongoDB deployment will not only help you gauge its functional state, but will also give you clues about anomalies, which is helpful when doing maintenance. It is essential to identify unusual behaviours and fix them before they escalate into more serious failures.

Some of the types of failures that could arise are...

  • Lag or slowdown
  • Resource inadequacy
  • System hiccup

Monitoring is often centered on analyzing metrics. Some of the key metrics you will want to monitor include...

  • Performance of the database
  • Utilization of resources (CPU usage, available memory and Network usage)
  • Emerging setbacks
  • Saturation and limitation of the resources
  • Throughput operations

In this blog we are going to discuss these metrics in detail, and look at the tools available from MongoDB (such as utilities and commands). We will also look at other software tools such as Pandora, FMS Open Source, and Robo 3T. For the sake of simplicity, we are going to use Robo 3T in this article to demonstrate the metrics.

Performance of the Database

The first and foremost thing to check on a database is its general performance, for example, whether the server is active or not. If you run this command db.serverStatus() on a database in Robo 3T, you will be presented with this information showing the state of your server.

Replica sets

A replica set is a group of mongod processes that maintain the same data set. If you are using replica sets, especially in production, the operation log (oplog) provides the foundation for the replication process. Write operations are applied and processed on the primary node and recorded in its oplog, a limited-size (capped) collection, from which the secondary nodes copy and apply them. If the primary node fails before the operations have been copied to the secondaries, that data might not be replicated.

Key metrics to keep an eye on...

Replication Lag

This defines how far the secondary node is behind the primary node. An optimal state requires the gap to be as small as possible. On a normally operating system, this lag is estimated to be 0. If the gap is too wide, data integrity will be compromised once the secondary node is promoted to primary. In this case you can set a threshold, for example 1 minute, and raise an alert if it is exceeded (see the example after the list below). Common causes of wide replication lag include...

  1. Shards that have insufficient write capacity, which is often associated with resource saturation.
  2. The secondary node is providing data at a slower rate than the primary node.
  3. Nodes may also be hindered in some way from communicating, possibly due to a poor network.
  4. Operations on the primary node could also be slower, thereby blocking replication. If this happens you can run the following commands:
    1. db.getProfilingLevel(): if you get a value of 0, then your db operations are optimal.
      If the value is 1, then it corresponds to slow operations which can be consequently due to slow queries.
    2. db.getProfilingStatus(): here we check the value of slowms; by default it is 100ms. If operations regularly take longer than this, you might have heavy write operations on the primary or inadequate resources on the secondary. In order to solve this, you can scale the secondary so it has as many resources as the primary.
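To check the current replication lag from the mongo shell, you can use the built-in helper or compare the member optimes reported by rs.status(); for example (the helper is named rs.printSecondaryReplicationInfo() on newer MongoDB versions):

rs.printSlaveReplicationInfo()
// or compute it per member from rs.status()
rs.status().members.forEach(function(m) {
    print(m.name + " (" + m.stateStr + "): " + m.optimeDate);
});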

Cursors

If you make a read request for example find, you will be provided with a cursor which is a pointer to the data set of the result. If you run this command db.serverStatus() and navigate to the metrics object then cursor, you will see this…

In this case, the cursor.timedOut counter has been incremented to 9 because there were 9 connections that died without closing their cursors. The consequence is that those cursors remain open on the server and hence consume memory, unless they are reaped by the default MongoDB timeout. You should therefore alert on inactive cursors and reap them in order to save memory. You should also avoid no-timeout cursors, because they hold on to resources and thereby slow down internal system performance; aim to keep the cursor.open.noTimeout counter at 0.
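The same counters can be read directly from the mongo shell; for example (the values in the comment are illustrative):

db.serverStatus().metrics.cursor
// { "timedOut" : NumberLong(9),
//   "open" : { "noTimeout" : NumberLong(0), "pinned" : NumberLong(0), "total" : NumberLong(1) } }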

Journaling

With the WiredTiger storage engine, changes are first written to on-disk journal files before they are made durable in the data files. This is referred to as journaling. Journaling ensures the availability and durability of data in the event of a failure, from which a recovery can be carried out.

For the purpose of recovery, we often use checkpoints (especially for the WiredTiger storage system) to recover from the last checkpoint. However, if MongoDB shuts down unexpectedly, then we use the journaling technique to recover any data that was processed or provided after the last checkpoint.

Journaling should not be turned off, since it only takes about 60 seconds to create a new checkpoint. Hence, if a failure occurs, MongoDB can replay the journal to recover the data written within those seconds.

Journaling generally narrows the time interval from when data is applied to memory until it is durable on disk. The storage.journal object has a property that describes the commit frequency, commitIntervalMs, which defaults to 100ms for WiredTiger. Tuning it to a lower value makes writes get recorded in the journal more frequently, reducing the window of potential data loss.
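For example, the commit interval can be tuned in the mongod configuration file; a minimal excerpt (the value shown is purely illustrative, not a recommendation):

# /etc/mongod.conf excerpt
storage:
  journal:
    enabled: true
    commitIntervalMs: 50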

Locking Performance

Locking issues can be caused by multiple read and write requests from many clients. When this happens, there is a need to keep consistency and avoid write conflicts. In order to achieve this, MongoDB uses multi-granularity locking, which allows locking operations to occur at different levels, such as the global, database, or collection level.

If you have poor schema design patterns, then you will be vulnerable to locks being held for long durations. This is often experienced when making two or more different write operations to a single document in the same collection, with the consequence of blocking each other. For the WiredTiger storage engine, a ticket system is used to limit the number of concurrent read and write operations; requests that cannot obtain a ticket wait in a queue.

By default, the number of concurrent read and write operations is defined by the parameters wiredTigerConcurrentWriteTransactions and wiredTigerConcurrentReadTransactions, which are both set to a value of 128.

If you scale these values too high, you will end up being limited by CPU resources. To increase throughput operations, it would be advisable to scale horizontally by providing more shards.
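You can inspect the current ticket usage and, if needed, adjust these parameters at runtime from the mongo shell (treat the numbers below as an illustration rather than a tuning recommendation):

db.serverStatus().wiredTiger.concurrentTransactions
// shows available/out/totalTickets for read and write
db.adminCommand({ setParameter: 1, wiredTigerConcurrentWriteTransactions: 256 })
db.adminCommand({ setParameter: 1, wiredTigerConcurrentReadTransactions: 256 })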


Utilization of Resources

This generally describes the usage of available resources, such as CPU capacity/processing rate and RAM. Performance, especially for the CPU, can change drastically under unusual traffic loads. Things to check include...

  1. Number of connections
  2. Storage
  3. Cache

Number of Connections

If the number of connections is higher than what the database system can handle, then there will be a lot of queuing. Consequently, this will overwhelm the performance of the database and make your setup run slowly. This can also result in driver issues or even complications with your application.

If you monitor the number of connections over some period and notice that the value has peaked, it is always good practice to set an alert that fires if the connections exceed this number.

If the number is getting too high, you can scale up in order to cater for this rise. To do this, you have to know the number of connections available within a given period; otherwise, if the available connections are not enough, requests will not be handled in a timely fashion.

By default, MongoDB provides support for up to 1 million connections. With your monitoring, always ensure that the current connections never get too close to this value. You can check the value in the connections object.
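For example, from the mongo shell (the numbers shown are sample output):

db.serverStatus().connections
// { "current" : 97, "available" : 999903, "totalCreated" : 2654 }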

Storage

Every row or data record in MongoDB is referred to as a document, and document data is stored in BSON format. On a given database, if you run the command db.stats(), you will be presented with this data.

  • storageSize defines the size of all data extents in the database.
  • indexSize outlines the size of all indexes created within that database.
  • dataSize is a measure of the total space taken by the documents in the database.

You can sometimes see a change in memory, especially if a lot of data has been deleted. In this case you should set up an alert in order to ensure it was not due to malicious activity.

Sometimes, the overall storage size may shoot up while the database traffic graph is constant and in this case, you should check your application or database structure to avoid having duplicates if not needed.

Like the general memory of a computer, MongoDB also has caches in which active data is temporarily stored. However, an operation may request data which is not in this active memory, forcing a read from the main disk storage. This event is referred to as a page fault. Page faults take longer to execute, and can be detrimental when they occur frequently. To avoid this scenario, ensure the size of your RAM is always enough to cater for the data sets you are working with. You should also ensure you have no schema redundancy or unnecessary indexes.

Cache

A cache is temporary storage for frequently accessed data. In WiredTiger, the file system cache and the storage engine cache are both employed. Always ensure that your working set does not grow beyond the available cache; otherwise, page faults will increase in number, causing performance issues.

At some point you may modify your data through frequent operations, but the changes are sometimes not yet reflected on disk. This unflushed data in the cache is referred to as “dirty data”; it exists because it has not yet been written to disk. Bottlenecks will result if the amount of dirty data grows beyond some average value because of slow writing to disk. Adding more shards will help to reduce this number.
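Both the configured cache size and the amount of dirty data can be read from serverStatus(); for example (the field names are as reported by WiredTiger):

var cache = db.serverStatus().wiredTiger.cache;
print("configured: " + cache["maximum bytes configured"]);
print("in cache:   " + cache["bytes currently in the cache"]);
print("dirty:      " + cache["tracked dirty bytes in the cache"]);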

CPU Utilization

Improper indexing, poor schema structure, and poorly designed queries will demand more CPU attention and will obviously increase its utilization.

Throughput operations

To a large extent, getting enough information on these operations enables you to avoid consequential setbacks such as errors, saturation of resources, and functional complications.

You should always take note of the number of read and write operations on the database, that is, a high-level view of the cluster’s activities. Knowing the number of operations generated by the requests will enable you to calculate the load that the database is expected to handle. The load can then be handled by either scaling up your database or scaling out, depending on the type of resources you have. This allows you to easily gauge the ratio between the rate at which requests accumulate and the rate at which they are processed. Furthermore, you can optimize your queries appropriately in order to improve performance.

In order to check the number of read and write operations, run the command db.serverStatus() and navigate to the locks.global object; the value of the property r represents the number of read requests and w the number of write requests.

More often than not, read operations outnumber write operations. Active client metrics are reported under globalLock.
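A related, often simpler set of counters is opcounters, which reports the cumulative number of each operation type since the server started; for example:

db.serverStatus().opcounters
// { "insert" : ..., "query" : ..., "update" : ..., "delete" : ..., "getmore" : ..., "command" : ... }
db.serverStatus().globalLock.activeClients
// { "total" : ..., "readers" : ..., "writers" : ... }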


Saturation and Limitation of Resources

Sometimes the database may fail to keep pace with the rate of writing and reading, as shown by an increasing number of queued requests. In this case, you have to scale your database by providing more shards to enable MongoDB to address the requests fast enough.

Emerging Setbacks

MongoDB log files always give a general overview of the assert exceptions returned. These give you a clue about the possible causes of errors. If you run the command db.serverStatus(), some of the assert counters you will note include the following (see the example after this list):

  1. Regular asserts: these are the result of an operation failure, for example when a string value is provided for an integer field, resulting in a failure to read the BSON document.
  2. Warning asserts: these are alerts about some issue that does not have much impact on operation; for example, when you upgrade your MongoDB you might be alerted about deprecated functions.
  3. Msg asserts: these are the result of internal server exceptions, such as a slow network or an inactive server.
  4. User asserts: like regular asserts, these errors arise when executing a command, but they are returned to the client; for example, duplicate keys, inadequate disk space, or no access to write to the database. You will want to check your application to fix these errors.
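The raw counters behind these alerts are exposed under the asserts section of serverStatus(); for example (sample output, your numbers will differ):

db.serverStatus().asserts
// { "regular" : 0, "warning" : 0, "msg" : 0, "user" : 12, "rollovers" : 0 }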

An Overview of Database Indexing for MongoDB


What is Indexing?

Indexing is an important concept in the database world. The main advantage of creating an index on a field is faster access to data: it optimizes the process of searching and accessing the database. Consider this example to understand why.

When a user asks for a specific row from the database, what will the DB system do? It will start from the first row and check whether it is the row the user wants. If yes, it returns that row; otherwise it continues searching for the row until the end.

Generally, when you define an index on a particular field, the DB system will create an ordered list of that field’s values and store it in a different table. Each entry of this table will point to the corresponding value in the original table. So when the user tries to search for a row, the system will first search for the value in the index table using a binary search and return the corresponding value from the original table. This process takes less time because we are using a binary search instead of a linear search.

In this article, we will focus on MongoDB indexing and understand how to create and use indexes in MongoDB.

How to Create an Index in MongoDB Collection?

To create an index using the Mongo shell, you can use this syntax:

db.collection.createIndex( <key and index type specification>, <options> )

Example:

To create a descending index on the name field in the myColl collection:

db.myColl.createIndex( { name: -1 } )

Types of MongoDB Indexes

  1. Default _id Index

    This is the default index, which will be created by MongoDB when you create a new collection. If you don’t specify any value for this field, MongoDB generates one, and _id is the primary key by default for your collection, so a user can’t insert two documents with the same _id field value. You can’t remove this index from the _id field.

  2. Single Field Index

    You can use this index type when you want to create a new index on any field other than _id field.

    Example:

    db.myColl.createIndex( { name: 1 } )

    This will create a single key ascending index on name field in myColl collection

  3. Compound Index

    You can also create an index on multiple fields using compound indexes. For this index type, the order in which the fields are defined in the index matters. Consider this example:

    db.myColl.createIndex({ name: 1, score: -1 })

    This index will first sort the collection by name in ascending order and then for each name value, it will sort by score values in descending order.

  4. Multikey Index

    This index can be used to index array data. If any field in a collection has an array as its value, then you can use this index, which will create separate index entries for each element in the array. If the indexed field is an array, MongoDB will automatically create a multikey index on it.

    Consider this example:

    {
      "userid": 1,
      "name": "mongo",
      "addr": [
        { "zip": 12345, ... },
        { "zip": 34567, ... }
      ]
    }

    You can create a Multikey index on addr field by issuing this command in Mongo shell.

    db.myColl.createIndex({ "addr.zip": 1 })
  5. Geospatial Index

    Suppose you have stored some coordinates in a MongoDB collection. To create an index on this type of field (which holds geospatial data), you can use a geospatial index. MongoDB supports two types of geospatial indexes.

    • 2d Index: You can use this index for data which is stored as points on 2D plane.

      db.collection.createIndex( { <location field> : "2d" } )
    • 2dsphere Index: Use this index when your data is stored in GeoJSON format or as coordinate pairs (longitude, latitude):

    db.collection.createIndex( { <location field> : "2dsphere" } )
  6. Text Index

    To support queries which include searching for text in the collection, you can use a text index.

    Example:

    db.myColl.createIndex( { address: "text" } )
  7. Hashed Index

    MongoDB supports hash-based sharding. A hashed index computes the hash of the values of the indexed field. Hashed indexes support sharding using hashed shard keys; hashed sharding uses this index as the shard key to partition the data across your cluster.

    Example:

    db.myColl.createIndex( { _id: "hashed" } )

Properties of Index

  1. Unique Index

    This property ensures that there are no duplicate values in the indexed field. If duplicate values already exist in the collection when the index is created, the index build will fail (older MongoDB versions offered a dropDups option that discarded the duplicate entries).

  2. Sparse Index

    With this property, the index only contains entries for documents that have the indexed field. Documents that do not contain the indexed field are skipped by the index, and may therefore be omitted from the result set when the sparse index is used.

  3. TTL Index

    This index is used to automatically delete documents from a collection after a specific time interval (TTL). This is ideal for removing documents such as event logs or user sessions (see the examples after this list).
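These properties are passed as options to createIndex(); a few illustrative examples (the collection and field names are made up for this sketch):

// unique: reject documents with a duplicate email value
db.myColl.createIndex({ email: 1 }, { unique: true })

// sparse: only index documents that actually contain the score field
db.myColl.createIndex({ score: 1 }, { sparse: true })

// TTL: remove documents 3600 seconds after their createdAt timestamp
db.eventlog.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })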

Performance Analysis

Consider a collection of student scores. It has exactly 3000000 documents in it. We haven’t created any indexes on this collection. See the image below to understand the schema.

Sample documents in the score collection

Now, consider this query without any indexes:

db.scores.find({ student: 585534 }).explain("executionStats")

This query takes 1155ms to execute. Here is the output; look for the executionTimeMillis field in the result.

Execution time without indexing

Now let’s create an index on the student field. To create the index, run this query:

db.scores.createIndex({ student: 1 })

Now the same query takes 0ms.

Execution time with indexing

You can clearly see the difference in execution time. It’s almost instantaneous. That’s the power of indexing.

Conclusion

One obvious takeaway is: create indexes. Based on your queries, you can define different types of indexes on your collections. If you don’t create indexes, then each query will scan the full collection, which takes a lot of time, makes your application very slow, and uses a lot of your server’s resources. On the other hand, don’t create too many indexes either, because unnecessary indexes add extra overhead to every insert, delete and update. When you perform any of these operations on an indexed field, the same operation has to be performed on the index tree as well, which takes time. Indexes are also stored in RAM, so creating irrelevant indexes can eat up your RAM space and slow down your server.
