For many small and medium-sized businesses, a technical disruption can bring the business to a complete halt. Entire organisations now depend on a complex array of applications, data and networks working together efficiently to perform daily business tasks. Some of these organisations cannot afford even the slightest interruption to their IT infrastructure, yet they are often ill-prepared to recover when a disaster occurs.
In 2017, Veeam commissioned the Enterprise Strategy Group (ESG) to deliver its sixth annual Veeam Availability Report. One finding was that four out of five organisations identified themselves as having an Availability Gap: 82% of respondents acknowledged that their recovery capabilities fall short of what their business units expect.
So the question must be asked: how can organisations reduce this Availability Gap?
We can start by looking carefully at the tools currently employed. Tools can include monitoring, scripts, batch files, checklists, documentation and data availability software. Checking that an organisation is using the right tool for the job makes failovers much easier and, more importantly, safer to practise. Without the right tools, practising failover is often tricky and generally expensive, which means it is rarely done, and rarely done properly.
Tools should also be checked to see if they can withstand disasters themselves, and documentation should be printed and stored in multiple locations so all relevant personnel know how to access and use the data availability software.
Regardless of which tools exist in your organisation, it comes down, time and time again, to practice makes perfect. There is no best practice or golden rule for disaster recovery other than to practise as often as possible. The more you practise recovering your applications and data, the more likely you are to succeed.
One idea for practising failover is to test a different application failover each week. Chances are you will need to iron out a lot of wrinkles and it will take a lot of work, but you need to keep at it. You are likely to identify issues such as not being able to meet RTO and RPO targets, or a stubborn application that won’t update its IP addresses.
However, the benefit is that application owners will start learning more about their applications and how they behave in a failover. The point is not to make people unhappy but to get realistic results. If your organisation is having trouble with a practice failover, how is the real deal going to fare?
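The weekly rotation and the RTO and RPO checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `FailoverResult`, `Targets`, `missed_targets` and `app_for_week` are hypothetical, and the measured values would have to come from your own practice failovers rather than from any particular product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Targets:
    rto: timedelta  # recovery time objective: how long recovery may take
    rpo: timedelta  # recovery point objective: how much data loss is tolerable


@dataclass
class FailoverResult:
    app: str
    recovery_time: timedelta     # measured time until the app was usable again
    data_loss_window: timedelta  # age of the restore point that was recovered


def missed_targets(result: FailoverResult, targets: Targets) -> list[str]:
    """Return the names of the targets a practice failover failed to meet."""
    missed = []
    if result.recovery_time > targets.rto:
        missed.append("RTO")
    if result.data_loss_window > targets.rpo:
        missed.append("RPO")
    return missed


def app_for_week(apps: list[str], week_number: int) -> str:
    """Pick one application per week, cycling through the list in order."""
    return apps[week_number % len(apps)]
```

Recording the result of every practice run in this form makes it easy to see which applications repeatedly miss their targets and need attention first.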
Practising frequently also helps minimise the risk of configuration drift. Imagine an environment where all the applications are configured as self-contained, siloed virtual machines that the data availability solution can easily identify and protect. As applications scale out and evolve, databases are centralised and load balancers are deployed. Unless the recovery plan is updated to match, these new dependencies can go unprotected, and a regular practice failover is often what reveals the gap.
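A simple way to picture configuration drift is as the difference between what each application now depends on and what is actually protected. The sketch below assumes you can export both lists yourself; the function name `find_drift` and the component names are hypothetical and not tied to any specific data availability product.

```python
def find_drift(protected: set[str], dependencies: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per application, the components it depends on that are not protected."""
    gaps = {}
    for app, components in dependencies.items():
        missing = sorted(set(components) - protected)
        if missing:
            gaps[app] = missing
    return gaps


# Hypothetical example: the applications started as single VMs, then gained
# a shared load balancer and a centralised database that nobody added to
# the protection jobs.
protected = {"web-vm", "crm-vm"}
dependencies = {
    "web": ["web-vm", "load-balancer"],
    "crm": ["crm-vm", "central-db"],
}
drift = find_drift(protected, dependencies)
```

An empty result means every known dependency is covered; it says nothing about dependencies that were never documented, which is why practice failovers remain necessary.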
We can further reduce an organisation’s Availability Gap by simplifying the failover process. This can be accomplished by having applications already running on the recovery site. Secondary Domain Controllers and DNS are excellent examples.
Certain applications can be clustered across different data centres, reducing or even eliminating the need for disaster recovery in some cases.
Simplifying the failover process makes disaster recovery events more likely to succeed.
It is also important that organisations and their personnel understand who can approve a failover during a disaster recovery event and, more importantly, who can actually trigger it. Organisations need to be flexible enough to plan for and protect against not only large site disasters but also small ones, as most disaster recovery events are only partial.
A partial disaster recovery event is one where only some applications and data have been lost or interrupted, not all of them. This can be very challenging if the organisation’s disaster recovery planning only caters for failing over an entire site.
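Planning for a partial event means being able to work out which applications actually need to move. The sketch below is one possible approach, and it rests on an assumption the article does not state: that an affected application's dependencies must fail over with it unless they are already running on the recovery site. The function name `failover_scope` and the application names are hypothetical.

```python
def failover_scope(affected, depends_on, running_at_recovery_site=frozenset()):
    """Return the set of applications to fail over for a partial event.

    Follows dependencies transitively and skips anything that is already
    running on the recovery site (for example a secondary DNS server).
    """
    scope = set()
    pending = list(affected)
    while pending:
        app = pending.pop()
        if app in scope or app in running_at_recovery_site:
            continue  # already planned, or nothing to move
        scope.add(app)
        pending.extend(depends_on.get(app, []))
    return scope


# Hypothetical example: only the CRM system is affected.
depends_on = {"crm": ["crm-db", "dns"], "crm-db": [], "mail": ["dns"]}
scope = failover_scope(["crm"], depends_on, running_at_recovery_site={"dns"})
```

Here the mail system stays where it is, and DNS is skipped because it is already available at the recovery site.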
With climate change and ransomware both on the increase, reducing an organisation’s Availability Gap is more important than ever.
According to the University of Texas, 94% of companies suffering from a catastrophic data loss do not survive: 43% never reopen and 51% close within two years.
Don’t become a statistic. Start reducing your Availability Gap now.