Here’s a moment most IT teams would recognise. Something breaks, or worse, disappears, and everyone turns to you expecting a quick answer. You run through the architecture in your head and realise that while you have redundancy, backups and security controls in place, you are not entirely sure how quickly you could recover if things went seriously wrong.
Disaster recovery (DR) isn’t a new issue. It’s been around for decades and is generally well understood, but it often catches organisations out because it slips down the priority list. There’s always something more urgent, more visible or more attractive to invest in… until there isn’t.
This blog looks at why those assumptions break down in cloud and hybrid environments, and what disaster recovery needs to account for if it is going to work in practice. This is where technology partners like HPE, working alongside Data#3, focus on helping organisations design disaster recovery strategies that extend beyond assumptions of built‑in cloud resilience.
In recent years, most security discussions have centred around ransomware and cyber-attacks, which is understandable. The threat is real, its impacts are evident, and the board is aware of it. However, this focus has quietly pushed disaster recovery to the side, often seen as a separate issue from security or even something that can be postponed.
Many recent outages weren’t caused by attackers but by people: errors, misconfigurations, accidental deletions, or systems acting unpredictably. One example we saw recently was an organisation that had a strong, well-tested disaster recovery setup when everything was on-prem. When they moved to the cloud, the assumption was that resilience came with it, but over time, their DR discipline faded. Then a cloud account was deleted, and they were offline for days because the strategy had not evolved with the environment.
Another customer had redundancy built into their cloud provider’s platform, so everything appeared resilient from the outside. However, when that provider experienced a major outage, they couldn’t access either their production or backup environments, leaving them locked out of the very systems they relied on for protection.
Then there’s the CrowdStrike incident, which was caused by a routine update that had unintended consequences across customer environments. Systems that were otherwise well-protected and fully operational were suddenly impacted at scale. It is a good reminder that even trusted platforms and established processes can introduce risk if an upstream supplier makes a mistake.
These are not isolated cases. They stem from a simple truth: despite our experience, things will still go wrong, even when nothing malicious is involved, and not always in ways your current design considers.
A common pattern we see is organisations assuming that redundancy equals disaster recovery. If workloads are replicated within a cloud platform or spread across availability zones, that should be enough. It’s not.
Redundancy helps you survive component failure within a system, but disaster recovery is about what happens when the system itself becomes unavailable, inaccessible, or compromised. That distinction matters more now than it used to, as your environment is no longer a single data centre. It’s a mix of cloud services, on-prem infrastructure, networks, identity systems and third-party dependencies. The network, in its broader sense, is what ties all of this together. If your recovery strategy doesn’t account for how those pieces connect, you can end up in a situation where everything is technically still running, but you cannot reach it, fail over to it, or operate it. It shifts from a resilience problem to a design problem.
One of the more notable shifts is that disaster recovery has become more achievable than many teams realise. In the past, DR meant establishing a secondary site that sat idle outside of testing, which was costly and hard to justify. That’s where much of the resistance still originates.
What has changed, however, is the network. Increased bandwidth, more adaptable connectivity and hybrid architectures mean you can now create environments where both sites are active. Instead of a primary and a passive backup, you have two production environments that support each other. This is often described as a “production one, production two” model, where both environments are live and carry workloads. If one fails, the other can absorb the load.
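To make the active-active idea concrete, here is a minimal sketch of client-side site selection between two live environments. The hostnames and ports are hypothetical, and in practice this logic usually lives in a load balancer or DNS health check rather than application code; the point is simply that either site can serve traffic, and failover is a probe plus a fallback.

```python
import socket

# Hypothetical endpoints for an active-active ("production one" /
# "production two") pair. Both sites carry live workloads; a client
# or load balancer fails over when one stops responding.
SITES = [
    ("prod-one.example.internal", 443),
    ("prod-two.example.internal", 443),
]

def reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_site(sites, probe=reachable):
    """Return the first site that answers the probe, or None if all are down."""
    for host, port in sites:
        if probe(host, port):
            return (host, port)
    return None
```

The probe is injectable, which means the failover behaviour itself can be exercised in a test without either site actually being down, echoing the broader theme that a recovery path you cannot test is a recovery path you cannot trust.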
It’s not free, and it still requires design effort, but it changes the conversation. You’re no longer paying for something you hope you never use; instead, you’re investing in capacity and flexibility that delivers value every day. However, you must understand how data, applications and users move across that network. Without that visibility and control, even the best infrastructure won’t deliver the outcomes you expect.
With everything IT teams are responsible for, DR can feel overwhelming. You start considering every application, dependency, and integration that needs to be managed, and it quickly becomes a large, complex project that competes with other priorities.
A better approach is to accept that you don’t need to solve everything at once, but instead start small. Choose a critical workload, understand how it functions and develop a recovery plan specifically for that one system, so you can test it, learn from it, and build from there.
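One way to keep that single-workload plan honest is to express it as an ordered, executable checklist rather than a document. The sketch below is illustrative only; the workload name and step functions are hypothetical placeholders for real restore and verification actions.

```python
# A minimal sketch of a per-workload recovery plan as an ordered,
# testable checklist. Each step is a (description, check) pair, where
# `check` is a zero-argument callable returning True on success.
def run_plan(name, steps):
    """Execute recovery steps in order, stopping at the first failure.

    Returns (succeeded, results), where `results` maps each step that
    ran to its outcome. Steps after a failure are not attempted, which
    mirrors how a real runbook halts when a dependency is missing.
    """
    results = {}
    for description, check in steps:
        ok = bool(check())
        results[description] = ok
        if not ok:
            return False, results
    return True, results
```

A dry run of this plan against a test environment tells you quickly which steps are well understood and which rely on undocumented knowledge, which is exactly the learning the start-small approach is meant to produce.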
It is tempting to view disaster recovery as mainly a storage or compute issue, but in reality, it’s also a network issue. Your capacity to replicate data, reroute traffic, verify user identities and access systems all relies on your network’s design. This includes not only the physical infrastructure but also how your environments are linked across cloud, on-premises and external services.
This is where many DR strategies fail. The infrastructure might be in place, but the connectivity between environments has not been fully considered. For example, setting up a recovery environment in the cloud is relatively simple. Ensuring it is integrated into your existing network, with the correct routing, security controls and access paths, is where complexity begins to emerge.
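A simple way to surface those connectivity gaps early is a preflight check that probes each dependency the recovery environment needs before you ever have to rely on it. The dependency names, hosts and ports below are hypothetical examples, not prescriptions.

```python
import socket

# Hypothetical dependencies a cloud recovery environment might need
# reachable before failover is viable: identity, replication, and
# management access. Real lists come from mapping the workload.
DEPENDENCIES = {
    "identity":    ("sso.example.internal", 443),
    "replication": ("replica-db.example.internal", 5432),
    "management":  ("bastion.example.internal", 22),
}

def preflight(deps, probe):
    """Probe each named dependency; return a {name: True/False} report.

    `probe(host, port)` should return True when the path is usable.
    It is injected so the report logic can be tested without live hosts.
    """
    return {name: bool(probe(host, port)) for name, (host, port) in deps.items()}

def tcp_probe(host, port, timeout=2.0):
    """Default probe: succeed if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run routinely, a report like this catches the classic failure mode described above: the recovery infrastructure exists, but a route, firewall rule or access path quietly stops it being reachable.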
In some cases, a colocation approach can simplify this, particularly if it provides more direct and predictable connectivity into cloud platforms. In others, a well-designed hybrid network can achieve the same outcome. What matters is whether the network supports the recovery outcome you expect, not which option you choose.
There are many reasons why disaster recovery is often ignored, but the examples discussed show that the risks still exist. The environments we operate in today are more complex, more distributed and more reliant on external platforms than just a few years ago. This makes the network more crucial, not less.
Disaster recovery involves accepting that failure is a natural part of how systems operate and designing with that in mind. If this has been on your mind for a while, the best next step isn’t to launch a large program or overhaul your entire architecture. As we said earlier, it’s best to start smaller with a single application and test it.
That process will tell you more than any high-level plan and give you a clearer picture of where the gaps are and what needs attention next. This is also where platforms like HPE, combined with tools such as Zerto, come into the conversation, offering capabilities like a DR plan that can be tested during business hours with far less pain. Not as a silver bullet, but as part of a broader approach to building recovery across on-prem and cloud environments in a way that aligns with how your network is structured.
If you want a second opinion or a practical way to work through this without getting stuck in theory, Data#3 can help. We work with organisations to assess how disaster recovery will operate across on‑prem and cloud environments, with a strong focus on network design, access and recoverability. The aim is not to create a perfect plan on paper, but to make sure your recovery approach can be executed when it matters.
Speak to our team of HPE Specialists today