Welcome to the second instalment in Data#3’s five-part series focusing on the detail outlined in our Azure blog, covering the five top tips for long-term success for your Microsoft Azure-hosted infrastructure. Data#3 has delivered over 100 Azure Health Checks for customers, and consistently encounters the same problems.
This instalment is all about the Azure Advisor service and how you can optimise your Azure consumption. We’ll cover both the technical and organisational benefits of leveraging Advisor and cost management.
Despite what you may think, Microsoft has capabilities within Azure to help you save money running your Azure workloads.
Enter the Azure Advisor service, a free built-in service that proactively analyses your workload and provides recommendations for high availability, security, performance, cost reporting and optimisation. We will dive deeper into the sub-sections of Advisor capability to showcase what this service can do for you. But first, let’s dig into the shared responsibility model.
The shared responsibility model is extremely important to keep in mind. See below for a summary graphic of the areas of responsibility for both Microsoft and your organisation. Essentially, if you are in an on-premises world, the responsibility is yours to ensure your infrastructure is protected, secured, available and accessible. As we shift to the right and into Azure, the responsibility starts to move to Microsoft, and you reap the rewards of not having to worry about the mundane. I think we can all appreciate this quote from a Microsoft employee, summarising this:
“If you shift your workload into Azure you will never receive a Pager/SMS/Email notification late at night for a Disk/San/Tape/Hardware failure” – Ben Armstrong, Microsoft
With this in mind, Microsoft has created the Advisor service to help you with potential misconfiguration within your area of responsibility.
Why is this important? If you deploy a single VM into Azure it’s up to you to ensure that the configuration of the VM is optimal. Patching, monitoring, virus protection, security hardening and baselining, availability, public connectivity, traffic filtering, installed applications and access all fall within your responsibility. I have often encountered situations where a customer has deployed something into Azure and when asked how they were protecting it, answered: “Azure!”
Our first section of capability is High Availability, which helps you understand where your infrastructure may be vulnerable to service outages due to your current configuration. Public cloud infrastructure is complex and challenging from a provider perspective. When you deploy something into Azure, you need to ensure that cloud infrastructure is available. Your thing is going to run on hardware, abstracted by a hypervisor. Hardware can often fail! While Azure has been built to tolerate failing hardware and is extremely resilient to outages from a physical hosting perspective, depending on your configuration your workload may be susceptible to a service outage while your thing is evacuated onto healthy hardware.
For most use cases, this may be acceptable, especially for dev and test environments. However, what about that production system that is mission critical? If we were dealing with a single VM, then we would architect a highly available, fault-tolerant (HA/FT) solution with multiple VMs residing in an Availability Set or Availability Zone to ensure that at least one VM is available should the underlying physical host fail.
For Web services, we might leverage the Front Door or Traffic Manager service to ensure site availability between 2 or more App services.
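The VM case above boils down to one question: does every workload have instances on more than one piece of underlying hardware? The following is a minimal Python sketch of that idea, not Advisor's actual rule logic. The inventory shape and the `workload` and `fault_domain` field names are assumptions made for illustration.

```python
from collections import defaultdict


def find_single_points_of_failure(vms):
    """Return workloads whose VMs all sit in a single fault domain.

    A workload with only one VM is flagged too, because one host
    failure would take the whole workload offline.
    """
    domains_by_workload = defaultdict(set)
    for vm in vms:
        domains_by_workload[vm["workload"]].add(vm["fault_domain"])
    return sorted(
        workload
        for workload, domains in domains_by_workload.items()
        if len(domains) < 2
    )


# Hypothetical inventory: two web VMs spread across hosts, one lone SQL VM.
inventory = [
    {"name": "web-01", "workload": "web", "fault_domain": 0},
    {"name": "web-02", "workload": "web", "fault_domain": 1},
    {"name": "sql-01", "workload": "sql", "fault_domain": 0},
]

print(find_single_points_of_failure(inventory))  # ['sql']
```

The web tier survives the loss of either host, while the single SQL VM is exactly the kind of exposure an availability check is meant to surface.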
Advisor helps with these situations by analysing your deployed workload and determining where you may be exposed from an availability perspective. The below criteria make up the current rule set for these checks:
Each of these rules is evaluated against what you have deployed. The Advisor service will notify you when your workload may be at risk and present it in this report.
Drilling into one of these alerts provides more detail, with the ability to postpone or dismiss a particular item, and more importantly, offers a shortcut to remediate the issue.
In this example, selecting the “Enable virtual machine backup” option takes me to a quick configuration page to enable backup for this workload. How handy is that?!
It’s also worth noting that these recommendations also appear within the context of the thing itself. If I navigate to the VM in question, I can view all recommendations scoped to just that VM.
Now, we move to the next Advisor capability: Performance.
Performance comes down to more than just CPU, Memory, IOPS or Network. We need to account for all metrics and configuration items that could degrade the end-user experience. The Performance checks are made up of the following rules:
Notice something missing from the above?
Not a single Performance rule is related to your Virtual Machine workload! More on that in the Cost section.
Now for the next section and arguably the most fun.
Now we get into the good stuff, and a critical component of the shared responsibility model. Here’s an example of what I mean.
Say you have deployed a VM and need to remote into it. You assign a public IP and open the default RDP port, 3389. You’re in a hurry and don’t know your outbound IP address offhand, so you allow Any IP to connect via RDP. You’re not worried about malicious third-party access, as your local admin password on the machine is super secure. Plus, who is going to guess the public IP address of your VM? Only you know that, right?
The countdown begins the second that the VM boots up to the logon screen. This countdown is the time to breach!
So, you have a task list to work through:
This is a basic task list and some components take time, so you may switch to another daily task while you wait for these to complete. Let’s assume it takes about 5 hours to complete this list and you are happy that your VM is ready for production release.
You could be very wrong with that assumption. So, let’s back up a little and go back to that Public IP that you thought was super-secret and secure.
I can absolutely guarantee that your IP address will fall into a range in this list.
I make that statement with total confidence, as this publicly downloadable list of Microsoft-owned IP ranges contains every possible IP that could be allocated to your VM. This list is available by design so that your Next Gen firewalls can dynamically update outbound firewall rules for destination Azure and other Microsoft online services.
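To show how little effort it takes to match an address against a published list of ranges, here is a minimal Python sketch using the standard `ipaddress` module. The CIDR blocks below are placeholders from the reserved documentation ranges, not real Azure ranges, and `find_containing_range` is a name invented for this example.

```python
import ipaddress

# Placeholder CIDR blocks taken from the reserved documentation ranges.
# These are NOT real Azure ranges; a real check would load the published list.
PUBLISHED_RANGES = ["203.0.113.0/24", "198.51.100.0/25"]


def find_containing_range(ip, ranges):
    """Return the first CIDR block that contains ip, or None if none does."""
    address = ipaddress.ip_address(ip)
    for cidr in ranges:
        if address in ipaddress.ip_network(cidr):
            return cidr
    return None


print(find_containing_range("203.0.113.57", PUBLISHED_RANGES))  # 203.0.113.0/24
print(find_containing_range("192.0.2.10", PUBLISHED_RANGES))    # None
```

The same membership test works in reverse: anyone holding the list can enumerate every address inside each block.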
If I were a villain, what do you think I would do with this list? Step one, I would build or rent a botnet to scrape through the list and iterate through every possible IP, looking for extremely high-value default port connections.
I would absolutely start with 3389.
So, your VM has just been built and my botnet just happened to be scanning in the IP range that your VM resides in. I get a hit; I now have a public IP that responds to a protocol query. That IP then goes into my second-stage botnet, which fingerprints the destination. This uncovers a vulnerability that has not been patched yet, automation then queues up execution of an exploit, and I am notified that a host is available to log into.
And with that, I now own your VM!
This could have happened at any stage after the VM booted up and, yes, would have been detected or mitigated at some stage, but what if you were called away straight after provisioning the VM? How much time could I have had to remote in, perform lateral reconnaissance and start attacking other resources? What about the thousands of genuinely bad actors out there who are conducting this exact scenario thousands of times per day?
So, going back to the shared responsibility model: whose fault was this? Microsoft’s or yours?
Hopefully you will understand the answer by now. So, where does Advisor fit into this?
The built-in rules for Advisor are as follows:
App Service recommendations
Compute and app recommendations
Virtual machine scale set recommendations
Data and storage recommendations
Identity and access recommendations
Technically, these rules are native to Azure Security Center (stay tuned for a future Masterclass on that), but Advisor integrates with the alerts from Security Center and presents them here with the other reports.
In my example, multiple rules would have fired to let me know that my VM was at high risk, and I could have paused and started with secure access controls at the outset.
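As an illustration of the kind of check that would catch the hurried RDP rule from the example, here is a minimal Python sketch. It is not the Security Center implementation. The rule dictionary shape and its field names (`direction`, `access`, `source`, `destination_port`) are assumptions loosely modelled on a firewall rule, and the set of "open" source values is illustrative.

```python
# Source values treated as "anyone on the internet" (illustrative list).
OPEN_SOURCES = {"*", "any", "internet", "0.0.0.0/0"}
RDP_PORT = 3389


def port_in_spec(port, port_spec):
    """True if port matches a spec such as '3389', '3000-4000' or '*'."""
    if port_spec == "*":
        return True
    if "-" in port_spec:
        low, high = port_spec.split("-", 1)
        return int(low) <= port <= int(high)
    return int(port_spec) == port


def exposes_rdp_to_internet(rule):
    """True if an inbound allow rule opens RDP to any source address."""
    return (
        rule["direction"].lower() == "inbound"
        and rule["access"].lower() == "allow"
        and rule["source"].lower() in OPEN_SOURCES
        and port_in_spec(RDP_PORT, rule["destination_port"])
    )


# The rule created in a hurry, and a version locked to one admin address.
hurried_rule = {"direction": "Inbound", "access": "Allow",
                "source": "*", "destination_port": "3389"}
locked_rule = {"direction": "Inbound", "access": "Allow",
               "source": "203.0.113.10/32", "destination_port": "3389"}

print(exposes_rdp_to_internet(hurried_rule))  # True
print(exposes_rdp_to_internet(locked_rule))   # False
```

Restricting the source to a known address is the smallest change that takes the VM out of reach of a range-wide scan.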
So, by now we have followed all of the Advisor recommendations and all our things are highly available, secure and performant. Now we go back to an initial statement in this blog: Microsoft wants to help you save money.
Now we move into the interesting advisories. The current rule set is below:
The two most common rules are Reserved Instances (RI) and rightsizing VM workloads. RI allows you to save up to 80% of the consumption cost for VMs and other services (if applied with Azure Hybrid Use Benefit).
Rightsizing workloads is just as important. Let’s assume you migrated a VM from on-premises to Azure. It had 32 cores at source and you matched the compute allocation within Azure. On-premises, you did not really need to pay per core, so you allocated what you felt was appropriate. Now, post-migration, you’ve discovered that your VM only uses 5% of its allocated CPU. Advisor will alert you that over a 30-day period that particular VM has not exceeded 5% CPU utilisation, and you have the option of re-evaluating the compute allocation and selecting a more appropriate VM size to match the workload.
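The rightsizing logic described above can be sketched in a few lines of Python. This is a simplification for illustration, not Advisor's actual algorithm. It assumes one peak CPU reading per day, and the function name and parameters are hypothetical, with the 5% threshold and 30-day window taken from the example above.

```python
def is_rightsizing_candidate(daily_peak_cpu, threshold_percent=5.0, window_days=30):
    """Flag a VM whose peak CPU never exceeded the threshold across the window.

    daily_peak_cpu is a list of percentages, oldest first, one per day.
    """
    if len(daily_peak_cpu) < window_days:
        return False  # not enough history to make a call
    recent = daily_peak_cpu[-window_days:]
    return max(recent) <= threshold_percent


idle_vm = [3.2] * 30            # never busier than 3.2% on any day
bursty_vm = [3.2] * 29 + [41.0]  # idle most days, but one real spike
new_vm = [3.2] * 10              # only ten days of history

print(is_rightsizing_candidate(idle_vm))    # True
print(is_rightsizing_candidate(bursty_vm))  # False
print(is_rightsizing_candidate(new_vm))     # False
```

Using the peak rather than the average keeps a VM with occasional genuine spikes from being flagged for a smaller size.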
Just how much money you could save depends on what you have deployed. Out of over a hundred Azure Health Checks we have performed, our largest potential yearly savings value to date is $956,889 AUD for a single customer!
One of the key capabilities of Advisor is that you can schedule reports and create alert rules to notify key personnel who are responsible for certain aspects of Azure.
To start automating and alerting with Azure, navigate to the Advisor service within the Azure portal.
Select Recommendation Digest.
From here, we can select which subscription to report on, frequency of reports, category of Azure Advisor reports (High Availability, Performance, Security and Cost) and the action group(s) for notification.
Action groups are where things get interesting.
Notice there is a lot more there than just straight email notification. Direct ITSM support, along with Functions and Logic App support, not only allows you to integrate with existing service desk systems but also lets you invoke self-healing actions.
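To make the self-healing idea concrete, here is a minimal Python sketch of a dispatcher that a function behind an action group might contain. Everything in it is hypothetical: the recommendation dictionary shape, the handler names and the fallback are assumptions for illustration, and the handlers only return strings rather than calling any Azure API.

```python
def enable_backup(resource):
    """Stand-in for a real remediation; only reports what it would do."""
    return f"backup enabled for {resource}"


def open_ticket(resource):
    """Fallback: hand the recommendation to the service desk."""
    return f"ticket raised for {resource}"


# Recommendation titles with an automated fix; anything else gets a ticket.
REMEDIATIONS = {
    "Enable virtual machine backup": enable_backup,
}


def handle_recommendation(recommendation):
    """Pick the remediation for a recommendation and run it."""
    action = REMEDIATIONS.get(recommendation["title"], open_ticket)
    return action(recommendation["resource"])


print(handle_recommendation(
    {"title": "Enable virtual machine backup", "resource": "vm-01"}
))  # backup enabled for vm-01
print(handle_recommendation(
    {"title": "Some other recommendation", "resource": "vm-02"}
))  # ticket raised for vm-02
```

Defaulting to a ticket means only recommendations that have been explicitly mapped are ever remediated automatically.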
Even just using a basic email reporting function will help with the optimal usage of Azure, and this is especially true with large team deployments. Visibility is key, so setting up alerts and digest reports allows you to maintain control over Azure. In a future masterclass, we will cover an extension to this that helps prevent misconfigured resources from being provisioned.
More on action groups.
And that’s a wrap for this masterclass! Start exploring Azure Advisor today, and always keep the Shared Responsibility Model in mind.
Stay tuned for the next episode, which will take you through the compelling advantages of Azure Security. Find more resources in the Data#3 Knowledge Centre.
Need assistance with Azure Remediation tasks? Contact a Data#3 Azure specialist today.