You Wanna Be a Rock Star?
Welcome to my blog. I have dedicated my career to helping companies develop strategies to protect and improve the availability of their applications and data. What has excited me recently is Continuous Availability (CA), an IT strategy that enables companies to transform the way they handle availability in their data centers and with their applications. The companies I have worked with in implementing this approach have been able to reduce their server count by 28 to 40%. For example, one big-iron shop was able to take its SAP server count between production, high availability (HA), and disaster recovery (DR) from 352 servers to 250, and another saw its document management server count reduced from 48 to 28.
You might be asking yourself how companies have been able to achieve such dramatic efficiencies. The fact is, CA is opening a whole new world for improved availability and reduced costs for IT services through the use of fractional computing. For that reason, I would like to devote some time here to explaining fractional computing. Very soon, many savvy practitioners will be adding this strategy to their bag of tricks, and it’s important to stay on top of the state-of-the-art techniques that help you stay ahead of the pack.
To begin to understand fractional computing, you have to look at how we provision servers. If you look at a traditional method for provisioning a Tier-1 Database server, you would provision one server for the production need, one server for the high-availability (HA) backup, and one in disaster recovery (DR) for a total of three servers or 200% over need. The situation is a little better in web and app farms where you may only allocate 20 to 25% more. For example, if the need were five servers, you might have one extra server for HA, and five in DR for a total of 11 servers or 120% over the need.
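The over-allocation figures above can be checked with a little arithmetic. Here is a minimal sketch in Python; the function name `over_allocation_percent` is my own label for illustration, not a standard term or tool.

```python
def over_allocation_percent(need: int, provisioned: int) -> float:
    """Percentage of servers provisioned beyond the production need."""
    if need <= 0:
        raise ValueError("need must be a positive server count")
    return (provisioned - need) * 100 / need


# Tier-1 database server: 1 production + 1 HA + 1 DR
database_total = 1 + 1 + 1
# Web/app farm with a need of five: 5 production + 1 HA + 5 DR
farm_total = 5 + 1 + 5

print(database_total, over_allocation_percent(1, database_total))  # 3 200.0
print(farm_total, over_allocation_percent(5, farm_total))          # 11 120.0
```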
These idle servers and server images sitting in production for HA and in our DR environments are a concern for nearly every IT shop. Why? The costs of maintaining idle assets are enormous: servers, licensing, power, rent, and administration. Also, the high cost of HA and DR negatively affects the service levels IT delivers. We tier our apps by importance, and deny HA and DR coverage to apps deemed not important. Consequently, if there is an outage in the lower tiers, restoration of service takes longer, and the reputation of IT suffers.
One thing that you may notice is that HA and DR offer a back-up capability for the same asset — one for a component loss, and one for a site loss. So what if you could combine your HA and DR disciplines into one solution? Combining the two is now technically possible with Active/Active architectures and 2-Site HA which enable a new service level IT can offer called Continuous Availability or CA.
With CA, you can effectively combine HA and DR into one solution and one discipline and reduce the cost of computing to 2x, or 100% over need. In the case of a database server, you could have one server in each of the active/active sites. In the case of web or app farms, following my earlier example, you would put five servers in one site and five in the other. Savings indeed, but can you do better?
One interesting statistic we have seen is that, overwhelmingly, most site outages last a relatively short time, a few hours in most cases. In less than one percent of instances was a site outage long enough to declare a disaster. So given that data, do you need to provision 100% of need in both sites with an Active/Active application spread across both? Well, maybe in the case of a database server, but what about the case of web and application farms?
For web and app farms, what if you put a 60% fraction of the need in each site? On a day-to-day basis that would mean 120% of need, or a 20% over-allocation, is available for processing. Could the average load be accommodated in one site if the other site is offline for a few hours? I say yes. The reason I would recommend putting a fraction of the need in each site is based on the nature of compute utilization. What I mean is that if you look at any individual server, there are several bands of use: idle, low, average, high, and over-capacity. The idle band is generally accepted as between 0 and 10%, low between 10 and 50%, average between 50 and 70%, high between 70 and 95%, and over-capacity is demand of 95% and above. Capacity planning generally says that you want to allocate compute so that normal use falls in the average band, leaving headroom to accommodate the peaks.
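To make the bands concrete, here is a rough sketch in Python that classifies a utilization figure and compares running on both sites with running on one. The average load of 50% of need is a hypothetical number I picked for illustration, and treating each band as including its lower bound is also my assumption, since the bands above share their boundary values.

```python
def utilization_band(percent: float) -> str:
    """Map a utilization percentage to one of the bands described above.

    Assumption: a boundary value belongs to the higher band
    (for example, exactly 50% counts as "average").
    """
    if percent < 0:
        raise ValueError("utilization cannot be negative")
    if percent < 10:
        return "idle"
    if percent < 50:
        return "low"
    if percent < 70:
        return "average"
    if percent < 95:
        return "high"
    return "over-capacity"


def utilization_percent(load_pct_of_need: float, capacity_pct_of_need: float) -> float:
    """Utilization of the available capacity, with load and capacity
    both expressed as a percentage of the need."""
    return load_pct_of_need * 100 / capacity_pct_of_need


average_load = 50  # hypothetical: average load is 50% of the need

both_sites = utilization_percent(average_load, 120)  # 60% of need in each of two sites
one_site = utilization_percent(average_load, 60)     # one site offline

print(round(both_sites, 1), utilization_band(both_sites))  # 41.7 low
print(round(one_site, 1), utilization_band(one_site))      # 83.3 high
```

Under these assumptions, the surviving site runs in the high band but stays under capacity. A heavier load would push it further up, so the result depends on where your own average load sits.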
And what causes the peaks in processing? Busy periods, end-of-accounting periods, batch runs, data warehouse loads, and so on. Many of these high-volume demand times are controllable. While not desirable as a regular practice, is it possible to delay off-peak processing for a few hours for improved availability?
I am running into many CIOs who say yes. Their direction is to develop designs that put 60% of the web and app servers in each site, realizing that day-to-day they have 120% of need available. If a few servers fail, there is slack in the provisioning to accommodate the demand. If a site fails or is taken offline for maintenance, 60% of the compute is seamlessly available.
Fractional provisioning at 60% of the server count in each site reduces server maintenance, administration, and licensing costs.
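Putting the three approaches side by side for the five-server farm from my earlier example gives a feel for the difference. This is only a sketch in Python: the helper `servers_per_site` is a name I made up, and rounding up to a whole server is my assumption about how a fraction would be provisioned.

```python
def servers_per_site(need: int, percent_of_need: int) -> int:
    """Servers to place in one site, rounded up to a whole server.

    Assumption: a fractional server count is always rounded up.
    """
    return (need * percent_of_need + 99) // 100


need = 5  # web/app farm from the earlier example

traditional = need + 1 + need                    # production + HA + DR
full_ca = 2 * servers_per_site(need, 100)        # 100% of need in each site
fractional_ca = 2 * servers_per_site(need, 60)   # 60% of need in each site

for name, total in [
    ("Traditional HA plus DR", traditional),
    ("CA at 100% per site", full_ca),
    ("CA at 60% per site", fractional_ca),
]:
    print(f"{name}: {total} servers")
# Traditional HA plus DR: 11 servers
# CA at 100% per site: 10 servers
# CA at 60% per site: 6 servers
```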
Now back to the database server. Earlier, I proposed one server in each site. Let me refine that and say one of each server image, but provisioned with fewer CPUs and less memory to approach 60% of the need. While the maintenance and administration cost is the same, there is an opportunity to reduce CPU-core-based license costs.
So welcome to the world of fractional computing. Those who embrace it and roll out a CA approach in their organizations will be the next generation of rock stars in IT!