Hybrid Cloud Cookbook: DevOps is Not Just Platform 3 Hype
I often hear from customers that they are not Platform 3 companies building fancy consumer-facing mobile applications, and that they therefore have no need for continuous delivery and the other promises of DevOps. They are absolutely right, and at the same time they are ignoring the best thing that has hit the IT industry in a long time.
If I look back at my 20 years in this industry, I cannot find a single IT project that would not have benefited substantially from DevOps, even when I was doing PL/I programming on a mainframe. What confuses people is that we talk about continuous deployment as the goal, when it is really only a side effect of running professional IT, and that is what DevOps is about: moving from individual artisan work to a team effort in which the basic processes and platforms are well defined and maintained.
There is a fundamental flaw in the approach of traditional application development projects. We tend to believe that testing and deployment are phases that come after the development work has been finalized, and that there is no need to worry about them until we get there. This leads to two major issues: first, we run into more quality problems, and second, deploying new releases turns out to be much harder and slower than anticipated, which delays the project. The sequential nature of traditional application development, with its complex processes, leads to a highly stressful, compressed timeline for resolving issues as they are found, and the pressure only increases with each failed round of testing.
I have numerous real-life experiences of projects that were considered roughly on time and on budget at the end of the development phase, but turned into massive problem projects once we entered the testing phase. Most of these issues would have been avoided, or at least better mitigated, if we had followed the principles of agility (work in small iterations and fail early) and DevOps (design the project with testing and deployment in mind).
So what would have been different? First, if we had defined test cases against every functional requirement already in the design phase, we would have noticed that many of the requirements were vague and ambiguous, with no common set of success criteria between users and developers. In the traditional model we discovered that only after the coding work was done, leading to costly and time-consuming changes. Properly defined test cases force everybody to agree on the success criteria and flush out poorly defined, ambiguous requirements.
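To make this concrete, here is a minimal sketch of what an executable test case against a requirement can look like. The requirement, function name, and threshold are all hypothetical, invented for illustration; the point is that writing the rule as code forces users and developers to settle the boundary cases before development starts.

```python
# Hypothetical requirement: "Orders of 100 EUR or more ship for free;
# smaller orders pay a flat 5 EUR shipping fee."
# A vague prose version leaves open whether exactly 100 EUR is free;
# the executable test cases make that decision explicit and agreed.

def shipping_fee(order_total_eur: float) -> float:
    """Return the shipping fee for a given order total in EUR."""
    return 0.0 if order_total_eur >= 100.0 else 5.0

# Acceptance checks pinned to the agreed success criteria,
# including the boundary value a prose requirement would gloss over.
assert shipping_fee(150.00) == 0.0   # clearly above the threshold
assert shipping_fee(99.99) == 5.0    # just below the threshold
assert shipping_fee(100.00) == 0.0   # the boundary case, now explicit
```

If users and developers cannot agree on what these assertions should say, the requirement was ambiguous, and it is far cheaper to discover that in the design phase than after the code is written.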
Second, we always underestimate both the effort needed to set up and maintain the different environments (development, two or three test environments, and production) and the effort needed to move (deploy) the applications into an environment. Why is that? Issues related to coding and functionality are relatively easy to uncover, but many issues are caused by the environment itself: configuration mistakes, bugs in underlying software such as drivers and servers, and so on. These tend to be the difficult ones to figure out. They usually start appearing only once you put more load on the application, and the symptoms show up in unexpected places. It is like the human body: your finger is numb, but the cause is a stiff neck. Once you finally find the root cause, you need to remember to apply the fix to all environments and verify that they all still work afterwards. This simple task is surprisingly often omitted, partly because so many other things demand attention and partly because different environments are maintained by different groups (the development team handles development and some of the test environments, while the production team handles the rest).
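One lightweight way to catch the "fix applied in one environment but forgotten in the others" problem is to diff each environment's configuration against a reference. The sketch below assumes configurations can be flattened into key-value dictionaries; all the environment names, settings, and version numbers are made up for illustration.

```python
# Hypothetical flattened configuration per environment.
environments = {
    "dev":  {"java_version": "17", "db_driver": "9.2", "heap_mb": "2048"},
    "test": {"java_version": "17", "db_driver": "9.1", "heap_mb": "2048"},
    "prod": {"java_version": "17", "db_driver": "9.1", "heap_mb": "4096"},
}

def drift(reference: str, envs: dict) -> dict:
    """Report settings in each environment that differ from the reference,
    as {env: {key: (reference_value, env_value)}}."""
    ref = envs[reference]
    report = {}
    for name, cfg in envs.items():
        if name == reference:
            continue
        diffs = {k: (ref.get(k), cfg.get(k))
                 for k in set(ref) | set(cfg)
                 if ref.get(k) != cfg.get(k)}
        if diffs:
            report[name] = diffs
    return report

# After upgrading the database driver in dev, this immediately shows that
# test and prod still run the old version. The heap difference in prod may
# be deliberate, but now it is visible rather than accidental.
print(drift("dev", environments))
```

Running such a comparison as part of every change makes environment drift a routine report instead of a mystery discovered under load.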
The last piece is the difficulty of deployment itself. You need to deploy your application code and configurations in exactly the right sequence, with the right steps performed for each task; otherwise there is a good chance the application will not work. This often resembles voodoo more than 21st-century high tech, as a single step done in the wrong order can cause mysterious issues that nobody can explain. Naturally, most of these quirks are undocumented, and the correct sequence is found by trial and error. The larger your team and the more components the application has, the more complex the deployment gets. All of this invites human error, as people try to get things done under enormous time pressure.
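The sequencing problem is exactly what deployment automation removes: encode the order once, and the machine executes it the same way every time. A minimal sketch, with entirely hypothetical step functions standing in for real deployment tasks:

```python
# Hypothetical deployment pipeline: the steps run in one fixed, recorded
# order, and a failure stops the run immediately, so the "voodoo sequence"
# lives in version-controlled code instead of in one engineer's head.

def stop_app():        print("stopping application server")
def update_schema():   print("applying database schema changes")
def deploy_artifact(): print("copying new application build")
def update_config():   print("writing environment configuration")
def start_app():       print("starting application server")
def smoke_test():      print("running smoke tests")

PIPELINE = [stop_app, update_schema, deploy_artifact,
            update_config, start_app, smoke_test]

def deploy():
    """Run every step in order; abort loudly on the first failure."""
    for step in PIPELINE:
        try:
            step()
        except Exception as exc:
            raise SystemExit(f"deployment failed at {step.__name__}: {exc}")
    print("deployment complete")

deploy()
```

Even a script this simple beats a checklist followed by a tired human at 2 a.m.: it never skips a step, never reorders them, and tells you exactly where it stopped when something goes wrong.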
Once you are deep in this negative cycle, it is very difficult to pull the emergency brake and call for a month-long break to do the work that should have been done in the design phase. You need to plan how you will maintain the environments and have the change processes in place. You also need to design your application architecture not only from a run-time perspective but also from a deployment perspective: how do you break the environment into smaller pieces that are less complex to deploy? And you must define the roles and responsibilities in the deployment process and automate as much as possible to avoid human error.
That is what DevOps is all about: teamwork, constantly improving your processes, finding bottlenecks, increasing your team's throughput, and designing for success. Think of continuous delivery as the byproduct of a well-implemented DevOps environment in your organization. Believe me, in the final days before any development project is scheduled to go live, any organization would do almost anything for the ability to deploy successfully ten times per day.