Webster’s defines “availability” as the time in which a piece of equipment (such as a computer) is functioning or able to function. For a company like Zen Planner, which delivers software as a service, our availability goal is 99.99%. In minutes per year, that means we strive to keep unplanned downtime to no more than 52.56 minutes in total for the whole year, outside of planned maintenance.
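If you want to check that math yourself, here is a minimal sketch in Python. The function name is just an illustration, and it assumes a 365-day year with planned maintenance excluded from the calculation.

```python
# Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_pct: float) -> float:
    """Return the yearly unplanned-downtime allowance, in minutes,
    for a given availability target expressed as a percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# Compare a few common availability targets.
for target in (99.0, 99.9, 99.99):
    budget = downtime_budget_minutes(target)
    print(f"{target}% -> {budget:.2f} minutes/year")
```

Each extra “nine” cuts the allowance by a factor of ten, which is why 99.99% leaves well under an hour for the entire year.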
One of the mantras at Zen Planner is to always put the customer first. This means that the applications we create must meet or exceed the customer’s (your) needs. To make this happen, the infrastructure the applications run on MUST be highly available and run on hardware that will let them perform.
Single Points of Failure are BAD
One of the first things you do when designing a highly available environment is eliminate single points of failure. These are exactly what they sound like: places where the failure of a single component will cause the application to be down or unresponsive. We have taken great care to design Zen Planner’s infrastructure so that the environment is not only redundant but also N+1 (I know, super fancy technical term). The idea of N+1 is that if a piece of the environment has an issue, the other components can pick up its work and still provide application performance, which at the end of the day is the customer experience.
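One way to picture N+1 is as a simple capacity question: if the biggest component in a group disappears, can the rest still carry peak load? The sketch below is illustrative only. The function name, the capacity numbers and the unit (think requests per second) are assumptions, not a description of Zen Planner’s actual tooling.

```python
def survives_single_failure(node_capacities, peak_load):
    """Return True if the remaining nodes can still carry peak_load
    after the single largest node is lost (the worst single failure)."""
    if len(node_capacities) < 2:
        # One node (or none) is a single point of failure by definition.
        return False
    remaining = sum(node_capacities) - max(node_capacities)
    return remaining >= peak_load


# Three nodes of 100 each with a peak load of 200: losing one still leaves 200.
print(survives_single_failure([100, 100, 100], 200))  # True
# Two nodes of 100 each with a peak load of 150: losing one leaves only 100.
print(survives_single_failure([100, 100], 150))       # False
```

The second case is redundant on paper (there are two nodes) but not N+1, because the survivor cannot handle the full load on its own.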
Who Watches the Watchers
Richard Feynman said, “Nature does not know what you are looking at, and she behaves the way she is going to behave whether you bother to take down the data or not.” Having an infrastructure that is highly available is only part of the task. You also have to be watching your environment at all times.
At Zen Planner we monitor and measure every element of our infrastructure. This allows us to not only detect issues right away but also, and more importantly, plan ahead. When designing an infrastructure, we try to plan for growth. Sometimes our estimates work out as expected; sometimes they don’t, and more load is put on our systems than we expected. Good monitoring helps us see these trends and take action BEFORE there is an issue.
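To make “seeing the trend” a little more concrete, here is a rough sketch that fits a straight line to daily utilization readings and estimates how many days remain before a threshold is crossed. The sample numbers, the 80% threshold and the function name are made up for illustration, and real monitoring systems typically use more robust forecasting than a plain linear fit.

```python
def days_until_threshold(samples, threshold):
    """Estimate days from the most recent sample until a least-squares
    line through the samples reaches threshold.

    samples: one reading per day, oldest first.
    Returns None if there are too few samples or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    # Ordinary least-squares slope and intercept.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Measure from the last sample (day n - 1); never report a negative value.
    return max(0.0, crossing_day - (n - 1))


# Hypothetical disk utilization (%) growing by 2 points per day.
print(days_until_threshold([50, 52, 54, 56, 58], threshold=80))
```

A result measured in weeks leaves room to add capacity calmly; a result measured in days means it is time to act.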
Sometimes Stuff Happens…
Coincidentally, while this blog post was being written, Zen Planner experienced an outage. The root cause was a failure in executing the renewal of our security certificates; overall, a pretty simple fix. However, this gave the infrastructure team the opportunity to practice not only our processes for when “stuff” happens, but also how we communicate while issues are occurring and how transparent we are about what the failure was.
For us, transparency is key to improvement. In this case we determined that we needed a more structured approach to these kinds of renewals, because our infrastructure had evolved enough that just “knowing” all the places we have to update is no longer sufficient. This kind of process improvement is key because it gives us the opportunity to become smarter, stronger and able to leap tall buildings… or at least tall stacks of data… in a single bound.
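As one illustration of what a more structured approach could look like (this is not our actual process), the sketch below keeps an explicit inventory of endpoints and checks how many days each certificate has left, using only Python’s standard library. The hostnames, the 30-day warning window and the function names are all hypothetical.

```python
import socket
import ssl
import time

# Hypothetical inventory: every (host, port) that serves a certificate.
INVENTORY = [("app.example.com", 443), ("api.example.com", 443)]


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host:port and return the certificate's notAfter string.
    Note: with the default context, an already-expired certificate fails
    the handshake with an ssl.SSLError instead of returning a date."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after, now=None):
    """Days between now (epoch seconds) and a notAfter timestamp such as
    'Jan 31 00:00:00 2030 GMT'."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def needs_renewal(not_after, warn_days=30, now=None):
    """True if the certificate expires within warn_days."""
    return days_remaining(not_after, now) <= warn_days


def report(inventory=INVENTORY):
    """Print one status line per endpoint in the inventory."""
    for host, port in inventory:
        try:
            not_after = fetch_not_after(host, port)
        except (OSError, ssl.SSLError) as exc:
            print(f"{host}:{port} CHECK FAILED ({exc})")
            continue
        flag = "RENEW" if needs_renewal(not_after) else "ok"
        print(f"{host}:{port} {days_remaining(not_after):.0f} days left [{flag}]")
```

Calling report() on a schedule turns “knowing all the places” into a list that anyone on the team can read and extend. The inventory is the important part: a certificate that is not on the list cannot be checked.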
If you can design and build highly available infrastructure and processes, continuously improve and have fun doing it, then you are doing it right.
If you are interested in seeing Zen Planner in action, schedule a demo with one of our Software Pros today!