Operation Bootstrap

Web Operations, Culture, Security & Startups.

Availability Is a Feature... Like Anything Else

| Comments

“Availability is the most important thing…”

I heard this recently and I cannot agree with it. It sounds good, until you think about all the things that become less important things when you say something like that. Availability, like any other ability, is a feature of the service. It’s an important one – but by itself holds relatively little value. Consider that if availability were in fact the most important thing, you should never make any changes. We all know that availability increases pretty significantly when we stop deploying new software & stop trying to “improve” things. Left to sit, most systems will run for some period of time with pretty high availability assuming they were designed to handle some degree of hardware faults.

… and then in a few months, you are out of business.

The reality is that availability by itself isn’t a very valuable thing, it’s a feature like anything else. Think of the extreme example of an application that just says “Hello World”. I could produce an application that does this and has extremely high availability. I could invest in load balancers, redundant web servers across multiple availability zones and regions. I could probably design a Hello World app to survive all but the most extreme natural or man made disasters.

… and nobody would care.

Availability has to be combined with other features to make a valuable product. The more features your system has, the more users your system has, the more valuable that availability feature becomes – but it’s still just a feature. Like all features, it has to provide business value to warrant investing in and it’s business value is typically that it augments the value of the other features in the system. It’s like a power up that amplifies the impact of other weapons but on its own it really doesn’t mean much. As such, it cannot possibly be the most important thing with one or two exceptions – see below for those.

A tattered example

Take Twitter as a worn out example, they have had their fare share of availability problems in the past – everyone (who cares) knows. They are still around and arguably quite successful despite not really sorting out the revenue thing very quickly. Assuming availability is the most important thing, Twitter would have been out of business a long time ago. I use them as an example because they were a poster child for availability problems and they continue to struggle with scaling one of the largest networks out there. They will always struggle with this, it will never go away.

However, had Twitter said that availability was more important than building an API for 3rd party apps, would they even have experienced the availability problems they had? Had twitter chosen to focus on availability instead of implementing things like lists or search, would availability even matter? The reason people care about the availability of twitter is because it provides a function that they rely upon. Once you rely upon something, you want it to work, and availability becomes important – but it cannot be more important than building that “usefulness” of a product that makes people love it.

Also consider some other tradeoffs. A company I worked for at one point said that the most important thing for them was data integrity. They would prefer to take the system offline than have it up and risk data corruption or data loss. That’s a legitimate tradeoff. Over time you should be pressing to be able to protect & maintain data integrity while maintaining high availability, but if you cannot do that today then availability may not be the highest priority for you.

That does not make it unimportant – it just balances it against other requirements.

This is true… except when it isn’t.

Competition has a way of changing things, and this is one case where it does. Given two services that provide the same basic function, availability can become the differentiator. Look at Google when it first launched. What reason would people have to select one search engine over another? Altavista provided results but was slow & often provided irrelevant results – you had to dig. Google was fast and provided results that didn’t require you to dig. Google saved time & frustration and eventually became the verb we use as a synonym for “search”. Google was always there, it was available & accessible.

Lack of availability will put you out of business if someone can copy your features & then improve on your availability. Think about the reason people choose reliable automobile brands – because at the end of the day it matters a lot whether it starts up in the morning or not. It matters a lot how much time your car spends in the shop. It matters a lot how much you get to use that car – which is directly the result of availability. It enhances all those features.

Infact people will choose less features to get better availability in some cases – think Toyota. However, the car still has to serve some basic requirements  to even be considered as an option. These features aren’t availability features, there’s an expectation that availability is just there and works.

Another exception is when lack of availability in a system completely invalidates its reason for existing in the first place. An example of this are life safety systems or life support systems. If a failure of the system intended to preserve life causes loss of life, then loss of avaiability has invalidated the existence of any of the life saving features of that system. If that one feature, availability, can permanently invalidate all the other features then it is the most important thing. It is the platform on which all other features are built. If you work on systems like this then the equation is a bit different.

How do you work it in?

It’s easy to cry about availability when you are on an Ops team being pummeled day in and day out with availability problems. I know, I’ve been there. But availability problems are typically more systemic and have to be addressed in every component you build, every new feature that is added, and every new service you deliver. It’s a core feature of everything you do – but it is not the most important feature in most cases.

Building in availability means thinking about how things fail & planning for that but it also means staying in business to have a problem to solve. It’s an important feature and believe me, most businesses cannot scale without investing significant effort into improved availability. But to say it’s the most important feature is to say that all other things come second and this is very rarely the case in the world of web operations. If you are in a business where you can afford to make that statement then I would say you have a very mature system and a strong business – because anything less wouldn’t afford it.