Solving Single Points of Failure

This ran on Medium on July 16, 2018. I thought about it recently.

I was once involved in an effort to break some bad habits. Among them was a cultural reliance on people who had become single points of failure.

I say “cultural reliance” because the guy who had emerged as the single point of failure for some seriously critical things (the one person on whom all deployments and all production changes depended) had been hired to replace the last guy who had emerged as the single point of failure: the one guy on whom all deployments and all production changes depended.

Single points of failure are often well-intentioned people who, through individual efforts to “fix things” over time, have become so central to certain processes that they seem irreplaceable.

This creates problems.

If your single point of failure gets sick one Tuesday or goes on vacation, everything stops until they return. If they get hit by a bus or eaten by snakes, everything just stops.

Chris Swan immediately pointed out that I had my very own Brent. Brent is the guy who emerges as the single point of failure in The Phoenix Project, that classic of IT DevOps by Gene Kim, Kevin Behr, and George Spafford. And like Brent, our guy was not a bad guy; he wanted what was best for the company.

Once they recognize they have a single point of failure, many managers get stuck. There’s a natural desire to try to negotiate a way out of the problem, and it is an understandable impulse. The reasoning goes: surely we would all be better off if the single point of failure were broken, and surely the person who has become that point of failure would be the most grateful for some much-needed relief.

Interestingly, and counterintuitively, I have never met a person who had become a single point of failure and was happy to hear he would have less work to do. The manager says, “Let’s make it so that you are not the single point of failure; I mean, what if you get hit by a bus?” This is almost always misread as a rebuke of his work, rather than of the situation that got him there in the first place.

Unfortunately, the tactic that most often succeeds here (which does not mean it is easy) is to get rid of the single point of failure: take them out of the process. This doesn’t always mean asking for a resignation or firing the person (it can be a reassignment), but often it does.

Now, if your single point of failure is just a great-big-doody-head, it’s relatively simple (that is not to say it is “easy” or “not scary”) to rip them out and replace them. But if the person is actually struggling to do the right thing in an unappreciative or ungrateful organization, “fixing this” means that person will, almost by definition, feel marginalized and unappreciated.

They will read the action not as an indictment of the operations management that led someone to feel they had to just stand up and do … everything. Instead, they take it as an accusation that they somehow engineered the situation that way.

Often, that couldn’t be further from the truth.

Most of the time, single points of failure are the product of short-sighted organizational imperatives: decisions made well above the pay grade of the person who becomes the single point of failure. Take my example: you don’t get two different people, working at the same place at different times over several years, each emerging as a single point of failure without business decisions at the top that create, at the very least, a climate in which managers don’t notice it happening until it’s too late.

Here is something most managers and executives eventually learn. The person you feel you can’t live without, the person you are most afraid will walk, because “we can’t do X without them”?

You’ll be just fine when they are gone.

It actually doesn’t matter whether your single point of failure is a jerk or someone who has simply found themselves in a terrible situation: in both cases, the solution is to rip off the Band-Aid.

Some people mistakenly call them “dead weight.” That is inaccurate: dead weight is a barrier to forward progress, but what we are discussing here is people who (whether or not they intend it) are effectively fighting to drag us away from progress.

Breaking that single point of failure and democratizing the process will almost certainly lead to hurt feelings and, often, to some people quitting.

I do not mean to be unkind or insensitive, but:

Good.

The company will survive, and even thrive, without these people. Often, you’ll be better off.

The people who stay are better off without them and, perversely, will accomplish more without this kind of negative influence.

This solves your tactical issue. Whether your organization learns the strategic lessons, and fixes the operations management issues that led to the problems in the first place, is entirely another matter.