What To Do When You F Up Big Time

This post will probably strike you as either common sense or absolute crazy talk.  It is especially written for those in the latter group.

I write a lot about working safely.  After lots of posts on branching, test environments, kitchen analogies, etc., I'm here to recommend some behaviors for those times when you totally screw up.  After all, you may very likely find yourself in an environment without all of the safety nets you want because you were specifically brought in to build the safety nets.  I'm going to assume that you messed up while doing the right thing in the wrong way rather than doing something criminally stupid like, say, encoding your DVD collection to DivX on the production database server because "it has those really fast drives and all that RAM".

First, and foremost, as soon as you realize that you've screwed up, let someone know.  Do not be tempted to keep things quiet and fix it before anyone notices.  I have yet to see a production issue that didn't get worse with time (and quickly).  Keeping things quiet is outright selfish because you're putting your own comfort ahead of the good of the group.

Secondly, fixing your mistake needs to become your top priority.  Fixing means not only getting things working again, but getting them back to the way they would have been.  Does data need to be re-keyed?  It's now your job to re-key it.  Do numbers need to be verified?  If you're not the one who can do it, be prepared to generate special reports or data dumps to make the job as easy as possible.

Next, take responsibility for your mistakes.  Full responsibility.  You don't get to say, "I deleted the production website, but the slow restore process is what caused the outage to be so long."  Being up the creek without a paddle means that you own the upstream and downstream problems as well.

After things are back to normal, do your own private After Action Review (note: there's a good chance you'll be asked for either a public one or one with your manager).  Take this opportunity to learn from what just happened while it's still fresh.  For a big enough mistake, you'll probably also reflect on it for a day or two.  Having said that, hear me now and believe me later: do not utter the words, "Well, it's kind of lucky it happened because…".  Even if there's some fantastically beneficial outcome, you don't get to celebrate the effect; you are still responsible for the action.

Lastly, get over it.  If you've made the kind of mistake that I'm writing about, it will almost certainly affect you emotionally, mentally, and physically.  That's to be understood and will actually help with internalizing the "Don't do that again" lesson.  But don't let it affect you too much for too long or you'll kill your productivity.  If making a huge mistake makes you skittish to the point that you are no longer a high-performing contributor, then things aren't back to normal, are they?

As a final thought, while things are at their worst you may start wondering, "Are they going to fire me for this?"  I can't answer for certain, but I can tell you this:  When I had headcount, I never fired someone for making a mistake . . . and they pulled some doozies.  If you do get fired for a blunder that you feel comfortable defending (i.e., doing the right thing the wrong way), then chances are it wasn't the place for you anyway.  The only way you can do truly incredible work is by being willing to take some risks, and if your employer squashes any chance of that happening by firing people for mistakes, then you're better off elsewhere.  Just don't make a habit of it: it's easy to explain a one-off, but the second time you get fired for f'ing up big time it starts to look like a trend.

Brownfield Development: How To Peel The Onion Without The Tears

In the spring of 2008, I decided to become a brownfield development specialist.  Greenfield development is when you're starting a project from scratch and get to design everything with only minimal constraints.  Brownfield development is the opposite of that: it's when a project is n months into development and most of the constraints have been cast in stone.  My guess is that if you were to ask every developer you know whether they'd prefer to work on a greenfield project or a brownfield project, they'd all say greenfield.  Heck, I'd bet that a quarter of them would laugh so hard that Red Bull shot out of their noses just at being asked the question.  Therein lies one of the secrets of becoming a Big Swinging Developer:

You can make money being good at things that other people hate to do.  A lot of money.

When people hate to do something, they don't do it very much.  Since they don't do much of it, they never get particularly good at it.  This makes it harder for them to do it and the cycle starts all over again.  Some common values of "it" for development include: writing tests, documentation, and creating installers.  You can safely add brownfield development to the list since a job description of "Support the big ball of mess that runs our business while adding features and fixing bugs" doesn't usually have folks banging down your door. Throw in the fact that you'll have no relevant documentation and few, if any, tests to guide your way and you can picture weeks of pestering your co-workers with questions just to get to the point where you can fix a simple defect.

But what if brownfield was easy for you? What if, rather than pestering your co-workers with basic questions, you could understand what the code was doing and then ask why it was doing things that way rather than asking what it actually does?  That's what my secret weapon, Visustin, gives you.  You paste in your source code and it'll generate a flowchart for you:

[Image: Visustin-fullshot]

It supports 31 languages including the popular .NET, open source, and SQL variants.  I spent last week throwing hundreds of lines of Python into the tool and tracing through an incredibly complex financial trading system to learn how portfolio valuation is calculated.  I was able to correctly describe the process to my team lead, including a couple of gotchas buried in the code, even though I don't know Python.  Since I know how the system works, though, I can find the path the code will take and identify where problems are likely to occur while coming up to speed on the language at night.

If you're a developer, I'd highly recommend Visustin.  It's great for code reviews, documentation, and for diving into existing systems.  Developer or not, look around your industry and find the important things that no one wants to do, because there's a real opportunity there.  You can become better (or more tolerant) than anyone else simply by identifying the key source of the unpleasantness and solving that problem first.

First-Rate Companies Have First-Rate Systems

This may surprise you, but when I say "First-Rate Systems", I'm not talking about computers.  When you think about it, a company is simply "the way a group of people does things".  The restaurant down the street is just a group of people making burgers and hot dogs in a delicious and unusual way.  Apple is a group of people designing, building, pricing, and selling their wares.

Sometimes these systems are well defined in manuals, training, and software.  Sometimes the systems are ad-hoc for better or worse.  As consumers, we love to complain about bad systems which usually manifest themselves as the combination of a stupid policy and an inflexible customer service rep.  We also love great systems, but we rarely notice them as such.  Instead, we like shopping at certain stores or eating at certain restaurants or using certain services.

If you're starting a company, you need to give serious thought to your systems.  You can typically get away with a blanket statement of, "Make the customer happy" and then refine over time.  If you're joining a company, experience your new employer as a customer while you're still new and systems failures will stand out . . . hopefully so that you can fix them.

The above may sound obvious and/or a regurgitation of the standard "Be outstanding!" marketing advice you see spouted constantly, but it's all a lead up to the main point of this post:

When you find a second-rate system that the company won't fix, you know that you're not dealing with a first-rate company and you need to stop treating them as such.

Not everyone has the fortitude to be great.  Arguably, most settle for "good enough".  That's why you end up with the dry cleaner that smashes your buttons and doesn't notice before you do.  That's why you have the recruiting company that requires you to enter time in the client's accounting system and their own system even though they literally employ 1,000 people who are qualified to build the integration between the two.  That's why you have Comcast.

As you might imagine, I'm particularly attuned to (and offended by) bad systems since I spend my days getting paid well to help clients build software to support great systems.  These systems are typically great before the software ever exists; they're simply executed through brute force and the client wants to replace that human effort with software so that people can get back to work.  The important aspect is the commitment for the system to be great: that's what leads to a first-rate company.  You'll save yourself a lot of headaches if you learn to identify second-rate companies and accept them as such (or move on to a first-rate replacement).  And if you see yourself as first-rate, make sure your company is as well.