This post will probably strike you as either common sense or absolute crazy talk. It is especially written for those in the latter group.
I write a lot about working safely. After lots of posts on branching, test environments, kitchen analogies, etc., I'm here to recommend some behaviors for those times when you totally screw up. After all, you may very likely find yourself in an environment without all of the safety nets you want because you were specifically brought in to build the safety nets. I'm going to assume that you messed up while doing the right thing in the wrong way rather than doing something criminally stupid like, say, encoding your DVD collection to DivX on the production database server because "it has those really fast drives and all that RAM".
First and foremost, as soon as you realize that you've screwed up, let someone know. Do not be tempted to keep things quiet and fix it before anyone notices. I have yet to see a production issue that didn't get worse with time (and quickly). Keeping things quiet is outright selfish because you're putting your own comfort ahead of the good of the group.
Secondly, fixing your mistake needs to become your top priority. Fixing means not only getting things working again, but getting them back to the way they would have been. Does data need to be re-keyed? It's now your job to re-key it. Do numbers need to be verified? If you're not the one who can do it, be prepared to generate special reports or data dumps to make the job as easy as possible.
Next, take responsibility for your mistakes. Full responsibility. You don't get to say, "I deleted the production website, but the slow restore process is what caused the outage to be so long." Being up the creek without a paddle means that you own the upstream and downstream problems as well.
After things are back to normal, do your own private After Action Review (note: there's a good chance you'll be asked for either a public one or one with your manager). Take this opportunity to learn from what just happened while it's still fresh. For a big enough mistake, you'll probably also reflect on it for a day or two. Having said that, hear me now and believe me later: do not utter the words, "Well, it's kind of lucky it happened because…". Even if there's some fantastically beneficial outcome, you don't get to celebrate the effect; you are still responsible for the action.
Lastly, get over it. If you've made the kind of mistake that I'm writing about, it will almost certainly affect you emotionally, mentally, and physically. That's to be understood and will actually help with internalizing the "Don't do that again" lesson. But don't let it affect you too much for too long or you'll kill your productivity. If making a huge mistake makes you skittish to the point that you are no longer a high-performing contributor, then things aren't back to normal, are they?
As a final thought, while things are at their worst you may start wondering, "Are they going to fire me for this?" I can't answer for certain, but I can tell you this: When I had headcount, I never fired someone for making a mistake . . . and they pulled some doozies. If you do get fired for a blunder that you feel comfortable defending (i.e., doing the right thing the wrong way), then chances are it wasn't the place for you anyway. The only way you can do truly incredible work is by being willing to take some risks, and if your employer squashes any chance of that happening by firing people for mistakes, then you're better off elsewhere. Just don't make a habit of it: it's easy to explain a one-off, but the second time you get fired for f'ing up big time it starts to look like a trend.
Great article. I have one planned where I discuss how users try to hide their mistakes from IT people. This, of course, makes fixing the problem much more difficult.
Thanks for the feedback! I'll keep an eye out for your article. I really liked the Encyclopedia Brown post.
sigh… i miss 'Jay-isms'.. hope you are doing well