So here’s a summary of the festivities around here over the past 4-5 days when one of our main transformers went “into the weeds.” There will most likely be an article in the local school newspapers about it, and perhaps even the Boston Globe:
- Thursday 2/2/06 5:30pm: Datacenter operators for the evening notice a "burning odor" from one of the transformers feeding our machine room. It was clearly in the early stages of losing all of its magic smoke. And it had a lot of magic smoke.
- 8:30pm: A high-voltage electrician arrives on site, with proper Nomex protective gear, to work on the problem.
- 9:00pm: It is determined that the entire datacenter needs to be powered off.
- 9:00pm – 10:00pm: All servers powered down as gracefully as possible
- 11:00pm: Power restored
- 11:00pm – 3:00am: Most of the crap that broke from being shut down for the first time in years is fixed and running. Just in time for us to learn that we’re going to need to come in on Saturday for another forced outage!
Anyway, a good time was had by all, and the adventure continues to this very day.
The casualties from Thursday night’s debacle included:
- One transformer (although that’s facilities’ problem, not mine)
- Three Sun V490 power supplies
- One Sun V440 motherboard
- Four Compaq 9.1GB disk drives from a 900-year-old RAID
- One entire 900-year-old SWXCR direct-attached RAID, which happened to include a critical server’s operating system in its entirety
Needless to say, it was the last of these items that caused us the most grief. Luckily, it was decided at nearly 3:00am that reviving the old beast could wait until morning, and I successfully brought it back then. And, wouldn’t you know it, the same exact RAID died again during Saturday’s outage, along with a few other servers that should all be fixed by tomorrow.
So the long and short of it is that it’s probably not a good idea to shut down things that have been running for 2+ years — especially with spinning disks from the Clinton administration. I’ll post a link to the newspaper article if they decide to make news out of this, since a lot of important administrative services were down.