If you haven’t been following this series of posts, please familiarize yourself with the previous entries. Things will actually make more sense this way. Or you can just jump in.
When we left off last time, my new friends Toby and Barry had finally managed to fix our firewall issue by replacing the firewall with a brand new one and accidentally destroying the old one. Unfortunately for me, all the rebooting caused my ISP to freak out and stop routing packets to one of the 5 consecutive public IP addresses we had been using. Barry and Toby packed their stuff and cheerfully fucked off back to their office, while I was left with the task of yelling at Verizon until they fixed the routing issue. Foolishly, I thought it would be a relatively easy task.
It is the day after the big upgrade. I drag my sleep-deprived carcass into the office, drop my bag at my desk and make a beeline for the coffee machine. Apparently my peripheral vision is not the best when I’m still half asleep, because someone intercepts me before I manage to reach the blessed caffeine dispenser.
“Luke, the web timesheet sucks!”
Of course it sucks. So does the phone system. So does the online reviewing system. So does Microsoft Office. Every single piece of software ever made sucks at one point or another. Every single software stack ever created is basically an unimaginably complex house of cards. And the fact that it does not crash and corrupt data on a regular basis is a small miracle to be cherished. That’s just what software is: a bottomless vortex of suck. But despite being horrible and broken most of the time, it does make our lives better. And if you don’t buy that, then I dare you to live for a week using nothing but pen, paper and postage stamps.
But of course software sucking is automatically my fault, because my cubicle happens to be located in the area clearly labeled “IT Department”. Yes kids, when you take up a job in the IT field it is basically like signing a document that says “I hereby accept the blame for any and all software and hardware failures that may have happened to any electronic device with an asset tag – from now till the heat death of the universe”. And no, I’m not kidding – you may leave the company, but the users will remember that your ass used to maintain that one server back in the day. And if, God forbid, that server is running 1% slower one day due to network congestion or some runaway OS-level process, they will hold you personally responsible for their lost productivity.
The “web time sheet” that I somehow caused to suck this morning is a proprietary Windows service with more leaks than the Titanic after its fateful encounter with the iceberg, except that this one leaks memory. If you let it run unattended (like you are supposed to) then it will eventually run itself into the ground. So the server is scheduled to reboot itself automatically at like 4AM on Sundays or something like that. Unfortunately certain activities leak more memory than others, so you never really know when that server is going to go into swap-hell. Sometimes a mid-week reboot is necessary.
Since it physically sits on a rack in a data center in another building, we have a VPN tunnel that lets us remote desktop into it for maintenance purposes. Usually rebooting it ahead of schedule is trivial, but this morning I hit a snag. The VPN tunnel seems to be down. Coincidentally, this is the one thing that we completely forgot to test the night before. The users can still access the server directly because it has a web-facing front end, but I can’t get to it to power it down.
So I call up Barry, but I go straight to voicemail. Since it is only a little past nine, I figure he is probably not in yet. So I leave a message, and on a lark call the main number for his office. I figure they may snag him as he walks in, and get him to call me back. Some perky young girl picks up, takes my name and number, and promises to fetch Barry for me. Then she says:
“Please hold…” and after a very brief pause I can hear her yell out “Hey, Barry some guy Luke wants to talk to you about a fire thingy… He says it’s urgent”.
To which Barry responds: “Tell him I’m in a meeting.”
She dutifully explains that he is currently in a closed meeting and cannot answer the phone, but will call me back. I ask her to notify him that I can hear his voice quite clearly, and suggest that she pencil Barry in for another high-priority meeting sometime this morning: one between our mutual boss’ boot and Barry’s ass, if he does not give me a call as soon as his “meeting” is over.
Barry groans painfully in the background, blurts something out about learning to use the “hold” button and takes my call. I make him log in and check the VPN-related rules, but these seem to be in place. It’s just that the link is dead, and the issue seems to be on the other end. Which kinda makes sense – we essentially rebuilt the entire thing from scratch, so perhaps the other firewall got confused. So I leave Barry alone, since he and his team do not have jurisdiction over the data center in question.
Instead I call Charlie (no, not this Charlie – a different one). Of course, Charlie is working in the field that day, so instead I get to talk to one of his under-flunkies whose name I didn’t even bother to memorize, so I will just call him Beef. After a short conversation, in which Beef uses the word “brah” at least 8 times, I realize I really, really, really don’t want him to touch the firewall rules. Instead I figure I will just have him reboot the web timesheet server, and then wait for Charlie to come back so I can troubleshoot the VPN issues with him.
When I explain his task to him, he sounds relieved and happy he won’t have to reconfigure anything. Rebooting seems straightforward enough. I give him the name, number, asset tag and all the other stuff and send him off. There is little to no cell phone reception in the heart of the data center where our machine is located, and they conveniently do not have cordless phones in their office so I can’t walk him through the entire process – which concerns me a bit. Then again, it should not be that difficult to just reboot a single machine.
About 15 minutes later, Beef calls me back:
“Bro… Sorry to call you, but like dude, do you know which rack it is in?”
So apparently Beef wandered out into the racks armed with only a sticky note with a scribbled asset tag and serial number. He checked like 3 different racks, then realized there are like dozens more out there, got discouraged and called me back. I have never personally visited that particular server, so I honestly have no clue where it lives physically. But I assume they have some sort of lookup system, or reference sheet, which he can use to map the information I gave him to a rack number.
Some time passes, and Beef happily reports that he has found the correct rack but insists that none of the servers inside are labeled properly, and he wants to know which box is mine, counting from the top. That’s the sort of information I never actually needed before, so I’m no help. But since I’m fairly sure he is just too dense to figure it out, I suggest that maybe the servers are just racked in the order they are listed on his reference sheet. This seems to ring a bell, and he goes off to investigate.
Finally Beef calls me with good news:
“Bro, I couldn’t figure out which one it was, so I like rebooted all the machines in that rack. You should be good now, man!”
My head hits the desk so hard that it wakes the intern up from his morning nap. In an uncoordinated fight-or-flight reflex he attempts to stand straight up, painfully whacking his head on the overhead cubicle cabinet in the process. The sudden awakening, combined with an unexpected cranial collision, makes him flail about like a beached fish, fall off the chair and land on the floor cradling his head.
My other coworker looks up from his screen and goes:
“Dude, if you kill the intern they are never going to give us another one.”
With my forehead still pressing against the desk, I hang up the office phone, extract my cell and send Charlie a quick text message:
“I think Beef rebooted every single server in the rack. I did not tell him to do that.”
His response is swift and succinct:
“MOTHER OF FUCk1” followed by “AGHS! FuKING FICKn SHIT BEEF!”
Apparently at least one of the machines in that rack is on the “do not reboot under any circumstances ever” list. Fortunately for me, none of these other servers is mine and Beef did temporarily resolve my memory leak issue. So even though I feel bad for Charlie, I write this one up as a success.
This leaves me with the more important task, namely getting Verizon to fix their routing issue. But I will talk about that next time.
In case you were wondering, Charlie managed to fix the issue relatively quickly and the users were only slightly angry. Beef was assigned a penance of self-flagellation and temporarily banned from touching the racks. Of course he wasn’t fired, because like nepotism, bro. Also, my users still totally think that I come to work at night, only to throttle down all the services and sabotage their computers so that everything is slow and buggy in the mornings. I mean, that’s the only logical explanation.
Next on Firewall Saga, I deal with the brilliant Verizon on-site technicians.