The thing about being an IT professional or a sysadmin is that your workload comes and goes in waves. Some days are just slow and lazy, and there is not much for you to do. You are all caught up on your current projects, all of which are pending review, waiting for approval or at a standstill. You are done with all regular maintenance tasks, and anything requiring serious work can’t be done during business hours anyway because it would require taking down a crucial server, or two. Most of your day is spent with silly paperwork, answering random calls from marketers trying to sell you enterprise business solutions, and browsing the web. On such days even the users relax a bit.
You can hear it in their typing. On normal days the clickety-clack of the keyboards is an angry, hate-filled sound. They are not typing; they are spitefully inflicting punishment on their machines in the form of text. Work is a struggle between man and the disobedient machine. But on the slow days, their touch becomes gentler. You could almost imagine that they don’t hate and fear their computers, but have grown to accept them as inanimate objects that are merely tools of their trade. This of course is wishful thinking, as you can still smell their fear and disdain in the air. But on those days when nothing breaks, there are no upgrades and everything is sailing smoothly, they are lulled into a brief state of comfort and relaxation.
Then, there are the other days. We call them “Fan Days” – days where the shit hits the fan hard, and everything breakable decides to break at the same instant. This is a story about one such day.
The first indication that you are about to have a fan day is a curious convergence of vacation days, sick days and personal leave requests among your coworkers. Nothing bad usually happens when your IT team is at full strength, and in good shape to swiftly respond to major issues. It is only when you are running with a bare-bones skeleton crew that things start to crumble. On this fateful day, I was more or less flying solo, using the intern as the storm shield against the wrath of the users calling the help desk.
The first issue of the day came early in the morning and concerned our Barracuda SSL VPN box. Granted, we get anywhere between 5 and 10 calls for that thing every day, but that’s just because around 10 of our users just can’t wrap their heads around the concept of two-factor authentication. The help desk is sort of a fourth factor in their authentication process, walking them through the extremely difficult task of plugging in the USB dongle and typing a password into a login box. This was an altogether different pair of shoes – not a PEBKAC authentication problem, but an actual functionality issue.
The nice thing about using the Barracuda is that it allows us to give remote users access to our intranet web apps without actually exposing any of their servers to the internet. The communication is proxied by the SSL VPN box, encrypted and hidden behind a two-factor authentication scheme. One of our users was trying to upload a large zip archive to one of those web apps, but the Barracuda proxy server would not have it. We had never noticed this in testing, because none of us reasonable IT folks even dared to assume someone would be silly enough to try uploading over 100MB of crap. The average size of the attached files uploaded through that form was 5-6MB, so we collectively figured that setting the upload size limit to exactly 100MB would be more than enough. But, alas, here was a user with a 115MB file that needed to be submitted by yesterday.
Fortunately, it was not a huge issue. I simply logged into the admin panel and increased the file size limit, applied the changes and… Inadvertently broke the SSL VPN box somehow. It basically fell off the internet. One minute it was there, the next it was completely gone, not responding to any HTTP requests. I could still ping it, and nmap could see that it was listening on the usual ports, but there was no one answering when you knocked on the door.
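That symptom, a port that accepts connections while the web server behind it stays silent, can be told apart from a dead host with a few lines of Python. What follows is only a minimal sketch and not the tooling used that day: the function names and the example hostname are made up for illustration, and it assumes a plain GET on "/" is enough to provoke a response from a healthy server.

```python
import http.client
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def http_answers(host, port, timeout=3.0, use_tls=False):
    """Return True if the server sends back any HTTP response at all."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        conn.getresponse()  # any status code counts as "somebody is home"
        return True
    except (OSError, http.client.HTTPException):
        # Timeouts, resets and TLS errors all land here. Note that with
        # use_tls=True a self-signed certificate also fails verification
        # and is reported as "not answering" by this sketch.
        return False
    finally:
        conn.close()


def diagnose(host, port, timeout=3.0, use_tls=False):
    """Classify the host into one of three rough states."""
    if not tcp_port_open(host, port, timeout):
        return "port closed or host unreachable"
    if not http_answers(host, port, timeout, use_tls):
        return "port open, but nothing answers HTTP"
    return "web server is answering"


# Hypothetical usage (placeholder hostname):
# print(diagnose("sslvpn.example.internal", 443, use_tls=True))
```

A result of "port open, but nothing answers HTTP" matches the state described above: the box is alive on the network, but the service behind the door is gone.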
I promptly fell out of my chair, and scrambled towards the server room to check whether or not the device had bricked itself somehow. With shaking hands I fumbled for my access card, and proceeded to swipe it exactly four times, until I finally got the right alignment of magnetic stripe to the card reader. Then I dropped my server rack keys on the floor a few times, before I managed to claw my way to the silently humming boxes inside.
When I switched the KVM to the Barracuda box, the screen lit up and I saw the familiar login prompt, and the maintenance menus. A sudden rush of relief caused my lungs to exhale stale air, probably for the first time in the last 10 minutes. The box was not bricked, but the internal web server was down.
Rebooting the device would probably be a logical choice at this point, but I was hesitant to try it. After all, my tiny, insignificant change somehow put it into a very weird state, and who knows what it did to the internal data store. I wasn’t about to risk doing even more damage to it than I already had, so I got the Barracuda support team on the phone.
Have you ever dealt with them? They have a great, high quality team that speaks fluent English (which is rare these days) and you can usually get a live person on the phone in under 10 minutes. Unfortunately, the product specialist I talked with wasn’t much help. To diagnose the issue he had to gain access to the device, and since the web-based admin panel went to hell I could not give him the permissions to do that. I tried enabling the SSH tunnel from the physical console, but that did not work either, though for an entirely different reason.
You see, I never specified that port 22 needed to be open for this box, so our over-eager network admin Andy likely locked it down. So of course I called him next, and he was not very happy to hear from me, seeing how this was his vacation day.
“Dude, I’m on the beach right now. What did you break?”
“SSL VPN box. Barracuda support needs to ssh into it to un-fuck it,” I replied without missing a beat.
Andy grumbled something about dealing with n00bs while on vacation but eventually agreed to give me the login information to the firewall so I could open port 22.
“Ok, so the password is my last name, then 123”
“Really?”
“Yeah, all lowercase.”
“I… I have no words for this, Andy…”
“No one can spell my last name anyway…”
“Yeah, but your last name is on the website.”
“Oh, yeah… Well, you can change it to something else while you’re in there.”
So of course I did – I changed it to “Fuck You 4ndy” followed by a long random string of characters. I assumed he would appreciate that upon coming back after a full day of beach bumming.
Needless to say, once I was armed with Andy’s terribad password, which was about 3 times worse than the kind of shit we yell at our users for, I managed to open the right port and call Barracuda back. Sadly, it turned out to be a massive waste of time. Whatever took down the web server also seemed to blow away ssh. For an instant I felt bad for even calling Andy on his off day, but then I got over it. I mean, who does he think he is, taking a vacation while I’m in here breaking mission-critical systems left and right? Screw him.
Since we ran out of options, the Barracuda help desk recommended power-cycling the appliance. The idea was not to do a graceful shutdown but just kill it, and bring it back up without letting it write any permanent changes to disk (unless it already did, but we would worry about that later).
So that’s what I did. I mechanically unlocked the face plate on the device, took it off and placed it on the floor. Still chatting with the support guy on the other line, I depressed the power button.
At that very instant something clicked in my brain and with a start, I realized three things:
- The Barracuda SSL VPN box has no face plate
- The box on which I was pressing power had a little plate next to the button that said Dell, and a small sticker labeled “Firewall”
- This entire story was taking place before The Firewall Saga
In case you haven’t read my Firewall Saga, let me explain: at that time, our Checkpoint firewall had a weird glitch that caused it to “forget” the license keys and subsequently close off all network ports, fall off the internet, and bring the entire network to a halt each time you rebooted it. The only way around it was to log in via the physical console, and manually type in the long license key strings to re-activate it.
While I had the power button depressed, the server kept chugging along just fine. I briefly considered just staying in the server room till 5pm holding that button, but that wasn’t really an option. So I attempted to delicately slide my finger off of it, hoping the machine would forget that I pressed it in the first place. That did not work.
Three point seven seconds later, The Intern appeared in the doorway giving me a questioning look:
“Dude, did you reboot the internet?”
“What? No. Of course not!” I lied, discreetly pushing the power button again to bring the firewall back up.
Two seconds later, Jay from accounting materialized behind the Intern and told me the internet went down.
“I know,” I replied. “That’s why I’m here. I’m working on it.”
In about a quarter of a second a third person appeared in the doorway, forming the beginnings of an impromptu conga line. This was one of our supervisors who also noticed the lack of internet – or as he described it “the email erroring out on him”.
“He knows,” said Jay.
“He is working on it. That’s why he is here,” added the Intern.
This seemed to placate the supervisory entity. He nodded, and wandered away. As his footsteps faded out in the distance, I heard him repeat “he knows, he’s working on it” to at least three people who were making a beeline for the server room.
Meanwhile I was already dialing Andy, hoping he could walk me through the manual application of the license keys for the Firewall.
So to summarize – I single-handedly bricked the SSL VPN box, temporarily took down the firewall and disconnected the entire office from the internet – all before 11am. Amusingly enough, this was not the last crazy thing that happened that day. Not by a long shot.
Why didn’t you tell him to split the files into two? :P
Bwhahaha – that brought tears to my eyes. I’ve pushed, and held in, a power button before and even wondered, as I stood there realizing that I had fucked up, whether I could use something to keep the button pushed in – a stick, piece of tape, lean something heavy on it…anything so it wouldn’t actually power down :)
You should have made the intern stand there all day.
@ MrJones2015:
Ah, yes – hindsight is a beautiful thing. That would have probably been the smart thing to do. But I figured that it might be faster for me to quickly change one setting, than to coach the user on how to split the file properly.
I mean, I would have to explain:
– how to create a new folder
– how to check the size of a folder
– how to move files between folders (most users only know “open in Word and Save As”)
– etc..
Also the intranet site was never designed to anticipate multi-part archives so he would have to upload part 1 in the correct place, then upload part 2 under MISC/TBA section and put what it was in the comments. Then I would have to go find it, put the two files in the same folder on the back end and then delete the temp entry for part 2 (or not, depending on what the people who review these files would want). It seemed less hassle just to do a quick change which I have done multiple times in testing without breaking anything. :P
@ Steve:
Yeah, the age old question of “is there something within the reach of my foot I could use to jam this button in the depressed position for the rest of the day”.
@ Rob:
LOLOL!
“Ok, now stand here and hold this button till I come back.”
Many hours later, as I’m eating dinner at home, I would realize that the intern is still locked in the server room holding the button in. Then I would shrug and figure he won’t starve to death overnight so it’s fine. :P
Rule 1: Never “quickly change a running system”
Rule 2: “Word documents are for yourself, only hand out pdf files”
Rule 3: “Fixing is done BOFH style, always!”
; )
MrJones2015 wrote:
Especially not during business hours when people are using it. :) This lesson was learned the hard way.
This is an amazing story. I am impressed.
This is the power button. I was depressed.
Gui13 wrote:
Just reminds me of the Dell computers at university. They have that superbig Dell “button” that looks like a button, and one assumes it’s the power switch. It is no button, just a poorly attached crappy plastic part that is so loose one assumes it’s a broken power switch. But everyone’s getting fooled; it’s actually the cheap tiny button next to the big logo that switches the PC on.
Hehe prequel to The Firewall Saga :) I had the Walking Dead-theme stuck in my head while reading this. Seemed fitting somehow!
@ Steve:
Hahah! That sounds like an episode from MacGyver :D
@ MrJones2015:
Dell’s cases have always been quite shitty. Some of their designs are so terribly non-functional you have to wonder if they were designed by blind, arthritic, and developmentally delayed Bonobo Apes. My company still has a few of these old desktops where the USB ports are behind a latch that lifts up, and the ports are placed at an angle facing downwards. If the computer is standing on the desk, it is merely inconvenient as you have to lean to see the ports. If it’s on the floor below your desk, it is virtually impossible to plug the USB in.
@ Sameer:
Like this?
@ Lucas:
See, MacGyver would be able to resolve this with some chewing gum and a paper clip.
Luke Maciak wrote:
Awesome as that is I meant this :)
Sameer wrote:
Does this mean lusers can also bite you and infect you? :o
Also, I’m genuinely coorious (sic! that’s my new spelling and pronunciation of the word!) what went so horribly wrong with the Barracuda.