It’s Friday morning, and the IT cave is misty with the heavy anticipation of the coming weekend. No one is working. Some people are reading reddit while sipping their morning coffee, while others are twiddling with their phones. The intern is reading digg because he did not get the official memo that practicing this sort of internet archeology is sort of frowned upon in this here establishment.
Then there is a sound. It starts faintly, but it quickly rises to audible levels. It’s an unmistakable oscillating, rhythmical thumping that could be produced only by many pairs of feet walking angrily while wearing high heels. The users are coming!
The intern does a Nightcrawler – he vanishes into thin air in a puff of smoke, leaving behind only a faint, lingering smell of coffee, sweat and Cheetos. Full-time staff reflexively tab over to a Visual Studio window or a terminal so that it looks like we are all busy working on something. The weekend, which was almost here – so close you could almost taste it in the air – suddenly retracts in fear like a startled turtle hiding in its shell.
They arrive – angry women in angry shoes with daggers in their eyes.
“WHY WASN’T WEB TIME SHEET UPDATED TO THE LATEST VERSION?” – they demand in unison. The harmonic vibration in their voices makes the glass of the old CRT monitor in the corner creak painfully.
Great question. We have wanted to update that server for months now, but it would have to be taken offline. It is a finicky third-party application with a slow binary installer that is usually over 200MB in size, and an upgrade takes about an hour and a half to deploy and configure. But of course no one will let us take it down because people need to put their time and expenses in. Most of our other apps are easy – you just take them down at 6pm and spend a few quality “I can’t believe it’s not overtime” hours hanging out and learning Spanish by chatting with the night-time cleaning crews. Unfortunately this particular app is used by telecommuters and people working in the field, and it happens to get most of its traffic after office hours, since that’s when people tend to submit their time and expenses. During the day, the office staff uses the same app to approve these, tabulate them and run reports. The traffic bottoms out around 3am on Saturday, but most of us block out any and all work-related memories as soon as we leave the office on Friday evening.
As a result, the app was left lagging behind. It is obscure enough for management not to care about it being out of date, but important enough for users to whine, cry, protest and beg for extensions every time we try to schedule some downtime. But suddenly, someone out there realized that the current version of the app is 8.24 and we are still running version 8.13.
Guess what about this scenario really bothers our guests with daggers in their eyes and sharp spiky shoes? Yep, it’s the 13. I mean, why would anyone sane even put such an unlucky number on their software? No wonder the application is slow! It’s like breaking a ladder, stepping over a mirror, and crossing the road while wearing a black cat as a hat all rolled up into one.
Management agrees. Because the version number is extremely unlucky, we are to upgrade the server ASAP.
The angry shoe brigade departs, and I set to work. I open a remote desktop session to the musty old server, queue up a download of the monstrous update file and decide to do a database backup. I have done this upgrade dozens of times, and I have never actually needed these backups. But I always do them just in case. So I pull up SSMSE 2005 and run a backup on the timesheet database. I foolishly opt to save it in the default directory, which happens to be on the tiny system partition that is almost full. To make sure I do not run out of space halfway through the upgrade, I run the file cleanup wizard and have the system delete all temp files and compress the shit out of everything. Then I go do other stuff.
In the meantime the intern returns from the netherworld covered in glowstone dust and Zombie Pigmen entrails. As a punishment for his cowardice I send him on a quest to obtain coffee and bagels. I make sure he knows not to return to the office without an adequate quantity of veggie cream cheese and at least one poppy seed bagel. Because if I’m going to be dealing with this damn upgrade today, I deserve the trace quantities of opium carried by poppy seeds to dull the pain of Windows 2003 maintenance.
The weekend pokes its warm and hazy head from the dark corner of the office and surveys the situation for a while. Soon enough we all feel its pleasant tendrils tickling the pleasure centers of our minds with promises of places other than this office.
Once everything is downloaded and backed up I scan through the upgrade checklist graciously provided by the vendor. It is written in happy market-speak but I quickly decipher it to carry a rather chilling message. It goes a little bit like this:
“Good news everyone. We have created seven new dependencies. Make sure you install all this shit on your server before you attempt an upgrade. Also we fucked with the database schema and the installer was written by the CEO’s nephew so the upgrade will probably royally cock up everything. Here is our tech support number because you will definitely need it.”
I install all the prerequisites, and then spend 15 minutes hovering my mouse over the installer icon, hesitating to take the plunge. Finally, I decide to call the support drones and check whether, with all these changes, they recommend going straight from 8.13 to 8.24 or doing it incrementally.
About an hour later I finally get a live person on the phone. I explain the situation and get a reassuring answer – the installer will definitely work with my version and there should be no problems. Furthermore they no longer offer the installers for previous versions so I could not do it incrementally even if I wanted to. Splendid!
The weekend pouts, rolls up its tendrils and hides under a table in a huff. The intern senses danger and instinctively moves to the farthest cubicle. In the meantime I get off the phone and run the installer. A minute later I see this:
“Unhandled Exception: Object pointer was not set to an instance. Press OK to terminate installation.”
Did I mention this is a commercial application we actually paid for?
I make another toll-free long-distance call to India. About an hour and a half, three support technicians and two remote sessions later, they finally figure out what was causing the exception.
Apparently the newest installer looks for an IIS website named “Web TimeSheet”. If it is not present, it keels over and dies with a cryptic message. Kudos for amazing design. The previous version did not really care about this and happily used the “Default Web Site” IIS preset, so I did not even realize this could be a problem.
But at least we are back on track. The tech support drone makes sure everything else is in order and runs the installer for me, and for a few minutes we sit there and quietly watch the progress bar moving at the astonishing speed of a pixel per minute. Soon he realizes that the process will take quite a bit of time, so I let him go, assuming that everything will be ok from this point.
Yeah, I know – I’m being silly and reckless. Having optimistic thoughts while performing a major upgrade is like having sex while being a minor character in a slasher horror movie. It just can’t end well.
The installation takes about an hour, and there are no further errors. The weekend is still rolled up under the table and refuses to come out. A few of my buddies are trying to coax it out by talking about their respective weekend plans, but it’s not working.
I take a deep breath and try to log into the newly upgraded service. An integrated Windows login box comes up asking me for my domain credentials – something that has never happened before.
I get my Indian friend on the phone and have him do yet another remote session. He pokes around the system and concludes this is caused by a known flaw in the installer. Every once in a blue moon it decides to default to Integrated Windows Authentication instead of the internal database. They have never really bothered to track the bug down because they are trying to phase out self-hosted instances and push their new and shiny “cloud hosted” service instead. So the installer they push out to users like us is basically an afterthought at this point.
Of course the first thing he asks me when he determines their shitty software just royally fucked us in the ass without lube is “This is a test server, right?”
I happen to have him on speaker so everyone in the room bursts out laughing. Even the intern pokes his head out from his hideout cubicle and giggles. We all shoot him a warning glance, because we are fairly sure he has no clue what we are laughing about.
I carefully explain that our company does not seem to believe in things like test servers for apps that cost money per seat and require Windows licenses, SQL Server licenses and all kinds of other expensive crap. Management is much more comfortable with a “Developmensrtruction” environment for such endeavors. We only get test servers for open source or in-house projects, and only if we can build them from scrap and refuse parts, or if we can convince the highly overpaid outside consultants to sell the boss people on the idea. Also I mention that someone is probably going to be bludgeoned with a shoe if the server is not up by the end of the day. And if that person is me, then I will make it my mission for the weekend to fly out to India and pay him a friendly visit.
The poor guy proceeds to soil his pants in fear and comes up with an “easy” solution: we just uninstall everything and roll back the database to the previous state using a backup. I knew it was a good idea to do a backup.
He pulls up SSMSE and tries to restore the database from my backup, but it bugs out. He closes the error message before I can even read it, and tries again. And again. And again… I eventually stop him and read the message. It says the backup file is compressed, and it should not be. The tech drone on the other side of the phone line makes whimpering sounds which I imagine roughly translate to “this is not in my manual”. Fortunately I know what happened. Remember that cleanup I did at the beginning? Apparently my backup file got compressed too. No worries. I just find it, pull up the File Properties dialog, un-check the compression box and hit Apply. It’s a big file (close to 2GB), so it starts churning away. I let him know it will probably be a while and we should wait till it’s done. Then I excuse myself and quickly run to the bathroom.
When I return I see him messing around with the database restore feature again. It craps out every single time, this time saying that the backup file is in use and cannot be opened. Of course it is in use – it is being uncompressed. I explain it to him again, and ask him to wait a bit.
He lasts five minutes and then he is at it again. This time however he stops after five or six attempts and studies the error message for several minutes. Then he concludes:
“Ok, I think this is not moving. I’m just going to end this task so we can run the restore.”
I point out that killing the process while it is in the middle of decompressing the file is probably not a good idea. I’m not sure how it will affect the file, but chances are it will end up corrupted or in a weird unusable state.
He says: “Oh, ok.”
Then I see him open Task Manager. My hand is unfortunately not on the mouse. I have one hand on my coffee mug and the other one under my chin, watching this clown’s antics. I did not expect that I would have to wrestle control away from him. I see the cursor angling towards the “End Process” button in slow motion, and I make a desperate lunge for the mouse while doing my best Shia LaBeouf impersonation:
“No, no, no, no, no, no, NOOOO!”
I’m too late. He clicks the button, then closes Task Manager before I manage to grab the mouse. Then he goes:
“Woah… What happened?”
The process he killed happened to be explorer.exe, so naturally his innocent action took down the taskbar, the Start menu and so on. Since he had minimized pretty much everything, the screen is blank save for some desktop icons.
“You just killed Explorer,” I explain, “and you probably corrupted our fucking backup.”
He goes: “I’m… sorry…”
We test the file, and sure enough SSMSE won’t touch it and it is no longer marked as compressed. He asks me if there are other backups. This time no one laughs.
Yes, we have nightly backups to tape. But the latest one is from 10pm yesterday, and if we use it we lose all the data that was entered late last night and this morning. And that lands us directly in the middle of shit creek without a paddle.
After some cursing (on our end) and apologizing (on his end), he decides that perhaps we don’t even need to roll the database back. So he proceeds with Plan B.
He runs the installer again, clicks on “Advanced Options” then “Misc Settings”, “Other Options”, “Administrative Settings”, “Beware of the Leopard” and “DO NOT TOUCH THESE SETTINGS UNDER ANY CIRCUMSTANCES” and un-checks a box labeled “Use Integrated Windows Authentication”. This time I make him stay on the phone, watch the progress bar and sweat.
About an hour and a half later the installation finishes and everything works fine. Hey, it only took the entire Friday to do this.
Moral of the story:
- Outsourced tech support sucks
- Test servers save lives
- Never take your hand off the mouse when a clown is behind the wheel via remote desktop
- Using proprietary web apps that require Windows, .NET, IIS, SQL Server and third party plugins is just begging for pain
I think I now have an ulcer and three nervous tics. I really should just stick to writing code. It is much less stressful and much more rewarding.
Reminds me of some of my afternoons. The number 13 stuff drives me batty, though. Wikipedia says there is no known explanation for the association of 13 with bad luck that is considered likely, but the reason I’d heard was that in the Middle Ages it was considered good luck. Come the Reformation, the reformers thought that was too superstitious and so made it bad luck.
Writing code doesn’t help. You end up not having control of the production environment (outsourced), and the production upgrade will be done by a new person each time, each of whom insists that making guesses is better than actually reading the steps in the deployment documents.*
Trust me on that. This means bye-bye Saturday. The good part is that it usually falls into “Thank [deity] this is on overtime – how did the clock advance 14 hours on this max 30 min update + quick testing?” territory.
* Any resemblance to living persons or actual occurrences is purely coincidental.
You’re a POET!
Let me guess, some HR drones made the request to have it upgraded NOW? Yeah, been there done that. Seems like HR won’t let you touch their systems for the longest time and then things get way too out of date so it’s a mess to upgrade when they finally do want it upgraded. Of course when they do want it upgraded, they want it done immediately. I also think all commercial software for HR is written by a room full of masturbating monkeys, but that’s another story.
We had some system for our HR department at my old job that kept track of employee benefits and other such nonsense. It was such a convoluted clusterfuck that I refused to touch it and had an outside consultant come in to do any work on it. Even he had trouble with it, and he was supposed to be an expert in it. He spent half of his on-site sessions on the phone with the company that made the software.
“NO NO YOU NO KILL PROCESS! VERY BAD WILL HAPPEN! YOU UNDERSTAND!??”
The killing explorer part reminds me of a funky situation I had at work. My team member has “8 years of dev experience”, which he-she loves to gush about. We were testing this particular app which hooks into the explorer.exe process. So in order to test a new build of the app after installation, one need not reboot Windows but just kill and restart the explorer process. My team member happened to have some problems with this instruction, so I sat with him-her during the installation process. When the part about killing explorer came up, he-she said “Ok, now we’ll kill explorer dot E X E…”, and proceeded to click the red “X” on the My Computer explorer window.
8 years of experience indeed…
Tell stories about the intern! I whimpered whenever you mentioned him, as I’m going to be an intern in 4 months.
@ Adrian:
Opera 9.80?
Oh my gosh!
MrJones201 wrote:
My ‘About Opera’ says 11.50.
And look who’s talking: FF 1.5.0.12. :D
@ astine:
I always thought the 13 was tied to some numerology mumbo-jumbo. Honestly, I doubt that there is any justification for 90% of these silly superstitions. Like black cats or broken mirrors, for example.
@ Tormod Haugen:
Fortunately I usually get to deploy and tweak the stuff I have written so I usually don’t run into this particular issue. But yeah, if I had to rely on someone else to do deployment that would be a nightmare.
Actually, scratch that – this does happen every time we need to patch something that was already deployed on remote client machines. Pretty much the standard procedure these days is to get every single one of the 50+ remote users on the phone, remote into their machine and perform the patch yourself. Also we are supposed to schedule these calls during their “down time” so that we don’t hurt productivity with all these stupid updates and interruptions. :P
Victoria wrote:
And I didn’t even know it! :D
@ Rob:
That’s more or less exactly what happened. :)
This reminds me of my Firewall upgrade story, which I never posted about. I think it’s because I was so burned out by it that I could not force myself to write about it. But enough time has passed for me to be able to laugh at it. So I might write that up and share the horror. :)
@ Mart:
Oh man, in my experience all these people who brag about having X years of experience in whatever are the worst. That’s like a red warning light: you are most likely dealing with a complete idiot.
@ Adrian:
The intern is sort of on loan from the filing & coffee making department. We call him the CFO (stands for “Chief Filing Officer”). Also he implemented look-ahead pre-fetching optimization of the filing cabinets. Or at least that’s what we say when someone finds misfiled folders – he is just optimizing. :)
@ MrJones201 &
@ Adrian:
Whoops. Looks like the browser detection plugin is kinda broken. I did not write it, and I try to update these things regularly, but perhaps this one is somehow out of sync.
Whahaha. Excellent.
Dude you should move to South Africa if you want to see real ineptitude.
Why would I use Windows version 7 when they have version 95? :P
Great post :) I can’t hear the clicking heels…I get heads popping around the cabinet.