firewall-saga – Terminally Incoherent http://www.terminally-incoherent.com/blog I will not fix your computer. Wed, 05 Jan 2022 03:54:09 +0000 en-US hourly 1 https://wordpress.org/?v=4.7.26 The Firewall Saga: Part 7 http://www.terminally-incoherent.com/blog/2011/09/19/the-firewall-saga-part-7/ http://www.terminally-incoherent.com/blog/2011/09/19/the-firewall-saga-part-7/#comments Mon, 19 Sep 2011 14:28:47 +0000 http://www.terminally-incoherent.com/blog/?p=10036 Continue reading ]]> It has been almost a week since Steve’s visit. My daily interactions with Verizon have settled into a very predictable pattern. Every afternoon I get an “unexpected” visit from an on-site tech. I explain the problem is not local, and they leave. I call Verizon and complain. I bitch, moan, threaten to change service, etc. I talk to a floor manager at the call center who does not have any power or resources to help me. All he has is an online chat tool and ticketing system for escalating things to tier 2 – the exact tools the phone drones use. His boss can’t help me because he is in no way, shape or form affiliated with Verizon, but rather manages the call center company. After a while they placate me with solemn promises of swift resolution, reimbursement, etc. The following morning I get a call, notifying me that my issue was marked as resolved. I call again, bitch, yell and complain some more. The issue gets escalated. Tier 2 picks it up, and bumps it back down with a note to dispatch a technician to check all the connections, and jiggle all the wires. And the cycle repeats.

Every time I call, I give them the same speech. We had this issue once before. Somehow you have resolved it. All you need to do is to look about a year back in your case history and figure out what was done back then. Unfortunately, I get the impression that the people I’m dealing with either do not have access to, or do not keep case notes that would reach that far back. I’m documenting all the dates, and failed attempts to rectify the issue, because I fully intend to ask them to refund us all this lost time.

In the meantime a brand new version of certain crappy, proprietary web app comes out, and the angry shoes brigade gets annoyed. I can’t upgrade that server, because if you recall from previous installments of this series, it is located in another data center, and the VPN tunnel is broken.

I figure that the Verizon issue does not look like it is going to get resolved anytime soon. The lack of connectivity with the data center is becoming a nuisance, so I decide to call up Barry from the network team. I figure that I will get him, and Charlie from the data center and we’ll just keep rebooting the damn machines and tweaking Firewall rules until they sync up and establish a viable connection.

I couldn’t have picked better timing to call about this. It turns out that Barry and Toby are going to be in our data center the next day installing and configuring some new hardware. Toby is apparently in charge of hauling all the equipment onto the site, while Barry will be bringing his networking skills. Even better, Barry has to drive past my office on his way to the data center so he agrees to just stop by in the morning. This way we will have one of these guys on each end of this conundrum, and we won’t have to rely on people like Agent Beef to be our hands, eyes and ears in the rack-space. Our plan is to get things done in the early morning, before the managers and directors start slowly trickling in after 9am.

Next morning I arrive at the office extra early. When I leave the house it is still dark outside. When I arrive at the office, my car is the third vehicle on the completely empty lot. As I grab my laptop bag from the trunk, the day star crests over the horizon and vomits painful bright orange light onto the deserted sea of concrete. People talk about dawn as if it was something beautiful and romantic – but every single time I see one, it is like getting stabbed in the face with a condensed beam of fatigue.

The malnourished, dirty hobo birds that stupidly picked the parking lot as their feeding ground are woken by the sun’s slow upward creep and decide it is time to scream their fucking beaks off like it’s some big event. I flip them off, and curmudgeonly drag my sleep deprived carcass into the building.

I bump into one of my early rising coworkers – he is on the super early shift, to field those 6am phone calls from our workaholic clients whose morning routine includes 3 important business calls, shower and coffee. Some people jog in the morning – these folks make work related calls for sport I guess. Never understood this attitude, but then again who am I to judge.

My coworker inquires about my unusually and uncharacteristically early arrival. I attempt to tell him that I have an appointment with Barry to fix some outstanding issues with our network but what comes out of my mouth is:

“Hhhnngrrr brrrry ntfffff!”

Somehow I manage to convince my stiff and unresponsive body to make a zombie shuffle up to the coffee machine. I suck on it for about 20 minutes, and then collapse at my desk. I try calling Barry, but he is incommunicado.

So I wait… And wait… And wait some more.

9am rolls around and Barry is still MIA. Finally he calls me around 9:30 to let me know he is about 15 minutes from the office. He arrives at 10:45. Barry lives out of sync with the normal time-space continuum, and within his personal field of influence time works differently. Of course Toby is not at the data center yet, so all we can do is to run some local checks, and make sure the firewall rules are correct as we wait. I use this time to fill Barry in on my dealings with Verizon. He is amused, appalled but not very surprised. He offers to do a tag-team call with me so we can take turns yelling at Verizon. I doubt that we will accomplish anything new, but I am willing to try anything at this point. Hell, I don’t even want to strangle him for making me wake up so early and then failing to show up till almost 11am. That would require too much energy, and in my sleep deprived state I am all about energy conservation.

Eventually Toby gets to the data center around noon and we get to do some troubleshooting. It appears that firewalls on both ends see each other, but for some reason can’t establish a tunnel. Unfortunately Toby’s end uses a very dumb dedicated network appliance which is not giving us any good diagnostic data or meaningful error messages. After a few reboots of the appliance Barry gets an idea.

“Toby, what is the time on that appliance?”

Toby scrambles to find the information in the web interface. You can hear him click on a dozen tabs and/or links before he finds a status page. Finally he goes:

“It’s showing time as 12:25pm EST”

I watch Barry run the date command on the firewall’s console. It spits out 12:13pm EST. Way off! He quickly resets the time on our end, tries to re-establish connection and we see the VPN tunnel snap to life. Apparently the authentication algorithms were thrown off by the time discrepancy on the two systems. When Toby and Barry set up this replacement firewall in Part 2 they probably did not bother syncing it with an NTP server. Most likely Toby just glanced at the wall clock when setting up the date – one of those cheap, unreliable battery powered things that tend to drift a lot. That was the reason why we got cut off from the data center.
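As a side note, the clock comparison Barry did by hand can be sketched in a few lines of Python. This is only an illustration: the two readings are the ones quoted above, the calendar date is a placeholder, and the 5 minute tolerance is an assumption, since nothing in this story says which VPN protocol was in use or how much skew it would actually put up with.

```python
from datetime import datetime, timedelta

# Assumed tolerance, for illustration only. The real limit depends on
# the VPN software and is not known here.
ASSUMED_TOLERANCE = timedelta(minutes=5)


def clock_skew(local: datetime, remote: datetime) -> timedelta:
    """Return the absolute difference between two clock readings."""
    return abs(remote - local)


def within_tolerance(local: datetime, remote: datetime,
                     tolerance: timedelta = ASSUMED_TOLERANCE) -> bool:
    """True if the two clocks agree closely enough (under the assumed limit)."""
    return clock_skew(local, remote) <= tolerance


# The readings from the story. The date itself is a placeholder.
firewall_clock = datetime(2011, 1, 1, 12, 13)   # output of `date` on our end
appliance_clock = datetime(2011, 1, 1, 12, 25)  # status page on Toby's end

skew = clock_skew(firewall_clock, appliance_clock)
print(f"skew: {skew}, ok: {within_tolerance(firewall_clock, appliance_clock)}")
```

Pointing both devices at the same NTP server would make a check like this unnecessary, which is roughly the lesson of this whole episode.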

Now if we could only fix the non-routable IP issue that quickly. Since Barry is already logged into the firewall he decides to poke around a bit. We more or less exhausted all the possibilities last time, but he figures we can perhaps take screenshots, and logs and use them to support our claims. We get Toby to plug his laptop into an external line (not the VPN one) and send packets to the non-routing IP, while we watch the activity on the screen. Something weird happens – we see text scrolling down the screen. Packets are coming in.

I jump off my stool and scramble to boot up that laptop we set up for Steve. It is running IIS, and a simple test webpage and the firewall is set to route all the inbound traffic to its internal IP. When it’s up, we ask Toby to try hitting that IP with his web browser.

There is a short pause and he goes:

“Oh shit! I see an animal!”

I whip out my phone, and sure enough – there is my test page:

The title attribute for this page was Mushroom, Mushroom.

“Barry… What the fuck just happened?”

Barry is just as stumped as me. The only thing we changed on the firewall today was the time. It is impossible that a 10 minute system clock drift could possibly have any effect on the routability of one of our 5 IP addresses. Nothing we did today could have possibly resolved our issue. And yet, our insurmountable problem somehow fixed itself, literally overnight (when I checked it last night it was still broken). How did this happen? I have no clue. Barry had a hypothesis or two:

  • It is possible (but unlikely) that my complaints somehow got forwarded to the right department
  • Perhaps some network engineer noticed this issue during regular maintenance and fixed it
  • It is also possible that Verizon routers have some self healing protocols that cause them to refresh their routing tables every once in a while

I feel relieved, but also a bit cheated. I sort of wanted Verizon to acknowledge this problem and resolve it. If it happens again (and it may) we will be back to square one, dispatching useless technicians to fix a routing issue. On the other hand, the thought of no longer having to deal with Verizon made me extremely happy. I was just sick and tired of the entire ordeal – especially the brain dead Verizon tech support drones and on-site technicians.

You would think that this is the end of the story, but it is not. There is still one event of note that I haven’t mentioned. But to get to it, we have to advance the clock by about a month or two. I am finally free of grief and residual pain from this ordeal, and I’m turning it into a long running multi-part series on Terminally Incoherent. Around the time the part of the story where I introduce Steve hits the web, I suddenly get a text message from an old friend:

Do you want me to file an internal escalation for you?

Note that this is completely out of the blue, and out of context for me. By now I’m done with this whole issue. It is ancient history that made for a funny series of articles. So my response is along the lines of “Huh? An escalation for what? Why?”. Then he explains:

For your nonroutable ip from your firewall saga

This, ladies and gentlemen, is the exact moment when I punched myself in the face. Somehow I managed to completely forget that I know someone on the inside. I have insider contacts within the bowels of Verizon. And apparently while I was sitting here contemplating firebombing their headquarters, this person could have filed an internal ticket for me. A ticket that could have potentially helped to fast track this entire ordeal.

Thus concludes The Firewall Saga. I have some more stuff like this in the pipeline – though probably not as long. Now that the series is over, I went back and added a little navigation table at the end of each post. This way, if you decide to share this story with a friend, they can just click through to the end and read the entire thing without too much hunting around.

The Firewall Saga
<< Prev Next >>
]]>
The Firewall Saga: Part 6 http://www.terminally-incoherent.com/blog/2011/09/14/firewall-saga-part-6/ http://www.terminally-incoherent.com/blog/2011/09/14/firewall-saga-part-6/#comments Wed, 14 Sep 2011 14:18:44 +0000 http://www.terminally-incoherent.com/blog/?p=10023 Continue reading ]]> On the last episode of The Firewall Saga we met Steve – a peculiar Verizon technician who turned out not to be the “Network Specialist” we were promised. I managed to co-opt him into a crazy plan of getting the Verizon tier 2 techs to escalate my issue. Last time I saw him, he was making tomato stains on the wall of my server room. So I relocated him to the lunch area.

Now, Steve is starting to get on my nerves. He has been sitting in the lunch room for about 40 minutes, devouring pretty much everything. I think someone offered him a drink, and he eagerly cleaned out four cans of soda from the communal fridge, devoured two snack sized bags of potato chips and dipped into the basket of assorted sweets we had on the table. His ravenous appetite reminds me of Shaggy from Scooby-Doo, that is if Shaggy was a graying gentleman in a trucker hat and with a handlebar mustache.

Around the two hour mark, he finally gets connected to a live person. He hurries to the server room, logs into his laptop and begins troubleshooting. I help him plug it into the network, and give him a piece of paper with the IP address and the default gateway he needs to use. Then I grab my cell and dial Barry in case we need to do something on the firewall side. Steve is busy mumbling into his phone, and typing up a storm on his machine. Suddenly he turns to me and goes:

“Hey pal, I think this cable you gave me is not live. I can’t get out on the internet.”

I know for sure that the cable is live, because I used it to test the dummy laptop this morning. I ask him if he used the IP address I gave him. His eyes glaze over, and his expression betrays that he has not the faintest clue what I’m talking about. I attempt to explain, but I realize that everything that comes out of my mouth sounds like some abstract moon language to this guy. So I give up, and decide to let him use the dummy machine we configured. I connect it, boot it up and let him at it.

This is Steve’s browsing session in an itemized list form:

  1. He clicks on Internet Explorer
  2. He smiles as the MSN page comes up
  3. He uses the mouse to click on the search box even though it already has focus
  4. He types in ‘google.com’ into the search box
  5. He uses mouse to click on the Search button
  6. He clicks on the first link
  7. He clicks inside of the Google search box even though it already has focus
  8. He goes “Ok, so what’s that address you want me to go to?” into his phone
  9. He types in 197.168.1.1
  10. He uses his mouse to click on the “Search” button
  11. He scrolls around the results page by using the scroll bar, completely ignoring the scroll wheel on the mouse
  12. He goes: “Ummmmm… I am not seeing that…”

I look at The Intern, and he looks at me. We are both in a state of shock, and at a complete loss of words. We just stand there for a solid minute unable to say anything. Finally he breaks the silence and whispers to me:

“Dude, what the fuck is he doing?”

The worst part is that I know exactly what he is doing. He is trying to log into the Actiontech router, that we are not using. But he is failing at it so hard, that he ought to receive some sort of award for it. I suddenly realize that Steve knows less about computers than most of my users. On its own, this would be quite an accomplishment. But the fact he is actually working as a Verizon on-site support technician makes his technological illiteracy quite ironic.

Eventually Steve’s counterpart on the other end of the line manages to explain to him how to use an address box. They contemplate the inability to bring up the Actiontech login page for a bit, and conclude it is time to power-cycle the router. Steve gets up, looks at the server rack, looks at the walls, scans the entire room and becomes confused.

“Buddy, where do you keep the Actiontech router?” he asks.

I explain we are not using it, but Steve refuses to accept that as an answer. He can’t comprehend how we could possibly connect to the internet without the router. He decides that it has to be somewhere and starts snooping around the server room. He looks behind the server rack, then tries to open the other rack next to it, all the while trying to explain to us what the device would look like. I look at The Intern and go:

“I think I have made a huge mistake…”

Steve’s exploratory search for truth brings him to the shelves where we store spare parts, cables, assorted cable control devices. It just so happens that that’s where we left the poor Actiontech router. It’s been sitting on that shelf for years now, gathering dust and acting as a paperweight. Steve spots it, yells “Aha!” and slides it from underneath all the crap that was on top, and triumphantly waves it at me. Bereft of their support, some boxes and trays that were above the forgotten router topple down, spilling wire clips, Velcro fasteners and papers all across the floor. The Intern dives in to rescue falling equipment while I just stand there staring in astonishment.

Steve makes a happy dance, as if he solved the issue. Here is the root of your problem gentlemen. Your router was not connected, and I, the great Sherlock Holmes, found it on this dusty shelf.

Steve triumphantly brings the router to where we set up the laptop, plugs it into the electric socket, then unplugs his Ethernet cable from the main switch, and connects it to the router. He types something in and goes:

“There we go!”

Apparently he finally got to the Actiontech login page. Unfortunately his happiness is short lived. I watch his smile turn into a frown as he is obviously unable to get an internet connection. Him and his buddy on the phone go into this intense troubleshooting session of a router that is not connected to anything other than the laptop. After about 5 minutes of this, I interrupt them and go:

“Steve, that router is not connected to anything!”

Steve looks at me befuddled, wiggles the Ethernet cable between his laptop and the appliance: “Sure it is!”

When I try to explain, Steve just asks me to “Sit tight” and assures me that they “will get to the bottom of this”. I am not so sure of that. I think we have reached the rock bottom when Steve found the router. Now he is diligently digging himself into a hole that is getting deeper and deeper every minute.

After some more troubleshooting, Steve hangs up the phone, gets up, waves me over and goes: “Good news buddy. We figured out what was wrong with your connection.”

“Oh, really?”

Apparently Steve completely forgot that he was supposed to be a pawn in my clever ploy to get tier 2 support to fuck off, and get a network engineer on the case. Apparently he managed to resolve the issue all on his own. What a hero!

“Yep. That router…” he points at the still-disconnected device “…there is something wrong with it. The good news is, that I have a spare router in the truck. So I’m gonna go, have a smoke, grab something to eat and bring it up here.”

He gives me a friendly pat on the shoulder.

“We’ll get you all patched up, and back online in no time.”

At that point, I politely thank Steve for his help, ask him to gather up his stuff and not come back. There is just no point in continuing this charade. The Intern seems to be having a blast watching this unfold, but I’m just annoyed. And it’s not really Steve’s fault. I’m sure he would do fine as a residential support tech. His cluelessness wouldn’t really hold him back that much if all he had to do was to power-cycle and/or replace appliances and crimp wires. This issue is just way above his head. In fact, it seems to be way above the heads of most of the on-site techs that Verizon likes to send out to their customers. The whole ordeal is just a monumental waste of my time.

Next morning I arrive at my desk, only to field an early morning call from an old friend:

“Hi, this is Bob from Verizon and I’m just doing a follow up courtesy call about your recent issue. I see here that yesterday we have sent an on-site technician to your location. He has marked the issue as resolved. I wanted to make sure that everything is working correctly and see if there is anything else we can do for you.”

Well, Bob… Since you have asked, let me tell you a story about a guy named Steve.

Next time on The Firewall Saga: the long awaited resolution.

The Firewall Saga
<< Prev Next >>
]]>
The Firewall Saga: Part 5 http://www.terminally-incoherent.com/blog/2011/09/12/firewall-saga-part-5/ http://www.terminally-incoherent.com/blog/2011/09/12/firewall-saga-part-5/#comments Mon, 12 Sep 2011 14:32:19 +0000 http://www.terminally-incoherent.com/blog/?p=10008 Continue reading ]]> Welcome to yet another installment of the Firewall Saga (it was supposed to be the penultimate one, but it did not work out that way). If you haven’t been following it, please try to catch up. It will make more sense that way.

When we left off last time, a firewall replacement somehow left me with a non routable IP address – a problem that, beyond any shade of doubt was my ISP’s fault. I have called Verizon, only to realize their outsourced tech support call center was entirely incapable of dealing with problems of this complexity. I needed to talk to a network engineer to resolve a router configuration issue but they misunderstood and sent me a repair monkey to jiggle the cables and power-cycle the local appliances. I called them back, and ranted for about 20 minutes, heavily over-using words such as incompetence, outrage, lack of professionalism, dropping the ball, disrespecting the customer, etc… It made me feel a bit better, and they did promise to definitely escalate my issue to second tier.

I walk into work the following morning, wake up The Intern who has dozed off at his workstation, acquire coffee and start sifting through all the spam in my inbox. A few years ago, my inbox was pristine clean – mostly untouched by the filth of spam messages. My co-workers used to marvel at this phenomenon, and asked how I managed to stave off the avalanches of crap that flooded their email daily. Unfortunately I did not have a secret technique. I was simply careful not to give out my email on the internets, and vigilant about deleting and flagging anything that seemed suspicious. Then I went on vacation, and my boss told me to put up an auto-reply “out of office” message. Nowadays it seems like my email is on the list of every single deposed Nigerian prince, penis enlargement specialist and Viagra salesman. Also, apparently I have money in 50+ different banks, who constantly threaten to close my account if I don’t give them my PIN and passwords. It has gotten so bad, that it is actually easier to white-list internal company correspondence and emails from known clients and partners. I currently have close to 100 filtering rules that help me to fight with the sea of unwanted spam, and make the important and urgent emails instantly visible by application of labels and priority folders.

Email filters – a forgotten arcane art, mastered only by the chosen few. I know for sure that the only client-side filters used in my company have been set up by me. No one else even knows such things exist.

I’m in the middle of fiddling with my tangled web of email filtering rules when I hear my phone ringing. I’m expecting to hear yet another complaint about the time sheet app. If you recall, the whole firewall bonanza somehow broke the VPN tunnel to that remotely hosted server. So for the time being, I am unable to reboot it or tinker with it. But since the non-routable IP issue is more pressing I have pushed the VPN problems aside. Especially after what happened the last time I attempted to fix it. Surprisingly, it was not an internal call. It was a Verizon representative doing a courtesy call. It went a little bit like this:

“Hi, this is Bob from Verizon and I’m just doing a follow up courtesy call about your recent issue. I see here that yesterday we have sent an on-site technician to your location. He has marked the issue as resolved. I wanted to make sure that everything is working correctly and see if there is anything else we can do for you.”

Here are some of the emotions I’m feeling at that exact moment: anger, annoyance, disbelief, rage, befuddlement and hunger… The last one, because I didn’t have a chance to get anything to eat that morning. To help you visualize my reaction, here is a two panel re-enactment of that event. Just imagine that the iPhone is a big clunky office phone, and the coffee mug is a paper cup, and that my shirt has a collar:

He marked it as what?

So it turns out that the field technician that visited us the other day decided to say he fixed the issue. In retrospect, I guess I can understand how it happened. It is very likely that this guy does not work directly for Verizon. He probably works for some local company that Verizon uses to outsource all the cable wiggling, wire snipping and power cycling it needs to do at customer locations. They are likely set up to receive work orders from up high. When they fail to resolve a customer facing issue (for whatever reason) it probably counts against them. So this guy’s manager probably just said “fuck it, since it was not a local problem we will just put it in the system as resolved”.

But that only occurred to me much, much later. As I’m sitting there on the phone my driving, logic clouding emotion is anger. The upside is that “Bob from Verizon” seems to be speaking perfect English. This is something new. All the support drones I dealt with up until now had very heavy accents. So chances are I’m actually talking to someone physically located in the states. Probably still not an employee of Verizon, but perhaps his call center/department can get me what I need.

So I recount my long and sad story, sparing him no gruesome details. When I’m done, he apologizes profusely then promises to get my issue resolved. He gets a support drone on the phone and together we relay the issue, and its importance, to him. The thick-accented drone gets in touch with tier 2 support. Tier 2 support insists on sending a network specialist to our location. I try to protest, and try to make a compelling case against it but it seems like there is no use. Apparently they have to make absolutely sure the issue is not local, before they escalate it to the network people. So we make an appointment. I call Barry and let him know we have this guy coming. Together we set up a spare laptop, plug it into our network, assign it a static IP and set the firewall to pretend it is our server. The guy will be able to jump onto it and verify that no packets are coming in. I also print out a network diagram, and Barry sends me a document that contains all the relevant firewall rules. When our Network Specialist comes for a visit, we ought to have enough evidence to show him the problem is definitely not on our end.
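For the curious, the outside half of that test (can anyone actually reach the address?) can be approximated with a few lines of Python run from a machine outside the office. This is a rough sketch, not what we actually used: the function name can_connect is made up for illustration, and it only tells you whether a TCP handshake completes, not where along the path the packets die.

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration. A False result only means the
    handshake did not complete within the timeout (or was refused). It
    says nothing about which hop dropped the traffic.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

Calling can_connect with the problem address and port 80 from an external line should return True once the test laptop is serving its page and the address routes properly, and False while the packets are vanishing somewhere upstream.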

The next day, our “Network Specialist” Steve arrives at the office. Only, he doesn’t look like a specialist. Of course, looks can be deceiving – and geeky guys can sometimes look peculiar. But this guy just does not look like a networking dude. He is a middle aged man, wearing a trucker hat, shorts and a crumpled up t-shirt. The large coffee stain on the front seems to be locked in territorial combat with his armpit sweat stains. His gray handlebar mustache gives me an impression that he would be much more comfortable rebuilding motorcycles than troubleshooting network issues. But I decide to give him the benefit of the doubt.

I take him to the server room, where we set up our perfect trap. Next to the rack, there is a little stool, and on it there is the orgy of evidence. The network diagrams, the firewall rules, the trace route logs and the little test laptop ready to be fired up and tested. His eyes glaze over a bit as I talk so I ask him what tests he needs to run, and explain how we rigged the test laptop. He goes:

“Son, no offense but I have no clue what any of what you just said means. I was under impression yous guys had no internet connection…”

I have a sinking feeling in the pit of my stomach.

“You are not a ‘Network Specialist’, are you?”

“What? Hell no! Kid, I was retired up until last week. This is my first day on the job. I sure ain’t no specialist!”

Well, fuck.

I explain my predicament to him. I was promised a specialist, but I got him. Tier 2 refuses to move forward until they have someone on-site run the checks they require. So I hatch a crazy plan. I have now a physical Verizon representative on the premises. Well, more like a trained monkey really – I don’t think he knows anything about anything, but he should be able to follow simple instructions. If we can get the tier 2 assholes on the phone, they can walk him through the required tests. Then we can move on.

Steve agrees to this crazy plan, but says he will probably need his company laptop which he left in the truck. Fair enough. I escort him out of the office and let the front desk know he will be coming right back and to send him right to me.

Steve is gone for about an hour and a half. When he finally shows up, I notice his shirt has acquired ketchup and mustard stains and is beginning to look like a genuine abstract painting. I ask him what happened and he launches into a long winded explanation of how he first decided to have a smoke, then he realized he was hungry, and how he has low blood sugar, etc. I let it go. The sooner we can do these tests, the faster I can get him out of my hair. I show him where to set up, and watch him pull out his ancient flip phone and dial a number.

Then he gets to a voice menu. Then they put him on hold. I shake my head in disbelief:

“Wow, they put you guys on hold too?”

“Of course.” he gives me a wide, gap-toothed smile “People think we have some special, internal number but we don’t. We call the same tech-support number as you do, when you have a problem”

That sinking feeling I mentioned before – it’s back with a vengeance. Steve patiently waits for “the next available representative” while I contemplate suicide for the twentieth time this week. Eventually I get bored watching Steve, and I excuse myself figuring I might as well get some work done. I interrupt The Intern’s intense game of tower defense and tell him to go keep Steve company, and make sure he does not try to mess with the equipment, or walk out with any of our servers. Oh, and to call me when Steve finally gets a live person on the phone.

After about 20 minutes I get a phone call on my desk. It’s not The Intern, but one of my other coworkers.

“Luke, I think we have a problem…”

Oh, God… As if I didn’t have enough problems.

“Damn it… What did you break this time…”

“No, this is more of a Human Resources problem.”

Oh, sweet relief! At least I won’t have to deal with this.

“And you are calling me about it because…”

“Well, it involves your server room. I think we have a hobo infestation.”

I chuckle, and explain that he was actually sent by Verizon.

“Ah, that’s what they all say. Next thing you know they start breeding and you get like a dozen homeless people living in your server room. Mark my words man.”

“What do you suggest, oh wise one?”

“Nuke it from the orbit. That’s the only way to make sure.”

“Well, I have The Intern babysitting him…”

“Yes… And? I don’t follow..”

“Right, good point. I’ll see what I can do about it, but you have to submit a ticket for it first”.

About an hour later, I go check up on my server room buddies. I find The Intern intently watching Steve munch on a sandwich. I give him a disapproving look and tell Steve he can eat in the lunch area because we do not want food in the server room. He is apologetic:

“Sorry about that. I just got hungry, and with my low blood sugar… You know how it is. I’m still on hold, and they can pick up any time. I figured there is no harm in a little snack…”

To emphasize his point, Steve emphatically waves the sandwich around as he talks. On one of the swings a tomato slice gets dislodged and soars through the sky, hitting the opposite wall with a loud smack. In astonishment Steve slightly releases his grip, and a slice of ham, and some lettuce slither out from between the bread and land on his laptop keyboard. He grabs them, stuffs them in his mouth and then shakes the laptop off sending crumbles, lettuce shreds and other unidentified bits of food on the floor.

I ask The Intern to clean it up before anyone notices, and sternly march Steve to the lunch room, trying to decide whether I should kill Steve or myself first.

Next time, more fun with Steve, and hopefully the climactic resolution. Well, maybe.

http://www.terminally-incoherent.com/blog/2011/09/12/firewall-saga-part-5/feed/ 6
The Firewall Saga: Part 4 http://www.terminally-incoherent.com/blog/2011/08/29/the-firewall-saga-part-4/ http://www.terminally-incoherent.com/blog/2011/08/29/the-firewall-saga-part-4/#comments Mon, 29 Aug 2011 14:16:36 +0000 http://www.terminally-incoherent.com/blog/?p=9898 Continue reading ]]> The saga continues. If you haven’t been following this series, you can catch up to speed here. What follows might be funnier that way.

It is the day after the Beef Instrumentality Incident #631. We are finally chugging along on all cylinders, the users are mostly placated and I finally have some time to call Verizon about my one non-routing IP address. I’m not empty handed either. Barry was kind enough to arm me with traceroute logs for all our IPs captured from two different outside locations. That is not much, but they show that the packets sent to that one problem address route fine until they hit the Verizon network. And then, boom, they get shunted into the depths of cybervoid instead of being safely delivered into our office.
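For the curious, here is roughly how you could boil a traceroute log down to the one fact that matters: the last hop that actually answered. This is only a sketch. The function name, the regular expression and the sample logs are all made up for illustration (the addresses come from the reserved documentation ranges, not from Barry's real logs), and it assumes the plain numeric output you get from something like `traceroute -n`.

```python
import re

# Matches a hop line such as " 3  198.51.100.7  12.3 ms" and captures
# the hop number and the first token after it (an address or "*").
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def last_responding_hop(traceroute_text):
    """Return (hop_number, address) of the last hop that replied, or None."""
    last = None
    for line in traceroute_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header line or anything else that is not a hop
        hop, address = int(match.group(1)), match.group(2)
        if address != "*":  # "*" means the probe timed out at this hop
            last = (hop, address)
    return last


# Hypothetical logs: one for a working address, one for the problem address.
SAMPLE_GOOD = """\
 1  192.0.2.1  1.1 ms
 2  198.51.100.7  9.8 ms
 3  203.0.113.10  14.2 ms
"""

SAMPLE_BAD = """\
 1  192.0.2.1  1.0 ms
 2  198.51.100.7  10.1 ms
 3  * * *
 4  * * *
"""

if __name__ == "__main__":
    print("good:", last_responding_hop(SAMPLE_GOOD))
    print("bad: ", last_responding_hop(SAMPLE_BAD))
```

If the trace to the problem address dies at the same hop from two different outside locations, and that hop belongs to the ISP, you have a decent argument that the issue is not local. It is not proof, since some routers simply refuse to answer traceroute probes.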

Barry also dug out, and sent me his impressively anal retentive case notes from a few years ago, when the same thing happened. These things are dense with tiresome detail: actual dates of calls he made, names of support drones he spoke with, etc. According to the notes it only took 5 days and about a dozen phone calls to get the IP routable again. I figure I can do it in half of that time considering the extensive documentation I am armed with. Hell, maybe I can even do it in a single phone call.

Around 10 am I let my coworkers know I will be calling Verizon, and that I probably will be on hold till closing time. I ask them to drag me out of there after 5pm, bid them farewell, and gather some necessities. Two cups of coffee, bottled water and snacks in case I am trapped by the phone for weeks. I do some mental preparation, then dial the number. I go through the byzantine labyrinth of voice menus and finally end up in the wait queue. I put the phone on speaker and proceed to do some busy work while listening to their horrible on-hold music.

20 minutes into the call, a coworker from the next cubicle over starts parroting the looped voice assuring me that “Your call is very important to us. Please hold for the next available representative.” After the third or fourth time, I join in and we both say it together.

45 minutes into the call, every single person in the IT cave is in on the joke. Every time that looped sound bite repeats, five voices rise up in unison. We are like a group of monks chanting an ancient prayer.

50 minutes into the call, someone decides to hit the Staples “That was Easy” button every time we chant our little chant.

An hour into the call, The Intern manages to perfectly imitate the elevator music with his mouth. By this time, my phone speaker volume is cranked all the way up and it’s a regular sing-along party.

At 65 minutes, someone wanders into the IT bunker with a question, hears us chanting, says “You guys are a bunch of nerds” and leaves. We decide we must do this more often. Anything that keeps users from dropping by with non-essential, non-work-related questions is worth working into our daily routine. Oh, and in case you were wondering, that user just had a question about home theater sound systems. Because, you know – system administrators and programmers know all about home multimedia setups.

Finally, after an hour and 20 minutes on hold someone picks up. There is some booing in the background as I disengage the speaker. Apparently everyone was having fun.

I jump through all the requisite identification hoops. Then I launch into a 15 minute detailed explanation about our routing issue. I explain to him the dozen or so local and remote tests we performed to verify this is not a local configuration problem. I offer to send him the traceroute logs so that he can see the problem happens only for a single IP address. I also tell him that this seems to be a recurring problem, and give him a quick rundown of Barry’s case history. The guy on the other end patiently listens to all of this, and once I’m done goes:

“Thank you for that information sir. It appears this is a router configuration issue. What we will need to do is to power-cycle the router. It is the little black box with the antenna that we have provided you when we set up your internet. What I want you to do is to unplug it from power, wait 60 seconds and then plug it back in…”

Granted, I sort of expected this to happen. I patiently explain that we do not use their cheap, off-the-shelf appliance with crippled custom Verizon software. I also reiterate that this is a routing issue, not a local configuration error. I ask him to escalate this call to the team that handles network problems, and point to the case history to support my claims. I even give him the exact date when the previous ticket was escalated to that department (thanks to Barry and his disturbingly obsessive note taking).

“I’m sorry sir, but I cannot help you if you are not using the ActionTech router we have provided you. I will need you to unplug your current router, and replace it with the ActionTech before we can continue this troubleshooting.”

Suddenly I realize I might have dialed the residential support line. I ask the guy to verify, but he claims I actually called the right number. He is a proud member of the Small and Medium Business department. To everyone’s surprise, including my own, this sets me off on a weird Socratic Method rant.

I tell him that we have this many machines, this many servers and this many persistent VPN tunnels that connect us to other offices and data centers. Then I ask him whether he would classify this as a small or a medium business. Not knowing where the hell I am going with this, he agrees that we are probably in a “medium” class.

Next I ask him whether or not the $20 off-the-shelf router they gave us can handle maintaining several persistent VPN tunnels. Does it have commercial grade firewall software that would allow us to do real time packet inspection and intrusion detection? I ask him if that router has any of the features that we get audited for, and that we are contractually obliged to have in place to protect our client data.

He hesitantly agrees that it probably does not have all these features. I have a hunch he doesn’t know I’m bluffing and that we never, ever get security audits (except for internal ones). Still he insists that I temporarily connect the ActionTech just for the sake of troubleshooting.

I ask him whether he would classify the ActionTech router as an enterprise level device for medium business users, or a personal use appliance for residential clients?

He agrees it is more on the residential side.

For my coup de grâce I go:

“Ok, so let me get this straight. We are paying for a business class FiOS connection, and business class support. Why then are you reading troubleshooting steps from the residential support checklist, asking me to dismantle my entire network architecture and connect via an off-the-shelf, residential device?”

Then I once again plead with him to escalate this to the network support team, or to connect me to his manager. He mumbles a bit, confused, then asks to put me on hold while he consults with his supervisor. A coworker from the next cubicle chimes in:

“You tell them, dude! Gotta be stern with these assholes.”

I interpret this as a compliment, seeing how I am usually not the most assertive person. In fact I sort of feel an inkling of pride for coming up with that question and answer session. Then the phone suddenly goes to dial-tone.

I have just rolled like a natural 20 on my persuasion check. I have had this guy on the ropes! He was about to do what I asked him to do! And the motherfucker hangs up on me! To make matters worse, now I have to call them again, and repeat the exact same exercise with another asshole who will ask me questions about my ActionTech router.

Boiling… Murderous… Rage…

If I was Bruce Banner, I would probably be a rampaging green skinned monster in ripped up purple pants by now. Fortunately I never participated in any gamma radiation experiments so I merely let out an agonizing groan, slam my phone down real hard and decide to take an early lunch. I nearly collide with The Intern who fetched himself a fresh cup of java. Fight or flight reaction kicks in, and sends him spilling half of the mug as he frantically scampers out of the path of my angry walk.

I return sometime later, with a clear head and full stomach. I’m determined to get this call done today, so I sit down and repeat the entire procedure. This time I don’t put it on speaker because I don’t think anyone else is in the mood to sing along. At least I know I’m not.

I spend close to an hour on hold, but fortunately this time I get someone whose IQ does not seem to be a single digit number. He still tries to make me fetch the ActionTech router, but he eventually drops that troubleshooting path. Instead, he decides to follow a different branch on his troubleshooting decision tree.

“Sir, because of your unique network configuration I’m afraid I will need to send an on-site technician to perform some local tests.”

God, damn it! No! This is a configuration issue on your end. All I need is five minutes of time of one of your network engineers. There is one entry in your routing tables that got fucked. I just need someone to go and un-fuck it. Just let me speak to someone who knows what a “routing issue” is. Please! It happened before. Look in your case notes. It should have all the information. My notes say we spoke to some dude named Richard. He fixed it last time. Can you please get me that guy!

He ignores my pleas and insists on sending out a guy. We go back and forth like this for about 15 minutes, and I eventually manage to twist his arm into escalating the issue somewhere higher. In fact, he agrees to forward my traceroute logs to the Tier 2 team for reference.

“Ok sir, you can send it to my email. It is V as in Victory, Z as in Zebra 123456789995-0 at hotmail.com”

Hotmail?

“Is that your personal email? Don’t you have like a Verizon email account?” I inquire out of sheer curiosity.

“I don’t know. This is what they set up for me and told me to use sir…”

A few more innocent probing questions reveal that my friend on the other end of the line doesn’t even work for Verizon. He works for an outsourcing company. They are not directly affiliated with Verizon – they are just hired by it to act as a storm shield against the wrath of dissatisfied customers. So of course they don’t get to have Verizon email accounts. Providing legit emails to folks who handle their front-line customer support is apparently not important to Verizon. They are perfectly fine with the legion of support drones sharing a few dozen hotmail accounts, and looking very, very unprofessional.

I get him the logs, I grab a case number and finally hang up. It’s almost 4pm. I have wasted almost an entire work day trying to get a single stupid issue logged in the Verizon system and escalated to the proper department. Still, I feel like I have accomplished something. The case is being sent to the second tier, so perhaps someone who actually works for Verizon will get a chance to look at it. I might have wasted way too many hours on this but I feel like I have a realistic chance at beating Barry’s week-long turnaround for this issue.

Next day, I spend the entire morning catching up on work I didn’t have a chance to do while fucking around with outsourced Verizon support drones. Around lunch time, I get a phone call from the front desk. Apparently some “Verizon Guy” showed up, searching for Luke.

It turns out that Verizon sent out a field technician to our office anyway. They said they wouldn’t. They said the issue was being escalated to Tier 2, but apparently that is not what happened. Right now there is a guy in our lobby and there is absolutely nothing he can do to fix the issue. But I figure that maybe I can explain the problem to him and get him to forward that information up the stream. Hell, maybe he can plug himself in on our network, run whatever diagnostics he needs to rule out a local configuration issue being the cause of our problem.

I go and fetch him, bring him back to the server room, show him where the FiOS box is, and how it connects to the firewall. I ask him what tests he needs to do, and make sure he knows we can’t bring anything down during business hours. The guy looks at the server rack, the tangle of network cables going in and out of various switches, all wide eyed and slack jawed. He goes:

“Dude… I just thought you are going to have a bad connection, or maybe a broken router or something… This…” he gestures at the server rack housing the firewall “This is way out of my league, man.”

Apparently no one even told him what the problem was. The dispatch just said the client was experiencing connection issues. There was nothing there about routing problems. And even if there was, this guy was not trained to troubleshoot issues like that. He was armed with a spare modem and a wire crimper, and trained to jiggle cables and power cycle basic network appliances. But he seems like a nice guy, so we chat for a bit, and laugh at Verizon’s lack of competence. He says he will talk to his supervisor and see if he can pass the message along up the chain of command.

He leaves, and I clear the rest of my schedule for another grueling Verizon support call. Also I contemplate committing a ritual suicide.

Next time on Firewall Saga: Verizon sends out a “Network Specialist” to our location. Hilarity ensues.

http://www.terminally-incoherent.com/blog/2011/08/29/the-firewall-saga-part-4/feed/ 7
The Firewall Saga: Part 3 http://www.terminally-incoherent.com/blog/2011/08/22/the-firewall-saga-part-3/ http://www.terminally-incoherent.com/blog/2011/08/22/the-firewall-saga-part-3/#comments Mon, 22 Aug 2011 14:03:24 +0000 http://www.terminally-incoherent.com/blog/?p=9818 Continue reading ]]> If you haven’t been following this series of posts, please familiarize yourself with the previous entries. Things will actually make more sense this way. Or you can just jump in.

When we left off last time, my new friends Toby and Barry had finally managed to fix our firewall issue by replacing the firewall with a brand new one, and accidentally destroying the old one. Unfortunately for me, all the rebooting caused my ISP to freak out and stop routing packets to one of the 5 consecutive public IP addresses we have been using. Barry and Toby packed their stuff, and cheerfully fucked off back to their office, while I was left with the task of yelling at Verizon until they fix the routing issue. Foolishly I thought it would be a relatively easy task.

It is the day after the big upgrade. I drag my sleep deprived carcass into the office, drop my bag at my desk and make a bee line for the coffee machine. Apparently my peripheral vision is not the best when I’m still half asleep because someone intercepts me before I manage to reach the blessed caffeine dispenser.

“Luke, the web timesheet sucks!”

Of course it sucks. So does the phone system. So does the online reviewing system. So does Microsoft Office. Every single piece of software ever made sucks at one point or another. Every single software stack ever created is basically an unimaginably complex house of cards. And the fact that it does not crash and corrupt data on a regular basis is a small miracle to be cherished. That’s just what software is: a bottomless vortex of suck. But despite being horrible and broken most of the time, it does make our lives better. And if you don’t buy that, then I dare you to live for a week using nothing but pen, paper and post office stamps.

But of course software sucking is automatically my fault, because my cubicle happens to be located in the area clearly labeled “IT Department”. Yes kids, when you take up a job in the IT field it is basically like signing a document that says “I hereby accept the blame for any and all software and hardware failures that may have happened to any electronic device with an asset tag – from now, till the heat death of the universe”. And no, I’m not kidding – you may leave the company, but the users will remember that your ass used to maintain that one server back in the day. And if, God forbid, that server is running 1% slower one day due to network congestion or some runaway OS level process, they will hold you personally responsible for their lost productivity.

The “web time sheet” that I somehow caused to suck this morning is a proprietary Windows service with more memory leaks than the Titanic after its fateful encounter with the iceberg. If you let it run unattended (like you are supposed to) then it will eventually run itself into the ground. So the server is scheduled to reboot itself automatically at like 4AM on Sundays or something like that. Unfortunately certain activities leak more memory than others, so you never really know when that server is going to go into swap-hell. Sometimes a mid week reboot is necessary.

Since it is physically on a rack in a data center in another building we have a VPN tunnel that lets us remote desktop into it for maintenance purposes. Usually rebooting it ahead of schedule is trivial, but this morning I hit a snag. The VPN tunnel seems to be down. Coincidentally, this is the one thing that we completely forgot to test the night before. The users can still access it directly because it has a web facing front end, but I can’t get to it to power it down.

So I call up Barry, but I’m going straight to voice mail. Since it is only a little past nine, I figure he is probably not in yet. So I leave a message, and on a lark call the main number for his office. I figure they may snag him as he walks in, and get him to call me back. Some perky young girl picks up, takes my name, number and promises to fetch Barry for me. Then she says:

“Please hold…” and after a very brief pause I can hear her yell out “Hey, Barry some guy Luke wants to talk to you about a fire thingy… He says it’s urgent”.

To which Barry responds: “Tell him I’m in a meeting.”

She dutifully explains that he is currently in a closed meeting and cannot answer the phone, but will call me back. I ask her to notify him that I can hear his voice quite clearly and suggest that she pencil Barry in for another high priority meeting sometime this morning: one between our mutual boss’ boot and Barry’s ass, if he does not give me a call as soon as his “meeting is over”.

Barry groans painfully in the background, blurts something out about learning to use the “hold” button and takes my call. I make him log in, and check the VPN related rules, but these seem to be in place. It’s just that the link is dead, and the issue seems to be on the other end. Which kinda makes sense – we essentially rebuilt the entire thing from scratch, so perhaps the other firewall got confused. So I leave Barry alone, since he and his team do not have jurisdiction over the data center in question.

Instead I call Charlie (no, not this Charlie – a different one). Of course, Charlie is working in the field that day, so instead I get to talk to one of his under-flunkies whose name I didn’t even bother to memorize, so I will just call him Beef. After a short conversation, in which Beef uses the word “brah” at least 8 times I realize I really, really, really don’t want him to touch the firewall rules. Instead I figure I will just have him reboot the web timesheet server, and then wait for Charlie to come back and troubleshoot the VPN issues with him.

When I explain his task to him, he sounds relieved and happy he won’t have to reconfigure anything. Rebooting seems straightforward enough. I give him the name, number, asset tag and all the other stuff and send him off. There is little to no cell phone reception in the heart of the data center where our machine is located, and they conveniently do not have cordless phones in their office so I can’t walk him through the entire process – which concerns me a bit. Then again, it should not be that difficult to just reboot a single machine.

About 15 minutes later, Beef calls me back:

“Bro… Sorry to call you, but like dude, do you know which rack it is in?”

So apparently Beef wandered out into the racks armed with only a sticky note with a scribbled asset tag and serial number. He checked like 3 different racks, then realized there are like dozens more out there, got discouraged and called me back. I personally never visited that particular server, so I honestly have no clue where it lives physically. But I assume they have some sort of lookup system, or reference sheet which he can use to map the information I gave him to a rack number.

Some time passes, and Beef happily reports that he has found the correct rack but insists that none of the servers inside are labeled properly, and he wants to know which box is mine, counting from the top. That’s the sort of information I never actually needed before, so I’m no help. But since I’m fairly sure he is just too dense to figure it out, I suggest that maybe the servers are just racked in the order they are listed on his reference sheet. This seems to ring a bell, and he goes off to investigate.

Finally Beef calls me with good news:

“Bro, I couldn’t figure out which one it was, so I like rebooted all the machines in that rack. You should be good now, man!”

My head hits the desk so hard that it wakes The Intern up from his morning nap. In an uncoordinated fight or flight reflex he attempts to stand straight up, painfully whacking his head on the overhead cubicle cabinet in the process. The sudden awakening, combined with an unexpected cranial collision, makes him flail about like a beached fish, fall off the chair and land on the floor cradling his head.

My other coworker looks up from his screen and goes:

“Dude, if you kill the intern they are never going to give us another one.”

With my forehead still pressing against the desk, I hang up the office phone, extract my cell and send Charlie a quick text message:

“I think Beef rebooted every single server in the rack. I did not tell him to do that.”

His response is swift and succinct:

“MOTHER OF FUCk1” followed by “AGHS! FuKING FICKn SHIT BEEF!”

Apparently at least one of the machines in that rack is on the “do not reboot under any circumstances ever” list. Fortunately for me, none of these other servers is mine and Beef did temporarily resolve my memory leak issue. So even though I feel bad for Charlie, I write this one up as a success.

This leaves me with the more important task, namely getting Verizon to fix their routing issue. But I will talk about that next time.

In case you were wondering, Charlie managed to fix the issue relatively quickly and the users were only slightly angry. Beef was assigned a penance of self flagellation and temporarily banned from touching the racks. Of course he wasn’t fired, because like nepotism bro. Also, my users still totally think that I come to work at night, only to throttle down all the services, and sabotage their computers so that everything is slow and buggy in the mornings. I mean, that’s the only logical explanation.

Next on Firewall Saga, I deal with the brilliant Verizon on-site technicians.

http://www.terminally-incoherent.com/blog/2011/08/22/the-firewall-saga-part-3/feed/ 5
The Firewall Saga: Part 2 http://www.terminally-incoherent.com/blog/2011/08/17/the-firewall-saga-part-2/ http://www.terminally-incoherent.com/blog/2011/08/17/the-firewall-saga-part-2/#comments Wed, 17 Aug 2011 14:13:51 +0000 http://www.terminally-incoherent.com/blog/?p=9783 Continue reading ]]> As you may recall from Part 1, our Checkpoint Firewall-1 has a weird quirk. It’s like that guy from Memento – every time you reboot it, it forgets all the license keys, assumes it is being run illegally and goes into “FUCK YOU DIRTY PIRATE SCUM!” mode, cutting all network communications and allowing only direct physical console access. In the past we exhausted pretty much all the easy fixes – stuff to do with file permissions, configuration tweaks, etc. Eventually the powers that be sent Barry and Toby – our networking “specialists” – to definitively resolve the issue. They decided the way to do it was to upgrade the device to the latest and greatest version. If you have read the previous post in this series, you know that this attempt failed miserably.

A week passed, then two. Toby and Barry were back at their office, but they would tweak the firewall remotely and make me do reboot tests every evening. Nothing they did worked. On the upside, I got really good at resetting the licensing information – I could pretty much do it in my sleep. In fact, I started to have dreams in which I had to reboot the damn firewall but couldn’t.

Eventually the powers that be realized that throwing man-hours at the problem was going nowhere, so they decided to go to Plan-B, which was to throw real money at it. In fact, I was quite impressed by the act of simplistic analog-era logic: if a component A is not working, and repeated attempts to fix it have failed, then we replace the component. Funny thing is that none of us digital webheads actually even thought about that. I was still convinced it was a software/configuration issue, but hey – you don’t say no to a brand spanking new server. Not to mention that with some luck (and some convincing), I could get approval to re-purpose the old hardware to replace one of the very, very ancient “beige boxes” that is still running some legacy app for us.

The plan is for Toby and Barry to set it up in their office, based on our Clonezilla images, and then keep it there for a week, periodically rebooting it to see if it holds permissions. If it works, they will swing by our office one day, rack it up and we will finally be able to put this issue to sleep.

After about a week they call me up, and tell me they are ready to deploy their new baby. Apparently it is rock-solid, stable and does not lose license keys ever. We all go: “Huh… I guess it was a hardware problem then…”

Fast forward to the fateful day and Barry shows up alone, and empty handed. He shakes my hand and informs me that Toby is bringing the stuff in from the car. We chat for a while, then I hear banging on the door. I walk up and open it, only to see Toby looking like a pack mule. He has a backpack, two laptop shoulder bags (one of them is Barry’s apparently), a plastic shopping bag full of some stuff, and he is dragging a huge box behind him.

I’m sort of at a loss for words. We stare at each other for half a minute, and then I just blurt out:

“Dude, we have a dolly in the supply closet… I could have brought it down…”

Then I grab the other end of the box and help him carry it into the server room. Barry graciously opens the door for us, but only after watching me try to open it with my elbow for about a minute.

First task is to remove the old firewall from the rack. We set to work, power it down, disconnect everything and then Toby goes:

“Man, I hate these mounts… They always get stuck.”

The good news is that the new server has different mounts that don’t get stuck. The bad news is that the old server is currently stuck in the rack. My new friends proceed with various un-sticking rituals such as jiggling, jerking, popping, rocking the rack, etc. When the subtle ways don’t work, Barry starts to brutally whale on the stuck locking mechanism with his multi-tool, while Toby goes into the back of the rack, and starts pushing.

I watch them do this for 3 solid minutes, at which point something in the back of my head goes “Wait… This ain’t right…” I can sort of see with my mind’s eye all the ways this scenario could go wrong when suddenly there is a loud popping noise.

The server juts out of the rack, missing Barry’s face by half an inch. The sliding mounts extend to their full length and then just snap off. The abrupt jerk caused by the sudden disengagement causes the server to tumble so that now its face plate is pointing at the ground. The force of Toby’s push is making it fly towards the nearby wall. Fortunately it is not a particularly aerodynamic device, so it never gets there. It flips over onto its back and lands with a loud, crunching noise about two feet from the rack.

No one says anything. Toby is still half-way in the rack, with his arm extended after a forceful push. Barry still has his multi-tool in the air, ready for another whack. We silently stare at what used to be a functional server, lying silently on the floor.

Finally, Toby blurts out: “It’s ok… We are replacing it anyway… Right?”

Faces and palms meet in a heartfelt embrace. Then we set back to work, mounting the new device in the rack. I remind Toby that there is absolutely no pushing allowed this time around.

The machine goes up in the rack, gets booted up, and we decide to do a few shutdown tests to be absolutely sure the license key issue does not come back. For all we know, it could be some micro-climate that is specific to our server room that is making the Checkpoint devices go haywire. Surprisingly enough, there are no problems. The server starts up, detects the proper licensing and keeps on trucking.

We awkwardly high-five each other, fully realizing that we have absolutely nothing to celebrate here. After all it took weeks to get this done, and we managed to destroy a server in the process. Jose, a proud member of the janitorial staff, has no clue what just went down, but he decides to pop into the wide open server room to say hello, and joyfully joins in on the high fiving. We are all relieved and glad that it is all over.

Then I decide to be Buzz Killington and insist that we make sure all the services are working properly. We quickly run through all the internal stuff, checking all network shares, printer access, remote sessions and etc. Everything seems to be working fine. Then I turn to the externally accessible stuff.

Our ISP has allocated us 5 static routable IP addresses. We currently use three of these to host internet-facing sites and services running from this very office. The firewall we just replaced is the internet gateway for all of them, so I want to be absolutely sure that Barry set the correct rules for all of these before we call it a night. Especially since there was no way for him to test these from his office.

I make Barry log in, and check the rules. It’s the right port, right IP address – it is the exact same rule we had before. We check the server in question, restart it and make sure it is accessible locally. It works just fine for us, but you can’t get to it from the internet.

After about an hour of head scratching, rule updating, rebooting and jiggling wires Barry finally comes up with an idea. He makes me hit every public facing IP with my phone one by one, while he sits and watches the firewall traffic in real time. I browse to our first site, and Barry sees packets coming in. I hit the second one, and there are more packets trickling in. I hit the third, problem site and there is nothing happening. No traffic hits the firewall. Both Barry and Toby pull out their phones and try to do the same. Firewall shows no incoming traffic. We even call in Jose and have him try to access the site on his droid phone, figuring that maybe Steve Jobs is just trolling us and making our iPhones do weird things. Still no incoming traffic.
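If you do not have four phones and a janitor handy, the outside half of that test is easy enough to script. What follows is just a sketch of the idea, not what we actually ran: the function names are invented for this post, and you would swap in your own public addresses and ports. It has to be run from a machine outside the office network, and somebody still needs to watch the firewall log on the inside while it runs.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints, probe=is_reachable):
    """Probe each (host, port) pair and return a dict of "host:port" -> bool.

    The probe argument exists so the real network call can be replaced
    with a stand-in when trying the logic out.
    """
    return {f"{host}:{port}": probe(host, port) for host, port in endpoints}


# Example use, with hypothetical addresses from a documentation range:
#
#   results = check_endpoints([("203.0.113.1", 80),
#                              ("203.0.113.2", 443),
#                              ("203.0.113.3", 80)])
#   for endpoint, ok in results.items():
#       print(endpoint, "reachable" if ok else "NOT reachable")
```

A failed connection on its own tells you very little, since a bad firewall rule looks exactly the same from the outside. The useful part is the combination: the probe fails from outside and the firewall never logs a single incoming packet for that address.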

Then it hits us – the issue is not on our end. It is a routing issue. Suddenly Barry has an epiphany:

“This happened once before!” he exclaims.

Apparently when they were setting up this office (and before we moved in from our previous location) they encountered a similar issue. They were rebooting the firewall, and bringing stuff up and down for days and at one point the same IP address stopped being routable. Apparently it took them about a week to get Verizon to straighten it out. At the time no one really cared because the office was empty, and we were waiting for the work crews to install and wire up all the cubicles before moving in.

Fortunately, we have two extra public IPs just lying around. We quickly swap the non-routable one for one of the spares, change the DNS records and we are back in business. The only problem is that we are now down one IP address that we paid for. I make a note to yell at Verizon and make them fix it the next day. As if it was that easy.

Next time on Firewall Saga: Verizon becomes confused, I become suicidal while Barry and Toby have a meeting that never ends.

]]>
http://www.terminally-incoherent.com/blog/2011/08/17/the-firewall-saga-part-2/feed/ 5
The Firewall Saga: Part 1 http://www.terminally-incoherent.com/blog/2011/08/15/the-firewall-saga-part-1/ http://www.terminally-incoherent.com/blog/2011/08/15/the-firewall-saga-part-1/#comments Mon, 15 Aug 2011 14:03:18 +0000 http://www.terminally-incoherent.com/blog/?p=9766 Continue reading ]]> My office is protected by Checkpoint Firewall-1. It is one of those fancy-shmancy enterprise level firewalls, that thankfully I do not need to maintain. You see, me and my brethren in suffering are the “internal affairs” team. The firewall falls under our “department of defense” (or “the networking guise” as we call them) jurisdiction and I am absolutely A-ok with that. I used to be in charge of internet facing firewalls, and I always worried that I missed something crucial. Nowadays I can just sit back, point fingers and blame the firewall guys if something bad happens. The only downside of this is that we don’t have access to the firewall box, and the networking team is in a different building (and different town) altogether – so every little change takes about a week. We like to pretend that this is our “change of controls authorization process” (in fact, that’s how I documented it) but in reality it is usually just them ignoring our requests until one of the managers gets annoyed and intervenes.

The unfortunate side effect of having an enterprisey firewall and small-medium business everything else, is that both our ISP and our VoIP provider think I’m full of shit when they ask me what kind of “router” I use. Firewall-1 is not in their handy-dandy manuals and therefore must not exist. Usually our conversations go like this:

“So what is your router? It will be a little box that says something like Netgear or Linksys.”

Checkpoint Firewall-1. It’s not a box, it is a Linux based firewall solution.

“Um… Ok… Let me see. Ok, we are looking for a router. Small box… Antennas. Can you try to find it?”

And so on. It usually takes at least 40 minutes and one or two escalations to actually establish what firewall we are using. After doing this several times I usually just lie and say we have some random Linksys router, and then have them walk me through basic troubleshooting steps from the manual, saying “It did not work” to everything. Believe it or not, it actually takes less time and I don’t have to listen to them making confused sounds with their mouths for 20 minutes.

We here in the “internal affairs” team treat that piece of hardware as our personal Lord Voldemort. We pretend it does not exist, and refer to it as “that thing that shall not be named” because naming it would give it power. Keep in mind that it never crashes, and we never, ever reboot it. Mostly, because we can’t.

Our firewall had a strange quirk that became apparent after a big power outage in the area during which the building’s backup generators gave up, and which drained all our battery backups. It was a long time ago – and the first time it went down since it was installed. Upon rebooting, it failed to find appropriate license keys and promptly cut all network traffic. This of course included the remote tunnel which the network team used to maintain it. After some panicking, and some phone troubleshooting we got the license keys back in place and restored communications.

We should have known better, but at the time pretty much all of us went “Wow, that was a weird glitch. It probably won’t happen again.”

It did. Once, twice, three times. Network guys eventually acknowledged this was a problem, but after every single incident they were 100% sure the problem was completely resolved. I eventually figured out how to prevent it from affecting our productivity using simple visual cues and access controls.

I made a sticker that said “DO NOT REBOOT THIS MACHINE, EVER!” and placed it on the front of the Firewall box. Then I locked the rack it was in with a key, and hid the key in my drawer, after affixing it with a tag that said “DO NOT REBOOT THE TOP BOX IN THE RACK”. If anyone needed to access that rack, they had to answer one simple question: “Which box is not to be rebooted?” If the answer was anything but “the top one” the key would stay in the drawer.

Eventually, I managed to convince the powers that be that the problem persisted, and the networking team was tasked with fixing it. And that’s how it all started.

One fine morning, two members of the networking contingent show up in our office. Let’s call them Toby and Barry (not their real names). Toby is the guy lugging around the heavy objects and babysitting installer progress bars, while Barry is the guy babysitting Toby and telling him which configuration options to pick. Also, their car seems to travel at relativistic speeds and experience time dilation everywhere they go. When these two are on the road, “we’ll be there in about 15 minutes” usually means three hours.

Of course no one told us they were coming. I simply get a phone call from Toby around 9am, informing me they are 15 minutes away from our office and that they will need access to the Firewall rack. I go WTF, my boss goes WTF and we basically pass the WTF around the office until everyone is thoroughly confused. Don’t get me wrong – we are all kinda happy someone finally decided to fix this issue for good, but then again these guys don’t have the best track record. Also, the entire staff is getting anxious, whining that out of all days in the year, today is the day when we cannot afford to have any downtime. And tomorrow. Tomorrow is also the only day in the year when we can’t afford any downtime. And the day after tomorrow too.

So while the entire accounting staff is hyperventilating and having panic attacks I take down the “Beware of the Leopard” plaque from the front of the firewall rack, pull out the key, unlock it and then wait for our guests. Then I go to lunch, come back and wait some more.

They arrive half past noon, and head straight to the IT department. We meet and greet, and they unveil their plan: take the firewall down, grab a disk image with Clonezilla (as a backup), then update it to the latest and greatest version. My boss looks skeptical, and wants to know how long it is going to take. Toby and Barry ponder this for a minute and give their best guess estimate: “twenty minutes max, maybe less”. After you adjust for dilation it does not look like something that can be managed in the middle of a work day. At least not without accounting staging a mutiny. The verdict goes down to do it at the end of the work day.

So now we have Toby and Barry hanging out with us for the rest of the day. I find out that Toby has never seen a Michael Bay movie he did not love, and I lose a little bit more faith in humanity.

Finally it’s 5pm and we get to work. Firewall is backed up in no time and Toby pops the update disk into the drive. He tries to mount it but fails and looks perplexed. He takes it out, puts it back in, tries again and starts scratching his head. He calls Barry over and they repeat the whole ordeal again twice. I’m watching this and begin to wonder if networking sent us their very own Lloyd Christmas and Harry Dunne.

Barry brings his laptop, pops the disk in to verify it works and has data on it. It does, and it has. So they go back to the server rack and repeat the entire procedure two more times. No dice. Then they turn to me.

“Any idea why this drive is not reading our disk?”

I walk over, take out the disk and inspect it. The little letters around the central hole spell out a word: “DVD-R”. I look over at the disk drive in the Firewall box and see another word there: “CD-ROM”. I wordlessly point these things out to my new friends. It takes them a few seconds but then it sinks in.

You want to know what is the funniest part about this scenario? The ISO they have burned on the DVD is only a bit over 600MB in size. It would fit on a regular CD-R without any problems. But Toby burned it on a DVD disk because apparently that’s what was on his desk. Also, Barry (the only one of the two who brought a laptop) did not copy the ISO to his machine.

We dispatch Toby to find an electronics store that is still open and obtain a USB DVD-ROM, while Barry and I hold down the fort and try to figure out if we can download the ISO from the Checkpoint website, or if we can just try to copy the data from the DVD onto a CD. Toby lucks out, and gets a drive in an electronics store just across the street and we are back in business.

The drive goes in, update starts executing and we all are relieved that we managed to snatch victory out of the jaws of defeat. Then something unexpected happens. The installer suddenly craps out and crashes in a weird way. A little googling tells us that there is actually no upgrade path from the version we have installed to the version we were trying to install. They are too far apart, and you either have to do a clean install or incremental update through the five or six versions that separate the two. Apparently my friends did not bother doing any research.
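The research that would have saved us an evening boils down to a very small graph search. Here is a rough sketch under loudly stated assumptions: the version labels and the `SUPPORTED` table are completely made up for illustration (they are not real Checkpoint releases or upgrade data), and `upgrade_chain` is just a name I picked. In real life you would fill the table in from the vendor's release notes.

```python
from collections import deque

# Hypothetical table: each version maps to the versions it can be
# upgraded to directly. These labels are invented for illustration.
SUPPORTED = {
    "v1": ["v2"],
    "v2": ["v3"],
    "v3": ["v4", "v5"],
    "v4": ["v5"],
    "v5": ["v6"],
}


def upgrade_chain(supported, current, target):
    """Return the shortest list of versions leading from current to target.

    Uses a breadth-first search over the direct-upgrade table.
    Returns None when no chain of upgrades exists (clean install time).
    """
    if current == target:
        return [current]
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        for nxt in supported.get(path[-1], ()):
            if nxt in seen:
                continue
            if nxt == target:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None
```

With the made-up table above, `"v6" in SUPPORTED["v1"]` is false, so there is no direct jump, and `upgrade_chain(SUPPORTED, "v1", "v6")` gives the incremental route through the versions in between. If it returns `None`, you are doing a clean install whether you like it or not.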

To add insult to injury, the failed upgrade rendered the machine un-bootable so now we have to restore it from the image we took earlier in the evening. By the time we are done it’s around 9pm and we decide to call it a night, and reconvene the next day.

Fast forward a day, and at least 3 annoyed speeches about lack of competence delivered by different managerial personas. I am happily pointing fingers and deflecting blame with my “I am just babysitting them, they are the Checkpoint experts” routine.

Toby and Barry show up around 6:45pm (time dilation – you have to adjust for the time dilation with these guys) and have a concrete plan. Back it up, export all the database rules, do a clean install, import the rules back in. In fact, clean install will most definitely fix the license key issue so this is a good thing.

This time around everything goes well. Toby babysits the installation, while Barry and I browse the web on our phones. Suddenly, Toby starts whimpering. Not a good sign.

We go check up on him, and it turns out that the new version of the Firewall software is not fully backwards compatible and there are some issues importing our rules. After several trials, and some online research we decide that the best thing to do is to re-create the rules from scratch. Barry steps in, and for the next hour and a half he painstakingly re-creates our setup.

Then for shits and giggles we decide to reboot the damn thing to see if it loses the license key again.

Guess what happens? Same thing as always. Firewall comes back up, looks around and goes “Fuck you guys, y’all dirty pirates!”

Toby and Barry are stumped. We all just wasted two evenings and accomplished absolutely nothing (well other than getting the firewall upgraded). They retreat to their home base to report on their critical mission failure, while I lock up the rack, and hang the “Beware of the Leopard” plaque on the front again. For me, it’s business as usual.

Next time on the Firewall Saga, Toby and Barry hatch a new plan and a piece of hardware experiences free fall for a few brief seconds. Stay tuned.

]]>
http://www.terminally-incoherent.com/blog/2011/08/15/the-firewall-saga-part-1/feed/ 7