As you may recall from Part 1, our Checkpoint Firewall-1 has a weird quirk. It’s like that guy from Memento – every time you reboot it, it forgets all the license keys, assumes it is being run illegally and goes into “FUCK YOU DIRTY PIRATE SCUM!” mode, cutting all network communications and allowing only direct physical console access. In the past we exhausted pretty much all the easy fixes – stuff to do with file permissions, configuration tweaks, etc. Eventually the powers that be sent Barry and Toby – our networking “specialists” – to definitively resolve the issue. They decided the way to do it was to upgrade the device to the latest and greatest version. If you have read the previous post in this series, you know that this attempt failed miserably.
A week passed, then two. Toby and Barry were back at their office, but they would tweak the firewall remotely and make me do reboot tests every evening. Nothing they did worked. On the upside, I got really good at resetting the licensing information – I could pretty much do it in my sleep. In fact, I started to have dreams in which I had to reboot the damn firewall but couldn’t.
Eventually the powers that be realized that throwing man-hours at the problem was going nowhere, so they decided to go to Plan B, which was to throw real money at it. In fact, I was quite impressed by the act of simplistic analog-era logic: if component A is not working, and repeated attempts to fix it have failed, then we replace the component. Funny thing is that none of us digital webheads actually even thought about that. I was still convinced it was a software/configuration issue, but hey – you don’t say no to a brand spanking new server. Not to mention that with some luck (and some convincing), I could get approval to re-purpose the old hardware to replace one of the very, very ancient “beige boxes” that is still running some legacy app for us.
The plan is for Toby and Barry to set it up in their office, based on our Clonezilla images, and then keep it there for a week, periodically rebooting it to see if it holds on to its license keys. If it works, they will swing by our office one day, rack it up and we will finally be able to put this issue to bed.
After about a week they call me up, and tell me they are ready to deploy their new baby. Apparently it is rock-solid, stable and does not lose license keys ever. We all go: “Huh… I guess it was a hardware problem then…”
Fast forward to the fateful day and Barry shows up alone, and empty-handed. He shakes my hand and informs me that Toby is bringing the stuff in from the car. We chat for a while, then I hear banging on the door. I walk up and open it, only to see Toby looking like a pack mule. He has a backpack, two laptop shoulder bags (one of them is Barry’s apparently), a plastic shopping bag full of some stuff, and he is dragging a huge box behind him.
I’m sort of at a loss for words. We stare at each other for half a minute, and then I just blurt out:
“Dude, we have a dolly in the supply closet… I could have brought it down…”
Then I grab the other end of the box and help him carry it into the server room. Barry graciously opens the door for us, but only after watching me try to open it with my elbow for about a minute.
First task is to remove the old firewall from the rack. We set to work, power it down, disconnect everything and then Toby goes:
“Man, I hate these mounts… They always get stuck.”
The good news is that the new server has different mounts that don’t get stuck. The bad news is that the old server is currently stuck in the rack. My new friends proceed with various un-sticking rituals such as jiggling, jerking, popping, rocking the rack, etc. When the subtle ways don’t work, Barry starts to brutally wail on the stuck locking mechanism with his multi-tool, while Toby goes into the back of the rack, and starts pushing.
I watch them do this for 3 solid minutes, at which point something in the back of my head goes “Wait… This ain’t right…” I can sort of see with my mind’s eye all the ways this scenario could go wrong when suddenly there is a loud popping noise.
The server juts out of the rack, missing Barry’s face by half an inch. The sliding mounts extend to their full length and then just snap off. The abrupt jerk caused by the sudden disengagement causes the server to tumble so that now its faceplate is pointing at the ground. The force of Toby’s push is making it fly parallel to the floor, towards the nearby wall. Fortunately it is not a particularly aerodynamic device, so it never gets there. It flips around onto its back and lands with a loud, crunching noise about two feet from the rack.
No one says anything. Toby is still half-way in the rack, with his arm extended after a forceful push. Barry still has his multi-tool in the air, ready for another whack. We silently stare at what used to be a functional server, lying on the floor.
Finally, Toby blurts out: “It’s ok… We are replacing it anyway… Right?”
Faces and palms meet in a heartfelt embrace. Then we set back to work, mounting the new device in the rack. I remind Toby that there is absolutely no pushing allowed this time around.
The machine goes up in the rack, gets booted up, and we decide to do a few shutdown tests to be absolutely sure the license key issue does not come back. For all we know, it could be some micro-climate that is specific to our server room that is making the Checkpoint devices go haywire. Surprisingly enough, there are no problems. The server starts up, detects the proper licensing and keeps on trucking.
We awkwardly high-five each other, fully realizing that we have absolutely nothing to celebrate here. After all, it took weeks to get this done, and we managed to destroy a server in the process. Jose, a proud member of the janitorial staff, has no clue what just went down, but he decides to pop into the wide open server room to say hello, and joyfully joins in on the high-fiving. We are all relieved and glad that it is all over.
Then I decide to be Buzz Killington and insist that we make sure all the services are working properly. We quickly run through all the internal stuff, checking all network shares, printer access, remote sessions, etc. Everything seems to be working fine. Then I turn to the externally accessible stuff.
Our ISP has allocated us 5 static routable IP addresses. We currently use three of these to host internet-facing sites and services running from this very office. The firewall we just replaced is the internet gateway for all of them, so I want to be absolutely sure that Barry set the correct rules for all of these before we call it a night. Especially since there was no way for him to test these from his office.
I make Barry log in, and check the rules. One of the three sites turns out to be a problem. Its rule has the right port and the right IP address – it is the exact same rule we had before. We check the server in question, restart it and make sure it is accessible locally. It works just fine for us, but you can’t get to it from the internet.
After about an hour of head scratching, rule updating, rebooting and jiggling wires, Barry finally comes up with an idea. He makes me hit every public-facing IP with my phone one by one, while he sits and watches the firewall traffic in real time. I browse to our first site, and Barry sees packets coming in. I hit the second one, and there are more packets trickling in. I hit the third, problem site, and there is nothing happening. No traffic hits the firewall. Both Barry and Toby pull out their phones and try to do the same. Firewall shows no incoming traffic. We even call in Jose and have him try to access the site on his Droid phone, figuring that maybe Steve Jobs is just trolling us and making our iPhones do weird things. Still no incoming traffic.
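For the record, the phone-poking part of that test is easy to script. Below is a minimal sketch, not what we actually ran that night: the addresses are placeholders from the 203.0.113.0/24 documentation range, port 80 is an assumption, and the function names are made up for illustration. It has to be run from somewhere outside the office network (a phone hotspot, a box at home), otherwise you are just repeating the local test.

```python
import socket
from typing import Callable, Dict, Iterable, List, Tuple

Target = Tuple[str, int]

# Hypothetical targets: 203.0.113.0/24 is reserved for documentation and
# stands in for the real public addresses. Port 80 is also an assumption.
PUBLIC_TARGETS: List[Target] = [
    ("203.0.113.10", 80),
    ("203.0.113.11", 80),
    ("203.0.113.12", 80),
]


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def probe_all(
    targets: Iterable[Target],
    check: Callable[[str, int], bool] = tcp_reachable,
) -> Dict[Target, bool]:
    """Probe each (host, port) pair one at a time and record the result."""
    return {(host, port): check(host, port) for host, port in targets}


def format_report(results: Dict[Target, bool]) -> List[str]:
    """Turn probe results into one human-readable line per target."""
    return [
        f"{host}:{port} -> {'reachable' if ok else 'NO ANSWER'}"
        for (host, port), ok in results.items()
    ]
```

Calling `print("\n".join(format_report(probe_all(PUBLIC_TARGETS))))` from outside gives you one line per address. Keep in mind this only shows the client side of the story. A "NO ANSWER" looks the same whether the firewall dropped the packets or they never arrived at all, so you still need someone watching the firewall log, like Barry did, to tell the two apart.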
Then it hits us – the issue is not on our end. It is a routing issue. Suddenly Barry has an epiphany:
“This happened once before!” he exclaims.
Apparently when they were setting up this office (and before we moved in from our previous location) they encountered a similar issue. They were rebooting the firewall, and bringing stuff up and down for days, and at one point the same IP address stopped being routable. Apparently it took them about a week to get Verizon to straighten it out. At the time no one really cared because the office was empty, and we were waiting for the work crews to install and wire up all the cubicles before moving in.
Fortunately, we have two extra public IPs just lying around. We quickly swap the non-routable one for one of the spares, change the DNS records and we are back in business. The only problem is that we are now down one IP address that we paid for. I make a note to yell at Verizon and make them fix it the next day. As if it was that easy.
Next time on Firewall Saga: Verizon becomes confused, I become suicidal while Barry and Toby have a meeting that never ends.