Most problems you will face as an IT manager, sysadmin or even a lowly tech support drone can be resolved. If you stumble upon a hardware issue you just keep replacing parts until the system starts working again. If it is a software issue you reinstall the offending program or if all else fails, re-image the machine. Then there are problems which elude basic troubleshooting, and for which the big catch-all solutions like hardware replacements and re-imaging are just temporary fixes. If you Google these issues they will have thousands of hits, countless message board threads and blog posts devoted to them – and not a single concrete solution.
One such unsolvable problem was the “Dreaded Blinking Dash Issue of 2007”, which affected me personally and which I covered at length on this very blog. If you don’t remember that whole debacle, let me remind you:
- It started in December of 2007.
- I tried just about everything short of formatting the drive and reinstalling the OS, and nothing worked.
- Completely rebuilding the system from scratch proved to be a temporary solution.
- Then it came back in February of 2008.
- I even took a video, posted it all over the interwebs and trolled message boards for suggestions. Nothing short of a complete wipe and reinstall worked.
- Then it came back again in April.
I never figured out what exactly was happening with my machine. I just replaced it with a newer gaming rig, since I was long overdue for an upgrade anyway. Was it hardware related? Was it software? I still don’t know. And to this day I tend to hold my breath whenever my Windows box is booting up. I always expect that blinking dash to pop up again.
I also have a few such intractable, unsolvable issues that I see popping up all the time at work. Curiously enough, most of them are related to Microsoft Office. Here are two of the most common ones:
Document Not Saved
I absolutely hate this one. It starts innocently enough with a user reporting difficulty saving an Excel file. It almost always ends with us replacing and/or re-imaging that user’s computer. It is gross overkill, but so far this is the only solution that is guaranteed to work (that is, unless you use one of the “tainted” images that are known not to fix this issue). Actually, scratch that – it is the only solution that has ever worked.
One of our clients requires us to use their custom-written Excel Add-In, which probably lies at the core of this issue. The Add-In is a mysterious black box written by some nameless, faceless VBA coder whom we have so far been unable to reach for comment. We know that he-or-she exists because every few years it gets updated… Or at least did, up until 2007, when we stopped seeing new releases and bug fixes. So for all we know the developer might have left, been fired or been hit by a bus. Which would probably explain why none of our contacts at that company knows anything about him/her.
Anyways, this Add-In causes very specific behavior on some machines but not on the others. The symptoms are usually as follows:
- The affected user can open and save the document normally for the first time.
- If the document was saved on an affected machine, closed and then reopened some time later, the user will receive a “Document not saved” error while trying to save.
- If the file is transferred to another unaffected machine, it can be opened and saved normally.
- If a document that was edited and saved on an unaffected machine is transferred to an affected machine it can be saved normally but only up until the document is closed. Once the document is re-opened the error returns.
- Disabling the add-in will allow the user to open and save the document normally but it disables access to certain features. Re-enabling the add-in re-introduces the problem.
It is fairly clear the add-in is at the core of the issue, but the funny thing is that only some machines are affected. We will often have two machines with identical hardware, created from the same image with the same security suite and original configuration, out of which only one is affected. The only differences between the two are the users and their daily habits, the Office settings they may have changed, and additional software and/or 3rd party Office add-ins that may have been installed for a specific project and then removed (and so were not present at the time of comparison). So clearly the add-in is only halfway broken – and something on the system triggers the error. This something should be controllable and preventable. But we still don’t know what it is.
At one point we thought that perhaps it was a problem with the Office configuration, but Detect & Repair would not fix the issue. Neither would a complete removal and re-installation of the Office suite. We even once had a guy sit there for two days tracking down and manually ripping out all the Office registry entries, thinking that maybe the problem-causing bit of logic was in the profile-specific garbage that the Office installer always leaves behind. If we had located the offending file or registry key, we could then have written a script that would either patch the problem or, at the very least, sanitize the system before a fresh install. That project ended in a perplexing and frustrating failure.
To this day we have no clue what is causing this issue, and the most cost-effective working solution we have right now is to reformat and re-image. Which is a lot like killing a mosquito with a bazooka.
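For what it is worth, the "compare an affected machine against an unaffected one" step does not have to be done by hand. Below is a minimal Python sketch, assuming you have exported the same registry branch from both machines to .reg files and read them in as text. The function names and the example key are made up for illustration, and the parser is deliberately naive: it does not handle value names that contain an equals sign, and it does not report empty keys that exist on only one machine.

```python
import re


def parse_reg_export(text):
    """Parse the text of a .reg export into {key_path: {value_name: raw_data}}."""
    # Long hex values are wrapped with a trailing backslash; join them back up.
    text = re.sub(r"\\\r?\n\s*", "", text)
    keys = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            keys.setdefault(current, {})
        elif current is not None and "=" in line:
            # Naive split on the first "=": good enough for typical value names.
            name, _, data = line.partition("=")
            keys[current][name.strip('"')] = data
    return keys


def diff_registry(good, bad):
    """List (key, value_name, data_on_good, data_on_bad) for every value that differs.

    A value missing on one side is reported with None for that side.
    """
    differences = []
    for key in sorted(set(good) | set(bad)):
        good_values = good.get(key, {})
        bad_values = bad.get(key, {})
        for name in sorted(set(good_values) | set(bad_values)):
            good_data = good_values.get(name)
            bad_data = bad_values.get(name)
            if good_data != bad_data:
                differences.append((key, name, good_data, bad_data))
    return differences
```

Feeding it the two exports gives a list of candidate values to inspect. It will not tell you which difference matters, but it shrinks the haystack considerably.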
Corrupted Office Files
Another long-standing issue that is basically unsolvable has more to do with business practices and workflows than with IT stuff. Nevertheless, the IT department does get blamed and called upon to fix it when it happens. This problem is corruption of Office files. And I’m not talking about your average puny “single worksheet” Excel files. I’m talking about 100+ page reports with embedded graphs, complex macros and Excel documents. I’m talking about Excel reports that have 50-60 worksheets filled to the brim with mostly empty but meticulously formatted tables and forms, with delicate macros that break if you sneeze around them. I’m talking about documents that were created back in ’93 and have been continuously updated and reformatted ever since.
These files sometimes go bad. Sometimes this happens in transit, sometimes it happens on someone’s computer, other times it is just a function of accumulated errors. Let me give you an example of a typical office workflow:
- Obtain last month’s report from the supervisor.
- Delete all the shit that is outdated or obsolete (better yet, simply hide these worksheets, ’cause who knows whether some formulas are still using them).
- Overwrite what is needed with current information. Add a graph and a picture just for good measure.
- Send it back to the supervisor.
Over countless generations the file accumulates cruft, errors and garbage. It inherits new macros and embedded objects from other files. It gathers links to worksheets that no longer exist but that no one bothers to fix. Often it will lead to the “Too many different cell formats” condition. And sometimes the file just breaks and cannot be opened.
We get a large number of files that will simply crash Excel or Word upon opening, and dozens upon dozens of files that trigger Excel recovery mode and mysteriously lose all formatting, embedded objects and forms. The only workable solution in these cases seems to be to open these files in OpenOffice.org (which strangely enough always works) and re-save them as Excel 95 files. Why that version? Because it does not support embedded objects and macros but preserves formatting. This of course makes both users and managers unhappy.
There is a whole cottage industry built around this particular problem. There are countless proprietary software packages out there that claim to recover corrupted Word and Excel files. Most of them, however, don’t do much more than the built-in Excel recovery mode. We have tried many demos and we have yet to find one product that is effective, affordable and not shady at the same time.
Is this issue preventable? For the most part, yes. Excel does have a templating system that allows you to set up master documents instead of continuously reusing files and accumulating WYSIWYG editor crap, links and phantom embedded objects. It’s just that no one wants to use it, because it would require more work than just opening the old file and changing the numbers and comments where needed, leaving the previous term’s data and calculations intact where possible. Other than suggesting the template route, though, there is surprisingly little we can do. The Office file corruption issues are just too predictable.
Do you have some favorite, baffling, frustrating and otherwise unsolvable problems you see quite often? Tell us about them in the comments.
MS Word lists. God, how I hate them! No matter the version, no matter how you use styles, at some point (usually close to work completion or even at supervisor review) they will go bad. I am not aware of any decent stable solution – if the doc is over 100 pages, the probability of this bug goes to .99
Sometimes it can be fixed via some magic – turning on and off specific styles at specific document points, but usually it means that you will have to go over the whole document AGAIN.
One of my tech leads (!) at a previous job created a personal calendar for just our team to use. Of course, it had to be a manually formatted shared Excel sheet in a shared location, which is a Win XP box, used by other teams as well.
Naturally, we hit the 10 connections limit pretty quick. And when the month rolled over, he tasked a team member to “cut and paste” the format for the next month. D’oh!
@ Victoria:
Yes. They always, always, always break in unexpected ways. Usually what you can do is to put the cursor on the item that is broken, backspace until it gets joined to the previous item, then hit enter once and indent as appropriate. It does not always work though – especially if something went wrong up above as well.
@ Mart:
Tell me about it. My company uses Zimbra for email. It has a very robust calendaring solution that integrates with Outlook. Do we use it? No, of course not. All scheduling is done in a single Excel file that is only accessible to a select few individuals in the “inner sanctum”. Assignments are manually emailed to each individual at the beginning of the week. There are currently no plans to change this. :P
difficult problems need simple solutions. dont switch off your pc
My workplace loves ‘the cloud’. Wouldn’t be so bad, except that we use all sorts of different stuff in ‘the cloud’ and it’s not easy to remember which services we use…
I do have an interesting hardware/driver based bug on my laptop that can become annoying and which I have found no way of solving.
Using an HP tx2 tablet with an ExpressCard UMTS modem, it all works fine as long as I am in range of UMTS – when it falls back to GSM, every few seconds the mouse cursor fades away and can only be brought back by tapping the screen.
When these things happen, they literally drive me crazy. I cannot let go until I understand what was the underlying cause.
But why not try a game of “have you tried it?” with your first mystery issue.
My suggestion is to first make sure it is absolutely independent of hardware: take two machines from the same batch, one with the problem and the other without it. Switch hard drives and try. And then, also try to re-image the two hard drives with each other’s image, and try again. Who knows, it could be some truly weird timing issue when writing files. Don’t ever count out hardware until you have tried everything.
Once you have established that it is purely software, set up some of your users with full incremental backups of the complete system, preferably every hour… Once the issue hits a user, walk back through the backups until it works again. Try to find out what happened at that point in time.
Coincidentally, I think hourly incremental backups could be a solution to your second problem as well. When a user comes with his broken Excel sheet, pull up the last working version from backups and say “this time, start with cleaning up this thing, or the same thing will happen again”.
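With hourly backups, walking back one snapshot at a time gets tedious fast. A minimal Python sketch of a quicker variant follows. Note that it swaps the linear walk for a binary search, which only works under the assumption that once the problem appears it stays in every later snapshot. The `is_broken` check is hypothetical; in practice it probably means a human restoring the snapshot and trying to save the file.

```python
def first_broken_snapshot(snapshots, is_broken):
    """Return the index of the earliest broken snapshot, or None if none is broken.

    snapshots: list ordered oldest first.
    is_broken: callable that takes one snapshot and returns True if the
               problem reproduces on it (hypothetical; likely a manual test).
    Assumes that every snapshot taken after the first broken one is broken too.
    """
    if not snapshots:
        raise ValueError("no snapshots to search")
    low, high = 0, len(snapshots) - 1
    if is_broken(snapshots[low]):
        return low  # already broken in the oldest snapshot we have
    if not is_broken(snapshots[high]):
        return None  # even the newest snapshot works
    # Invariant: snapshots[low] works, snapshots[high] is broken.
    while high - low > 1:
        mid = (low + high) // 2
        if is_broken(snapshots[mid]):
            high = mid
        else:
            low = mid
    return high
```

For a week of hourly snapshots (168 of them) this needs roughly eight or nine restores instead of potentially dozens. The snapshot just before the returned index is the last known good state, and whatever changed between the two is the place to start looking.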
@ MrJones:
LOL. That’s exactly what I did – for months, until I finally got a new gaming rig together. But sometimes a random power outage or brownout would shut it down for me. Booting up after that was always a scary ordeal.
@ Eric:
Wow… Modem bug-outs affecting the UI… One has to wonder how deeply do they need to dig in the OS entrails to actually make that modem work then.
@ Tino:
What do you suggest for incremental backups? Powers that be don’t trust anything cloud based so stuff like Mozy is out. Most of the affected machines are laptops with a single hd, and they don’t always have internet access either (the clients don’t always give our analysts access to their interwebs).
Is Norton Ghost still any good? I think the newer versions had a real-time backup feature. I used to use Ghost 2003 to do imaging back in the day, but I have since switched over to Clonezilla. Both do full disk images, but you can’t be using the machine while it is backing up.
Luke Maciak wrote:
No idea what to use on Windows. There must be some fancy commercial packages that do all you want and more, right? But perhaps you could whip something together with just an rsync binary and a bat script in the scheduler. Not sure if security measures in modern Windows versions will get in the way.
Note that for both of the backup uses we discussed here you technically do not need backup to another machine. All you actually need is backup to local storage, so you can pull out older versions of files. On the other hand, when setting up backups, it may be a good idea to do it right :)…
No idea. All I know is people who regularly used older versions of ghost to backup their own systems, but no one who used it for automatic backups.
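To make the "backup to local storage so you can pull out older versions" idea concrete, here is a minimal sketch. It uses Python instead of the rsync binary and bat script mentioned above, purely so it can be shown in one self-contained piece. The function names, the folder layout and the default file patterns are all assumptions for illustration. It keeps one folder per document and adds a new timestamped copy only when the content has changed since the newest stored version.

```python
import hashlib
import shutil
import time
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_changed_files(source_dir, backup_dir,
                           patterns=("*.xls", "*.doc"), stamp=None):
    """Copy files that changed since their newest stored version.

    Each document gets its own folder under backup_dir, holding one
    timestamped copy per distinct version. Keep backup_dir outside
    source_dir, or the copies themselves will be picked up next run.
    Returns the list of copies created during this run.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    # Timestamps in this format sort chronologically when sorted as text.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    copied = []
    for pattern in patterns:
        for src in sorted(source_dir.rglob(pattern)):
            if not src.is_file():
                continue
            versions_dir = backup_dir / src.relative_to(source_dir)
            existing = sorted(versions_dir.glob("*")) if versions_dir.is_dir() else []
            if existing and file_digest(existing[-1]) == file_digest(src):
                continue  # unchanged since the newest stored version
            versions_dir.mkdir(parents=True, exist_ok=True)
            dest = versions_dir / f"{stamp}{src.suffix}"
            shutil.copy2(src, dest)  # copy2 also preserves file timestamps
            copied.append(dest)
    return copied
```

Run hourly from the Windows task scheduler, something like this would leave a trail of older versions on the laptop's own disk, with no network access required. It does not protect against a dead hard drive, which is the "do it right" part.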
lol windows.. Yeah Idk because closed source operations kind of disallow for community problem solving. Ironic eh?
The blinking cursor dealio probably only happened when you had removable storage inserted. Dells do this flashy dash when they’re trying to boot to USB flash media that isn’t bootable. By default your Dell system is set to boot to removable media before SATA. I’ve seen computers even try to boot to a USB printer with built-in SD card readers, resulting in this very flashy dash scare. Unplugging the printer and hitting Ctrl+Alt+Del got everything back in order. Disabling boot to USB media is the permanent fix.
Sorry if you’ve already explored this possibility and I’m wrong/wasting your time. Also, long time reader, first time comment. I enjoy your work very much.