Those pesky PDF files

At some point in the past we sent out an email to the staff saying that we can assist them with file conversion. Very often they get large data files (agings, inventory, sales receipts) in various formats. Some are plain text, some are comma- or tab-delimited ASCII, but most are PDF. Using tools like Monarch, we can relatively easily extract the data locked inside the PDF files and convert it to just about any format. The one requested most often is, of course, Excel.

To this day we regret not wording that email a little bit better. We still have to explain to people how this conversion process really works.

For example, one guy found out from a co-worker that we can convert PDF files to Excel. It just so happened that he had some large, tab-delimited text files that he wanted to manipulate in Excel. So he came up with a brilliant plan:

He printed out what came to close to 60 pages of data
He then scanned them in as PDF files
Naturally he was scanning them one page at a time, since he didn’t know any better
The copy machine sent the ~60 scanned PDFs to him, one page per email
He then took the time to download all these files, save them, rename them and reorder them
Finally he zipped the 60 PDF files and emailed them to me, asking if I could convert them to Excel

I didn’t know this story at the time. I’m recounting it to you now because it’s silly, but back then all I knew was that I got 60 PDF files, all of which essentially contained scanned images. I really didn’t know whether I should laugh or cry. I decided to call him up and find out what the deal was. When he recounted this process to me I had to take a break to bang my head against the wall for 5 minutes. Afterwards I called him back and explained to him how to open the tab-delimited file in Excel, click Next 2 or 3 times and watch the built-in import feature do its magic.

Sigh… At least he didn’t photograph the pages on a wooden table.
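For the record, a tab-delimited file never needs a PDF detour. Excel's import wizard handles it, and if you would rather hand someone a plain CSV they can double-click, the conversion is a few lines of script. Below is a minimal sketch in Python using only the standard library. The function name and the sample data are made up for illustration, and it assumes the input really is clean tab-delimited text.

```python
import csv
import io


def tab_delimited_to_csv(text: str) -> str:
    """Convert tab-delimited text into comma-separated text.

    Hypothetical helper for illustration; assumes one record per line
    with fields separated by tab characters.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    # The csv writer quotes any field that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Made-up sample data standing in for an aging report.
sample = "customer\tbalance\nAcme, Inc.\t1200.50\n"
print(tab_delimited_to_csv(sample))
```

Letting the csv module do the writing, rather than simply replacing tabs with commas, matters as soon as a field such as a company name contains a comma of its own.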

Another lady was doing test counts one day. She painstakingly recorded the values of the counts using the old-fashioned pen and paper method. Normally she would have to re-enter all this data into some XML file, but fortunately she remembered that email we sent out. So she scanned all these handwritten notes into a PDF file and then sent it to me for conversion. It took me a while to explain to her that I didn’t really have the tools to do this type of OCR. She just kept saying “But they are PDF files!”

I don’t think she ever got it, but she eventually gave up trying to convince me to convert them. She probably figured I was lazy or something. :P

There seems to be something about PDF files that makes small-minded people very confused. We had another guy who kept sending Word documents to the office to be “scanned in as PDF”. The secretary would then print them out, walk across the hall to the copy room, scan the printout, type in her email on the copy machine’s touch screen, go back to her desk, wait for the email, and forward it back to this guy.

They were both floored when I introduced them to PDFCreator. They absolutely loved it, but it introduced a brand new problem. The Word guy would now create a document, generate a PDF and then realize he had a few typos and/or mistakes in it. Fortunately he remembered we had the full version of Adobe Acrobat (the one that can do touch-ups on PDF files) in the office. So he would now send us his newly converted PDF file along with a list of corrections.

I called him up and our conversation basically went like this:

Him: “Can you remove the fourth and sixth sentence in the third paragraph? This should make the whole document fit on 2 pages.”
Me: “Well, you see… The Adobe tool is mostly for touching up text objects. It doesn’t really re-flow the paragraphs. Btw, how do you know it will cut down the size to 2 pages?”
Him: “Oh, I did it in my Word document so I can see how it will look when you do these changes.”
Me: “Um… Wouldn’t it be easier if you just used PDFCreator to generate another PDF file out of this updated Word document?”

[long pause]

Him: “Sigh… I just thought it would be easier with the Adobe thing.”

I don’t know what it is about PDF files. These folks seem to be doing ok working with Word and Excel files. But PDF files seem to have some sort of extra magical properties that induce confusion in some people.


6 Responses to Those pesky PDF files

Miloš says:

February 8, 2008 at 12:21 pm

hahaha…reading this brought back so many “great” memories. :)

I think it might be their logo that’s mesmerizing them…the red swirly A…you must obey the logo and convert all of your files into PDFs! lol

Luke Maciak says:

February 8, 2008 at 12:40 pm

Or maybe it’s that 5 minute splash screen which insists on telling you exactly which plugin it is loading at the time. It’s hypnotic! lol

jambarama says:

February 8, 2008 at 1:23 pm

It is funny how people view PDFs and file formats in general. In my civil procedure law class we were talking about electronic discovery and some student suggested you could hose the other party (and protect yourself) by sending PDFs because they’re unsearchable, uneditable, and unforgeable. Everyone laughed & agreed.

I had a hard time not raising my hand and calling everyone an idiot.

Another time a coworker was complaining about Adobe Reader taking over everything (with the update manager, the slow load times, & freezing her browser). She was floored when I showed her Foxit – “does Adobe allow this? Are you sure this isn’t illegal?” Yikes.

Matt` says:

February 8, 2008 at 1:58 pm

All hail to Foxit, thanks to them I can finally stop worrying about every link I click turning out to be a pdf in disguise.

And yes… these stories are a tad sad… someone really needs to go around with a computer-literacy stick and whack ’em all until they bleed.

jambarama says:

February 8, 2008 at 2:03 pm

By the way, you probably saw this, but the next version of OO.org will have PDF import. In addition to the PDF creation already built into OO.org, the ability to edit PDF files should really give others a good reason to dump MS Office. Add to that SVN, automatic wiki markup, OOXML support, enhanced LaTeX, and extensions (they’ve already got one for Google Docs integration). Oh yeah, and email/calendar support, and the calendar can interface with iCal, Google Calendar and much more. It looks to me that MS Office devs should start looking for new jobs :)

Not wholly related to PDFs, but interesting.

Luke Maciak says:

February 8, 2008 at 3:17 pm

“It looks to me that MS Office devs should start looking for new jobs”

Nah, they will just release Office 2008 with “extended” OOXML support which will make documents look broken on OO.o. They are really good at this game – they have been doing this for years. ;)
