Recently I asked you guys what was the point of office suite databases. Most people seemed to agree with me that they are practically useless – merely dumbed down, pseudo databases with bloated graphical interfaces. Jake made a very valid point that you don’t really need a full blown database for something as simple as indexing your CD collection. Today I found yet another reason why these things are useful.
A coworker came to me asking for help. A client sent him a “hueg” csv file with over 200k records and he didn’t know what do do with it. If you ever used Excel you know that you can only fit about 65k records on a sheet. I hear they removed this limitation in Excel 2007 but of course you have to use the OOXML format for this. The business world is not ready to switch to a new file format yet, so even people who do have 2007 installed, save in 2003 compatibility mode, with the record number restrictions in place.
This of course was not a big issue – I simply used the unix split command to dice the csv file into few smaller ones and then imported all of them as worksheets in a single excel document. Unfortunately, poor guy needed to run some analysis on this data – he needed subtotals by customerid, by item number, date and etc. Which meant that the safest bet would be to go back to the original file and somehow get the summaries out of it.
As we were talking, I was already writing a perl script to do this in my mind, and trying to figure out if I could it in Lisp. Yeah, I figured that the best way of de-rusting my Lisp skills is to make things difficult on myself by coding various day-to-day hacks in the language. The downside of coding up a custom script to do this was that I had to go back to my desk, and actually write it leaving the poor guy hanging, waiting for the data as I was having fun haxing in lisp.
And then it struck me – all this guy needed were 2 or 3 “group by” SQL queries. Naturally, Excel doesn’t do SQL and even if it did we were still having that pesky record limit issue to deal with. But MS Access does let you use SQL, and does not have silly limitations like that.
So I fired up Access, imported the CSV file as a table, created the queries he needed in the SQL mode, and then exported the results to Excel worksheets so that he could use them. The whole operation took me no more than 5 minutes, whereas writing and testing a perl script would probably take me at least 15-20. If I used Lisp it would probably be even longer since I’m not completely fluent in it.
So there you have it – why do we need these things? Because they can be very useful data manipulation tools. Need to do some custom analysis done on your data? Forget using funky Excel formulas and other crap like that. Simply load everything into Access and run SQL queries against it.
So I retract my previous statement. Office Suite database products are useful. You just can’t think of them as database engines. They are more like functional tools with database like capabilities.
[tags]databases, office suite databases, access, ms access, microsoft, files[/tags]