I was asked to magicalize something that that will allow for categorizing companies in our database using SIC Codes. SIC is a deprecated standard. It is actually being phased out of use in favor of NAICS codes. But the powers that be specifically wanted SIC because apparently contains 30% more of awesome. Or something like that.
In fact, it was decided that we should use this list specifically. I’m pretty sure that all these codes exist somewhere out there as a nice comma separated list or another machine readable format. In fact I looked but I actually couldn’t find anything better. I figured out that the only reliable way to deal with it’s million entries is to dump the whole thing into a database table and then query it when needed. So I created a very simple table like so:
CREATE TABLE sic ( code VARCHAR(8) NOT NULL, description VARCHAR(100) NOT NULL, PRIMARY KEY (`code`) ) ENGINE=InnoDB
Then I copied the whole list from that page and pasted it into Vim for some quick processing. I figured that this would be a a good excuse to show off the power of regular expressions and the best editor in the universe. So I will show you how I got from that messy list to a neat, set of SQL statements.
First I needed to delete all the blank lines from the file. There are many ways to do it. I did it like this:
Next, I decided to delete all lines that did not start with a number. For some reason the list has some headings that shouldn’t really be there. I got rid of them with another quick regexp:
I also removed all the indentation:
Finally I converted each line into a SQL INSERT statement in a two step process. First padded the begging of each line with appropriate SQL syntax and enclosed the actual SIC code in quote marks:
:%s/\(^[0-9]*\)\s/INSERT INTO sic (code,description) VALUES ('\1','/g
Then I added the closing quote mark, paren and semicolon at the end of each line:
That’s it. It took me only few minutes to go from a fairly messy and machine-unfriendly list, to a clean SQL file. I figured that I might as well post it here for your convenience. If you ever find yourself in need of importing all the SIC codes and their descriptions into a database, just use my SQL file:
Your browser does not support iframes.
Or better yet, just right-click and save the SIC Codes SQL file from here.
If this seems like a throw away “let me show you the boring stuff that I’ve been doing lately” post it’s because it is. This week is totally killing me. I used some of my vacation days as padding in between Christmas and New Year’s so haven’t really been working since the 24th. So this week seems to be impossibly long. I’m serious, I feel like it has been 2 months since Sunday.