Yesterday I wrote about that little problem aggregating data in a big CSV file. Just for shits and giggles, here is the Lisp code to parse and aggregate a file like that. It assumes that we group the entries by the first column and sum the last one. The file would look something like this:
something, stuff, poop, 20
something else, other stuff, pooper, 12
other stuff, stuff, poop, 55
something, abc, xx, 5
The script takes the name of the csv file to parse from the command line arguments. Here is the code:
;;; get split-sequence from http://www.cliki.net/SPLIT-SEQUENCE
(load "split-sequence.lisp")

(defvar *aggregate* '())  ; the association list to store results

(with-open-file (stream (first *args*))  ; open the file specified on cli
  (do ((line (read-line stream nil)      ; initialize LINE with read-line
             (read-line stream nil)))    ; the increment step (result gets stored in LINE)
      ((null line))                      ; termination condition (read-line returns NIL on EOF)
    ;; split the read string on a comma using the split-sequence function
    (let* ((fields (split-sequence:split-sequence #\, line))
           (key (intern (first fields)))  ; our key
           ;; convert the last value to an int; treat anything unparsable as 0
           (newval (or (parse-integer (first (last fields)) :junk-allowed t) 0))
           ;; check if key is in the assoc list; if it's not, ENTRY will be NIL
           (entry (assoc key *aggregate*)))
      (if entry
          (rplacd entry (+ newval (cdr entry)))  ; if it exists, update the old entry
          (setq *aggregate* (acons key newval *aggregate*)))))  ; else add a new entry
  (print *aggregate*))
Please feel free to pick it apart and make suggestions. I do not claim that this is the best way to do things, I just hacked it up for fun. I was surprised that CLISP does not have a built-in function to split a string on a delimiter. It seems such a fundamental feature, and almost every modern language has one. Instead of writing my own, I used the SPLIT-SEQUENCE library linked in the code above.
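For the curious, writing your own splitter is not much work. Here is a minimal sketch, assuming a single-character delimiter and no quoting rules. The name split-on is made up for this post and is not part of CLISP or of SPLIT-SEQUENCE:

```lisp
;; Hypothetical helper, not a library function: split STRING on the
;; character DELIMITER and return the pieces as a list of strings.
(defun split-on (delimiter string)
  (loop with start = 0
        ;; find the next delimiter at or after START (NIL if there is none)
        for pos = (position delimiter string :start start)
        ;; collect the piece before it, or the rest of the string when POS is NIL
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))
```

Calling (split-on #\, "a,b,,c") gives back the four pieces "a", "b", "" and "c". It does not trim whitespace, so the fields in the sample file keep their leading space, which parse-integer tolerates.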
This is especially odd considering the rather sophisticated with-open-file macro, which opens the file, closes it when you are done, and makes sure it still gets closed if an error occurs along the way. This is a very simple and elegant way of handling I/O. Makes you wonder why a split string function didn’t make the cut. :P
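To get a feel for what with-open-file saves you, here is a rough sketch of the same job done by hand with open, unwind-protect and close. This is a simplification, not the exact expansion of the macro, and both function names are made up for illustration:

```lisp
;; Hypothetical helper names; only OPEN, CLOSE, UNWIND-PROTECT, READ-LINE
;; and WITH-OPEN-FILE are standard Common Lisp.
(defun first-line-by-hand (path)
  "Read the first line of PATH, closing the stream by hand."
  (let ((stream (open path)))
    (unwind-protect
         (read-line stream nil)   ; the body; returns NIL instead of failing on EOF
      (close stream))))           ; cleanup form, runs even if the body signals an error

(defun first-line (path)
  "Same thing, letting WITH-OPEN-FILE do the bookkeeping."
  (with-open-file (stream path)
    (read-line stream nil)))
```

Both functions should return the same string for the same file. The second one is simply shorter and harder to get wrong.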
The output is not pretty but it works. It would take a few more lines to format it the right way for import into Excel, but I got bored after that. SQL still wins on simplicity, and I’m pretty sure I could produce tighter and less verbose code in Perl or Python. But hey, I set out to make my life difficult by using Lisp, and I did. This one actually made me hit the online manuals and documentation a bit to figure out how to do simple things the Lisp way.
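Those few more lines might look something like the sketch below. It assumes the result is an association list of (key . total) pairs like the one the script builds, and that the keys contain no commas or quotes, so nothing gets escaped. The name alist-to-csv is made up for this example:

```lisp
;; Hypothetical helper: write each (key . total) pair as one "key,total" line.
;; Assumes keys are strings or symbols without commas or quotes (no escaping).
(defun alist-to-csv (alist &optional (stream *standard-output*))
  (dolist (entry alist)
    ;; STRING turns a symbol into its name and leaves a string unchanged
    (format stream "~A,~D~%" (string (car entry)) (cdr entry))))
```

Calling it with the aggregate list in place of the final print, inside a with-open-file opened with :direction :output, would write a file that Excel should be able to import.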
[tags]lisp, csv, comma separated values, lisp file operations[/tags]