Lisp: Parse and Aggregate a CSV File
Yesterday I wrote about that little problem aggregating data in a big csv file. Just for shits and giggles, here is the lisp code to parse and aggregate a file like that. It assumes that we group the entries based on the first column, and aggregate the last. The file would look something like this:
something, stuff, poop, 20
something else, other stuff, pooper, 12
other stuff, stuff, poop, 55
something, abc, xx, 5
The script takes the name of the csv file to parse from the command line arguments. Here is the code:
;;; get split-sequence from http://www.cliki.net/SPLIT-SEQUENCE (load "split-sequence.lisp") (setq aggregate '()) ;; the association list to store results (with-open-file (stream (first *args*)) ;; open the file specified on cli (do ((line (read-line stream nil) ;;; initialize LINE with with read-line (read-line stream nil))) ;;; the increment step (result gets stored in LINE) ((null line)) ;;; termination condition (read-line returns NIL on EOF) ;;; split the read string on a comma using the split-sequence function (setq tmp (split-sequence:SPLIT-SEQUENCE #\, line )) (setq key (intern (first tmp))) ;; our key ;; convert the last value to an int (setq newval (parse-integer (first (last tmp)) :junk-allowed t)) ;; check if key is in the assoc list; if it's not, VAL will be NIL (setq val (cdr (assoc key aggregate))) (if val (rplacd (assoc key aggregate) (+ newval val)) ;; if exist update old entry (setq aggregate (acons key newval aggregate)) ;; else add a new entry ) ) (print aggregate) )
Please feel free to pick it apart, and make suggestions. I do not claim that this is the best way to do things - I just hacked it up for fun. I was surprised that CLISP does not have a built in function to split a string on a delimiter. It seems such a fundamental feature, and almost every modern language has one. Instead of writing my own, I actually used this one.
This is especially odd considering the very complex with-open-file function which actually opens and closes the file, and handles error checking in the background. This a very simple and elegant way of handling I/O. Makes you wonder why a split string function didn’t make the cut.
The output is not pretty but it works. It would take few more lines to format it the right way for import into excel but I got bored after that. SQL still wins on simplicity, and I’m pretty sure I could produce tighter and less verbose code in perl or python. But hey - I set out to make my life difficult by using lisp, and I did. This one actually made me hit the online manuals and documentation a bit to figure out how to do simple things the lisp way.
Related Posts:
December 22nd, 2007 at 5:43 pm (7446) [Quote]
Hey thanks this is something I can actually use
And btw thanks also for the latex and e-macs stuff ya been posting lately.
A couple of months ago i down loaded and installed clisps (and slime) myself promising myself I would actually learn it and use it sometime in the winter because i don’t work much or even at all then. To actually and seriously program I need time off work.
I Will look at your code tomorrow if I don’t work (I never know when I’m going ta work or not gonna work, I just learn to deal with).
String manipulation is something I usually use alot in my code and I’m pretty good at it in langs like C so to use lisps effectively I need to get a grip on it. So your code will at least motivate me to think about it so thanks
Posted usingDecember 23rd, 2007 at 5:37 am (7447) [Quote]
Thanks! I’m glad someone is reading that stuff.
My biggest issue with lisp is that the easily googleable documentation is usually fragmentary or confusing. There some good books out there, but they do not always deal with the problem at hand.
Posted usingDecember 23rd, 2007 at 5:07 pm (7449) [Quote]
Using LET* would let you save lots of setq.
I’m glad you’re willing to think in lisp, so here’s my little tip:
http://cl-cookbook.sourceforge.net/
I suggest you the mentioned book “common lisp: a gentle introduction to symbolic computation”, because the real power of lisp is that it’s a FUNCTIONAL programming language. The code you did seemed ugly because you’re still thinking imperative - you used lots of setq because you wanted to manipulate variables. But it’s a good start.
I post you a function I was working on few days ago. I am coding genetic algorithms in lisp, and here’s the definition for “mutation” (no matter what it means):
For every element in a population, for every bit of a “binary string”-individual, there’s an as little as will Pm probability to mutate the bit instead of keeping it.
The imperative pseudocode would involve two for cycles, bounded by two variables, and arrays and memory allocation etc etc. Functional programming works differently. Here’s my one-line implementation:
(defun mutate (pop pm) (mapcar #’(lambda (ind) (mapcar #’(lambda (bit) (if (
Posted usingDecember 23rd, 2007 at 5:07 pm (7450) [Quote]
that can be formatted as
December 23rd, 2007 at 5:10 pm (7451) [Quote]
(sorry for the ugly commenting, but I’m experiencing lots of problems. Please edit it at will)
that can be formatted as
December 23rd, 2007 at 5:12 pm (7452) [Quote]
(the blog doesn’t seem to accept the LESS THAN symbol… another entry…)
(sorry for the ugly commenting, but I’m experiencing lots of problems. Please edit it at will)
that can be formatted as
(defun mutate (pop pm)(sorry for the ugly commenting, but I’m experiencing lots of problems. Please edit it at will)
that can be formatted as
December 23rd, 2007 at 5:13 pm (7453) [Quote]
(wordpress: I hate you. I give up.)
I hope my message looks humble enough as I wish: I’m no genius in lisp, but lisp, even if equipotential, is very more expressive than lots of more common languages.
Best regards
-J
Posted usingDecember 23rd, 2007 at 5:15 pm (7454) [Quote]
Thanks for the tips Jabba. You are right, I should have used a let statement. And it’s true - this is still an imperative code, but I was having a hard time figuring out how to do the file I/O without a loop.
On, and btw, try putting your code in
<pre lang=”lisp”> </pre>
This will preserve whitespace and escape special characters.
Posted usingDecember 23rd, 2007 at 5:25 pm (7455) [Quote]
The less than character gets eaten by wordpress because it is a start of a HTML tag. The <pre lang=”lisp”> </pre> environment however is handled by my code highlighting plugin which will automatically escape the < character by converting it into < so you can just copy and paste code directly into the commend box.
Posted usingDecember 23rd, 2007 at 5:40 pm (7456) [Quote]
You’re too polite not to try it immediatly. I’m pasting a lisp implementation of a simple local search algorithm, that tries to find a number with as many “ones” as possible in his binary representation, with solution space 0 to n. Sorry for the documentation in Italian
PSEUDO-CODE:
WORKING LISP IMPLEMENTATION:
with much care on training on the “tail recursion” technic, that can generate code as fast as fortran does (look at “performance” on CLISP homepage)
Hope this can help, I spent some months to gain functional mentality - and it’s still work in progress.
Have fun with lisp
-J
Posted usingDecember 23rd, 2007 at 5:42 pm (7457) [Quote]
You’re too polite not to try it immediatly. I’m pasting a lisp implementation of a simple local search algorithm, that tries to find a number with as many “ones” as possible in his binary representation, with solution space 0 to n. Sorry for the documentation in Italian
PSEUDO-CODE:
Hope this can help, I spent some months to gain functional mentality - and it’s still work in progress.
Have fun with lisp
-J
Posted usingDecember 23rd, 2007 at 5:48 pm (7458) [Quote]
(sorry, never posted before on wordpress, I’m seeing only now the “preview” button… this will never occour again, please edit my previous entry, thank you)
You’re too polite not to try it immediatly. I’m pasting a lisp implementation of a simple local search algorithm, that tries to find a number with as many “ones” as possible in his binary representation, with solution space 0 to n. Sorry for the documentation in Italian
PSEUDO-CODE:
with much care on training on the “tail recursion” technic, that can generate code as fast as fortran does (look at “performance” on CLISP homepage)
Hope this can help, I spent some months to gain functional mentality - and it’s still work in progress.
Have fun with lisp
-J
Posted usingDecember 23rd, 2007 at 10:56 pm (7459) [Quote]
I fixed your entries. You missed a </pre> tag here and there. And yeah, the preview button is there for times like this. But no worries.
Thanks for the code samples! I wish the comments were in English
Posted usingDecember 23rd, 2007 at 11:14 pm (7460) [Quote]
Here is a cleaner version with let… Still trying to figure out how to make it more lispy:
December 24th, 2007 at 3:46 am (7461) [Quote]
I still haven’t tried your code (I’m packing for holydays, I’ve a fly for this evening!), but as requested I tried to make it “a little lispier” without changing what you did. It’s just a taste matter, in this case, but the result is really good. Some problems, you simply have to resolve in simil-imperative way. But one thing I really love of lisp is that you CAN do anything, even imperative, you just don’t HAVE to. Ever heard of CLOS, the Common Lisp Object System?

have fun!
http://cl-cookbook.sourceforge.net/clos-tutorial/index.html
Yes, in lisp you CAN go object oriented (and LOOK AT THE TUTORIAL, it’s soooo simple…), but you don’t have to. Usually, you just need some struct, that comes with free get/set functions
Just try it, if you wish, and let me know
-J
ps I code under linux with clisp (sometimes cmucl), simply using kwrite as editor (no time to learn emacs still… and gedit doesn’t have lisp highlighting). Under windows, the best editor i ever used is NOTEPAD++ (google it, you won’t regret it)
Posted usingDecember 24th, 2007 at 3:56 am (7462) [Quote]
Ah! Let* - that’s what I was looking for. Funny thing - I completely forgot that it exists. Thanks!
Re: editor - Vim all the way! I run it as my default text editor in every single OS.
Posted usingDecember 24th, 2007 at 4:52 am (7463) [Quote]
:D intresting, I’ll try vim so, thanks!
Posted usingJune 8th, 2008 at 5:22 am (9280) [Quote]
[…] December 22nd 2007 4:58pm [-] From: terminally-incoherent.com Related? […]
Posted using