Yesterday I wrote about that little problem aggregating data in a big csv file. Just for shits and giggles, here is the lisp code to parse and aggregate a file like that. It assumes that we group the entries based on the first column, and aggregate the last. The file would look something like this:
something, stuff, poop, 20
something else, other stuff, pooper, 12
other stuff, stuff, poop, 55
something, abc, xx, 5
The script takes the name of the csv file to parse from the command line arguments. Here is the code:
;;; get split-sequence from http://www.cliki.net/SPLIT-SEQUENCE
(load "split-sequence.lisp")
(setq aggregate '()) ;; the association list to store results
(with-open-file (stream (first *args*)) ;; open the file specified on cli
(do ((line (read-line stream nil) ;;; initialize LINE with with read-line
(read-line stream nil))) ;;; the increment step (result gets stored in LINE)
((null line)) ;;; termination condition (read-line returns NIL on EOF)
;;; split the read string on a comma using the split-sequence function
(setq tmp (split-sequence:SPLIT-SEQUENCE #\, line ))
(setq key (intern (first tmp))) ;; our key
;; convert the last value to an int
(setq newval (parse-integer (first (last tmp)) :junk-allowed t))
;; check if key is in the assoc list; if it's not, VAL will be NIL
(setq val (cdr (assoc key aggregate)))
(if val
(rplacd (assoc key aggregate) (+ newval val)) ;; if exist update old entry
(setq aggregate (acons key newval aggregate)) ;; else add a new entry
)
)
(print aggregate)
)
Please feel free to pick it apart, and make suggestions. I do not claim that this is the best way to do things – I just hacked it up for fun. I was surprised that CLISP does not have a built in function to split a string on a delimiter. It seems such a fundamental feature, and almost every modern language has one. Instead of writing my own, I actually used this one.
This is especially odd considering the very complex with-open-file function which actually opens and closes the file, and handles error checking in the background. This a very simple and elegant way of handling I/O. Makes you wonder why a split string function didn’t make the cut. :P
The output is not pretty but it works. It would take few more lines to format it the right way for import into excel but I got bored after that. SQL still wins on simplicity, and I’m pretty sure I could produce tighter and less verbose code in perl or python. But hey – I set out to make my life difficult by using lisp, and I did. This one actually made me hit the online manuals and documentation a bit to figure out how to do simple things the lisp way.
[tags]lisp, csv, comma separated values, lisp file operations[/tags]
Hey thanks this is something I can actually use :) And btw thanks also for the latex and e-macs stuff ya been posting lately.
A couple of months ago i down loaded and installed clisps (and slime) myself promising myself I would actually learn it and use it sometime in the winter because i don’t work much or even at all then. To actually and seriously program I need time off work.
I Will look at your code tomorrow if I don’t work (I never know when I’m going ta work or not gonna work, I just learn to deal with).
String manipulation is something I usually use alot in my code and I’m pretty good at it in langs like C so to use lisps effectively I need to get a grip on it. So your code will at least motivate me to think about it so thanks ;)
Thanks! I’m glad someone is reading that stuff.
My biggest issue with lisp is that the easily googleable documentation is usually fragmentary or confusing. There some good books out there, but they do not always deal with the problem at hand. :P
Using LET* would let you save lots of setq.
I’m glad you’re willing to think in lisp, so here’s my little tip:
http://cl-cookbook.sourceforge.net/
I suggest you the mentioned book “common lisp: a gentle introduction to symbolic computation”, because the real power of lisp is that it’s a FUNCTIONAL programming language. The code you did seemed ugly because you’re still thinking imperative – you used lots of setq because you wanted to manipulate variables. But it’s a good start.
I post you a function I was working on few days ago. I am coding genetic algorithms in lisp, and here’s the definition for “mutation” (no matter what it means):
For every element in a population, for every bit of a “binary string”-individual, there’s an as little as will Pm probability to mutate the bit instead of keeping it.
The imperative pseudocode would involve two for cycles, bounded by two variables, and arrays and memory allocation etc etc. Functional programming works differently. Here’s my one-line implementation:
(defun mutate (pop pm) (mapcar #'(lambda (ind) (mapcar #'(lambda (bit) (if (
that can be formatted as
(sorry for the ugly commenting, but I’m experiencing lots of problems. Please edit it at will)
that can be formatted as
(the blog doesn’t seem to accept the LESS THAN symbol… another entry…)
(sorry for the ugly commenting, but I’m experiencing lots of problems. Please edit it at will)
that can be formatted as
(defun mutate (pop pm)(sorry for the ugly commenting, but I’m experiencing lots of problems. Please edit it at will)
that can be formatted as
(wordpress: I hate you. I give up.)
I hope my message looks humble enough as I wish: I’m no genius in lisp, but lisp, even if equipotential, is very more expressive than lots of more common languages.
Best regards
-J
Thanks for the tips Jabba. You are right, I should have used a let statement. And it’s true – this is still an imperative code, but I was having a hard time figuring out how to do the file I/O without a loop. :P
On, and btw, try putting your code in
<pre lang=”lisp”> </pre>
This will preserve whitespace and escape special characters.
The less than character gets eaten by wordpress because it is a start of a HTML tag. The <pre lang=”lisp”> </pre> environment however is handled by my code highlighting plugin which will automatically escape the < character by converting it into < so you can just copy and paste code directly into the commend box. :)
You’re too polite not to try it immediatly. I’m pasting a lisp implementation of a simple local search algorithm, that tries to find a number with as many “ones” as possible in his binary representation, with solution space 0 to n. Sorry for the documentation in Italian :P
PSEUDO-CODE:
WORKING LISP IMPLEMENTATION:
with much care on training on the “tail recursion” technic, that can generate code as fast as fortran does (look at “performance” on CLISP homepage)
Hope this can help, I spent some months to gain functional mentality – and it’s still work in progress.
Have fun with lisp
-J
You’re too polite not to try it immediatly. I’m pasting a lisp implementation of a simple local search algorithm, that tries to find a number with as many “ones” as possible in his binary representation, with solution space 0 to n. Sorry for the documentation in Italian :P
PSEUDO-CODE:
Hope this can help, I spent some months to gain functional mentality – and it’s still work in progress.
Have fun with lisp
-J
(sorry, never posted before on wordpress, I’m seeing only now the “preview” button… this will never occour again, please edit my previous entry, thank you)
You’re too polite not to try it immediatly. I’m pasting a lisp implementation of a simple local search algorithm, that tries to find a number with as many “ones” as possible in his binary representation, with solution space 0 to n. Sorry for the documentation in Italian :P
PSEUDO-CODE:
with much care on training on the “tail recursion” technic, that can generate code as fast as fortran does (look at “performance” on CLISP homepage)
Hope this can help, I spent some months to gain functional mentality – and it’s still work in progress.
Have fun with lisp
-J
I fixed your entries. You missed a </pre> tag here and there. And yeah, the preview button is there for times like this. But no worries.
Thanks for the code samples! I wish the comments were in English :P
Here is a cleaner version with let… Still trying to figure out how to make it more lispy:
I still haven’t tried your code (I’m packing for holydays, I’ve a fly for this evening!), but as requested I tried to make it “a little lispier” without changing what you did. It’s just a taste matter, in this case, but the result is really good. Some problems, you simply have to resolve in simil-imperative way. But one thing I really love of lisp is that you CAN do anything, even imperative, you just don’t HAVE to. Ever heard of CLOS, the Common Lisp Object System? :D
http://cl-cookbook.sourceforge.net/clos-tutorial/index.html
Yes, in lisp you CAN go object oriented (and LOOK AT THE TUTORIAL, it’s soooo simple…), but you don’t have to. Usually, you just need some struct, that comes with free get/set functions :D
Just try it, if you wish, and let me know ;) have fun!
-J
ps I code under linux with clisp (sometimes cmucl), simply using kwrite as editor (no time to learn emacs still… and gedit doesn’t have lisp highlighting). Under windows, the best editor i ever used is NOTEPAD++ (google it, you won’t regret it)
Ah! Let* – that’s what I was looking for. Funny thing – I completely forgot that it exists. Thanks!
Re: editor – Vim all the way! I run it as my default text editor in every single OS. ;)
:D intresting, I’ll try vim so, thanks!
Pingback: Lisp: Parse and Aggregate a CSV File [ Terminally Incoherent ]
It’s very common for CSV files to allow the comma as part of a field… as long as the field’s quoted. For instance:
1,2,”3,4″,5.
SPLIT-SEQUENCE will, I imagine, misparse the 3rd field and say that the line contains 5 fields when it only contains 4.
Further, quoted fields can themselves contain quotes, escaped as “”. So this is a legal CSV line too:
1,2″Then I was all like “”so what?”””
And the final thing is that quoted fields may contain linebreaks. (I must say, I’ve never seen this in the field!)