Archive for the 'hax' Category

Snurching Slashdot Futurama Quotes

Thursday, May 25th, 2006

If you are a Slashdot reader, you probably know that the site sends out silly Futurama quotes with its headers. They are contained in custom fields like X-Fry, X-Bender, X-Leela, etc. Here is an example of a typical Slashdot HTTP header:

HTTP/1.1 200 OK
Date: Thu, 25 May 2006 16:35:56 GMT
Server: Apache/1.3.33 (Unix) mod_gzip/1.3.26.1a mod_perl/1.29
SLASH_LOG_DATA: shtml
X-Powered-By: Slash 2.005000113
X-Bender: We're both expressible as the sum of two cubes!

No, I did not discover this myself and no, it’s not new. Google for it, and see how many hits you get. I just figured that I would share the script I use to snurch the quote from that header. I have my KMail set up so that it uses this to append a random quote at the end of each of my emails:

lynx -mime_header http://slashdot.org/ | head -n6 | tail -1 | cut -f2- -d-

I have also seen this done using sed regexps, and perl but eh… This is easier.

For the Windows folk who do not speak unix, here is an English translation: grab the 6th line of the Slashdot HTTP header, and display everything after the first ‘-’ (the one that appears in X-Bender, etc.).
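One catch with counting lines is that the quote only shows up if the header order never changes. Here is a rough sketch of a variant that picks the header by name instead. The function name extract_quote is made up for this example, and the list of character names is an assumption based on the fields mentioned above, so extend it as needed:

```shell
# extract_quote: hypothetical helper, reads HTTP headers on stdin and
# prints the text of the first X-Fry / X-Bender / X-Leela header it finds.
# The character list is an assumption; add more names to the alternation.
extract_quote() {
    # Drop carriage returns (HTTP headers end in CRLF), strip the
    # "X-Name: " prefix from matching lines, and keep the first match.
    tr -d '\r' | sed -n -E 's/^X-(Fry|Bender|Leela): //p' | head -n 1
}
```

You would feed it the same lynx output as before: lynx -mime_header http://slashdot.org/ | extract_quote. Unlike the cut version, this one drops the character name and prints just the quote.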

Text Dumping PDF files

Saturday, April 22nd, 2006

The other day I got a request to convert a PDF file into a text file or something that could be imported to Excel. The file was essentially some big accounting mumbo-jumbo full of numbers arranged in columns with fancy headings. There were over 200 pages of it.

Now the easiest thing to do was to use the Windows version of Adobe Acrobat and simply save the file as .txt. But of course, that knocked out all the white space. All the columns ran into each other and the file looked like crap. There was no way you could do anything useful with it.

Of course my linux PDF reader (acroread) did not have the “Save as Text” option, so the first place I turned to was the nifty linux app pdftotext.

pdftotext bigstupidfile.pdf

This gives you a quick text dump which is roughly equivalent to the built-in Acrobat save behavior. But fortunately pdftotext has all kinds of nifty features. If you want to preserve the whitespace and layout details you should do:

pdftotext -layout -eol dos bigstupidfile.pdf

The -eol dos bit is there to specify the end of line style. Remember, I’m on a unix box converting this file for a Windows dude who will want to import this stuff to Excel.

Needless to say, the trick worked perfectly. The columns were preserved and the file looked great. So whenever you need to convert some PDF data into text I highly recommend using the -layout option.
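If the Excel import chokes on the space-padded columns, one possible follow-up is to turn the layout text into tab-separated text first. This is only a sketch: layout_to_tsv is a made-up helper name, and it assumes that columns are separated by runs of two or more spaces and that no cell contains a double space itself. It also expects unix line endings, so run it on output produced without -eol dos.

```shell
# layout_to_tsv: hypothetical helper, converts space-aligned columns
# (as produced by "pdftotext -layout") on stdin to tab-separated text.
# Assumption: two or more consecutive spaces mark a column boundary.
layout_to_tsv() {
    # Trim leading and trailing spaces, then split on 2+ spaces and
    # rejoin the fields with tabs ("$1 = $1" forces awk to rebuild $0).
    sed -e 's/^ *//' -e 's/ *$//' |
        awk -F'  +' -v OFS='\t' '{ $1 = $1; print }'
}
```

Tabs are used instead of commas so that numbers like 1,200.00 survive intact. Something like layout_to_tsv < bigstupidfile.txt > bigstupidfile.tsv should then give a file Excel can open as tab-delimited text.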
