Archive for June, 2007

Google Desktop for Linux

Thursday, June 28th, 2007

I got a nice surprise today. I scrolled through the Slashdot feed, and saw a story about Google finally releasing their desktop search tool on Linux. So I immediately went and downloaded a copy.

I like Google Desktop Search because it simply works, and because it integrates with Gmail. I previously tried Beagle but it just didn’t work for me. I had it installed for over a month, but even the simplest queries returned odd results. For example, queries for the word thesis would yield no results even after a month of indexing, despite the fact that I had about 20 tex files, and a huge pile of pdf files with the word thesis right in their title. Same goes for Web Timesheet and Replicon – I have about a hundred back-and-forth emails with that company regarding the timesheet software, a bunch of proposals in PDF form, and some Excel spreadsheets with time-sheet in the name.

I restarted the daemon and reset the index many times – but it would never work. Google Desktop Search, on the other hand, just worked. I tried the above queries with the index only at 15% and I already got very relevant hits. So I’m sold. Goodbye Beagle, welcome Google Desktop.

Btw, it’s nice of Google to provide both an RPM and a Deb as installation choices. I had no issues installing the deb package on Dapper and Google claims it will work on just about any Ubuntu and Debian system. Nice. Although, I have to say this is probably the first Linux application that forced me to “reboot” to complete the installation. Ok, it didn’t really tell me to reboot – it just said the search will start next time I log in. So I logged out, logged back in, and then it kicked into place. Very odd. Why would they ever need to do that?

And yes, I know, I know. Google will steal my data, eat my soul, and sell my pr0n to the feds. We went over all of that. I’m not scared. I have yet to see any proof that Desktop search transmits private information back to Google.

I’m just glad to see that people at Google do care about Linux users, and when they promise a port, they can deliver it.

Batch Upload Images to ImageShack using Perl

Wednesday, June 27th, 2007

Someone asked about this so I felt compelled to deliver. The question was: “how to batch upload bunch of images to some free image hosting service?” Here is the answer. I picked ImageShack because you don’t need to register with it, and from what I remember they had lax rules about allowed content. Anyway, here is the Perl code. You will need WWW::Mechanize from CPAN. The script takes a list of images to be uploaded as arguments:

#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
 
# suppress warnings about malformed forms
$SIG{__WARN__} = sub {} ;
 
my $url = "http://www.imageshack.us/";
 
my $mech = WWW::Mechanize->new();
 
foreach (@ARGV)
{
    $mech->get($url);
 
    $mech->form_number(2);
    $mech->field('fileupload' => $_);
    $mech->submit();
 
    # follow the link to see the image
    $mech->follow_link( text => 'Show', n => 1 );
    my @im = $mech->images();
 
    # display the URL of the uploaded image
    print $im[0]->url() . "\n";
}

To run it just do something like:

upload.pl ~/img/image1.jpg ~/img/image2.jpg

Alternatively, to upload all the files in the current directory you can do:

ls | xargs upload.pl

The output of the script is the URLs of the uploaded images, in the order in which you specified them as arguments – this way you don’t lose track of your images.

One thing to watch for is the form that you specify in the line:

 $mech->form_number(2);

When I started writing this script, I was using 1 instead of 2 and it was working fine. Then I went to eat something, and when I came back, it no longer worked. Not sure what happened, but looking at the amount of JavaScript on the ImageShack website it’s possible that they sometimes move the search box around on the page, which changes the position of the upload form – possibly to prevent exactly what I’m showing you here. :)
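A possibly less fragile option is to pick the form by the field it contains rather than by its position. The sketch below is untested against the real ImageShack page. It assumes WWW::Mechanize's form_with_fields() method and the same 'fileupload' field name used in the script above. The helper name select_upload_form is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: select the upload form by its field name
# instead of by its position on the page.
sub select_upload_form {
    my ($mech, $field) = @_;
    $field = 'fileupload' unless defined $field;

    # form_with_fields() makes the first form containing the named
    # field the current form and returns it (undef if none matches).
    my $form = $mech->form_with_fields($field);
    die "No form with a '$field' field on this page\n" unless $form;
    return $form;
}
```

Inside the loop, a call to select_upload_form($mech) would take the place of the form_number(2) line, so the script should keep working if the forms get reordered.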

Enjoy.

Parsing Excel Files with Perl

Wednesday, June 27th, 2007

My company likes to store tons of useful information locked away in Excel files. I understand that not everyone understands how databases work, or how to use them. I have no clue how this happened but at some point Excel became the de facto standard for your everyday data storage needs – despite the fact that flat text files are often much better for this.

For example, given a simple tabulated list in a plain text file, I can grep through it, sort it, rearrange it or analyze it using a myriad of mature and time-tested text parsing tools. I can also import it into just about any kind of software, and easily write scripts against it. Excel, on the other hand, is much less flexible. But there is something about the neat rows and columns of a spreadsheet that draws people to it.

I much prefer to issue a quick command in bash than to open a bulky office application to get some basic info about an employee, or a client. So I decided to write a quick Perl script to parse through Excel files located on a network share (this probably is a common scenario in most offices). Surprisingly, it was very easy.

First you will need to mount the network share on your box. Next you will need the Spreadsheet::ParseExcel package from CPAN.

Here is the script I hacked up to extract data from a sheet which has unique (searchable) identifiers (here people’s names) in column D, and relevant data in columns E, F, and H:

#!/usr/bin/perl -w
use strict;
use Spreadsheet::ParseExcel;
 
my $FILE = "/path/to/File.xls";
my $SHEETNAME = "Sheet1";
 
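# the column that contains the searchable key (0-based, so 3 is column D)
my $KEY_COLUMN = 3;
 
my $searchstring = $ARGV[0];
die "Usage: $0 searchstring\n" unless defined $searchstring;
 
my $excel = Spreadsheet::ParseExcel::Workbook->Parse($FILE)
  or die "Could not parse $FILE\n";
my $sheet = $excel->Worksheet($SHEETNAME)
  or die "No worksheet named $SHEETNAME in $FILE\n";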
 
foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow})
{
  my $key = $sheet->Cell($row,$KEY_COLUMN);
 
  if($key)
  {
    # columns E, F and H (column indexes are 0-based)
    my $f1 = $sheet->Cell($row,4);
    my $f2 = $sheet->Cell($row,5);
    my $f3 = $sheet->Cell($row,7);
 
    if($key->Value() =~ m/$searchstring/)
    {
      print "\n\n";
      print "Key: " . $key->Value() . "\n";
      print "Field 1: " . $f1->Value() . "\n" if($f1);
      print "Field 2: " . $f2->Value() . "\n" if($f2);
      print "Field 3: " . $f3->Value() . "\n" if($f3);
      print "\n\n";
    }
  }
}

This is of course not the most efficient script since I’m looping through all the rows in the spreadsheet. Can you say O(n)? But it’s good enough for what I need it to do.
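If the same spreadsheet ever had to answer many lookups in one run, the linear scan could be swapped for a hash that is built once. Here is a minimal sketch, assuming the rows have already been pulled out of the sheet into plain arrays. The sample names and values are invented, and build_index and lookup are hypothetical helper names. Note that it only does exact, case-insensitive matches on the key, not the regex match used in the script above:

```perl
use strict;
use warnings;

# Build a hash from lowercased key to the list of rows with that key.
# Each row is an array ref: [ key, field1, field2, field3 ].
sub build_index {
    my (@rows) = @_;
    my %index;
    for my $row (@rows) {
        my $key = $row->[0];
        next unless defined $key && length $key;
        push @{ $index{ lc $key } }, $row;
    }
    return \%index;
}

# Return all rows stored under the given key (empty list if none).
sub lookup {
    my ($index, $key) = @_;
    return @{ $index->{ lc $key } || [] };
}

# Invented sample data standing in for columns D, E, F and H.
my @rows = (
    [ 'Alice Smith', 'Accounting', 'x101', 'NY' ],
    [ 'Bob Jones',   'IT',         'x102', 'NJ' ],
);

my $index = build_index(@rows);
for my $row ( lookup($index, 'bob jones') ) {
    print "Key: $row->[0]\n";
    print "Field 1: $row->[1]\n";
}
```

Building the index still touches every row once, so for a one-off command line query this buys nothing. It only pays off when a single run does many lookups.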