Archive for the 'perl' Category

Using CPAN version of WWW::Mechanize with ActiveState Perl on Windows

Monday, February 25th, 2008

I end up doing this each time I reinstall Windows, and every time I forget how I did it, so I figured I’ll archive the process here. Perhaps it will help some of you. And I know, someone will ask why I don’t just use the PPM repository. Let’s just say I don’t want to. I want to grab the latest WWW::Mechanize package from CPAN and run with it.

Why am I posting it now? Because I needed to reinstall Windows once again on my desktop, and now I need to get my Blackboard scripts to work again.

This is really a multi-step process. I’ll assume you have Perl installed already. If not, you can get ActiveState Perl, which works pretty well on Windows. Just grab the MSI package, install it, and all the useful tools including perl, cpan and ppm will land in your path. From there follow these 3 easy steps:

Step 1: Get nmake

You will need nmake - the Windows version of the make utility - to compile most of the CPAN packages. How do you get nmake? There are several ways to do it, but probably the easiest one is to grab the Microsoft version of the tool from their knowledge base. Once you download it, dump it somewhere in your path. It doesn’t really matter where it is, but I stuck it in the bin directory of my Perl install.

Step 2: Get YAML from CPAN

You will need YAML to build WWW::Mechanize. What is YAML? Short answer: do you care? Long answer: look it up. All you need to know is that you need it. So run cpan from your console and type in:

install YAML

This should cause some streaming text on the screen as the package is fetched and compiled. If it fails, make sure nmake is in your path, and that it is named nmake.exe and not something else.
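If you are not sure whether nmake.exe is actually visible, a quick check along these lines may help. This is only a minimal sketch: find_in_dirs is a hypothetical helper name of mine, and the script simply walks the directories in your path looking for a file with the exact name nmake.exe.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first file called $name found in @dirs,
# or undef if none of the directories contains it.
# (find_in_dirs is a hypothetical helper, not part of any module.)
sub find_in_dirs {
    my ($name, @dirs) = @_;
    foreach my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate;
    }
    return;
}

# File::Spec->path() returns the directories listed in the PATH variable.
my $nmake = find_in_dirs('nmake.exe', File::Spec->path());
print defined $nmake
    ? "nmake found at $nmake\n"
    : "nmake.exe is not in your path\n";
```

If it reports that nmake.exe is missing, the cpan build in this step will most likely fail for the same reason.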

Step 3: Install WWW::Mechanize

Final step is the easy one - just fetch and install the WWW::Mechanize package using the traditional method:

install WWW::Mechanize

Make sure you do steps 1 and 2 before you try this. If you tried this before you installed YAML, the build will fail for some reason. To recover from that, just quit cpan and run it again. This will clear the local cache and will re-fetch the package for a clean build.

So, there you have it. I know it’s a bit of a dry and uninteresting post for Monday morning. But this is more of a reminder to myself than anything else. I never remember where to grab nmake or what that other package is without which nothing ever builds on Windows. Hopefully some of you will find it helpful. :)

Download All Documents from Blackboard’s Digital Dropbox

Monday, February 4th, 2008

Blackboard sucks. This is the opinion shared by roughly 95% of faculty at my university. But this is what we have, and a crappy course management system is still better than no course management system. So we are stuck with it for the time being.

Most of the gripes about the BB system concentrate around the Digital Dropbox feature, which is both very convenient and very annoying. It’s convenient for the students because they can easily submit their assignments from anywhere at any time. They simply log into Blackboard, click on the appropriate course link, and upload the file. They don’t have to spend time printing their work, they do not need to search for the instructor’s email, and they do not have to hope the spam filter won’t snag it. They just hit a button and they are done.

It is horribly annoying because the instructor does not have any control over how the uploads are organized. When you open the instructor’s view of the dropbox you see a long list of download links in reverse chronological order (newest entries are on the top). There is no way to tag or categorize assignments, so unless students name their files appropriately you have to open each file to figure out what it is.

But the most annoying feature is probably the fact that there is no way to download all the files from the dropbox at once. You can only grab them one at a time by clicking the links. You’d think there was a button that says “download all” but there is none. I think there is a silly trick some people use that involves exporting/archiving the course, but it’s not exactly what you’d want to do every week unless you have to. So I decided to create a solution that would let me easily download all the files from the dropbox at the click of a button. How? Observe:

use strict;
use WWW::Mechanize;
use MIME::Base64;
 
 
my $bb_login_url = 'http://school.blackboard.com/webapps/login/?action=relogin';
my $bb_dropbox_url = 'http://school.blackboard.com/bin/common/dropbox.pl?action=LIST&course_id=_SOME_NUMBER_1&render_type=EDITABLE';
 
my $bb_user = 'your_username';
my $bb_passwd = 'your_password';
 
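# where do you want the downloaded files (must end with a path separator,
# because the file name is appended directly to this string below)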
my $folder = "absolute_path";
 
# log into bb
my $browser = WWW::Mechanize->new(autocheck => 1, quiet => 0);
$browser->get($bb_login_url);
$browser->form_number(1);
$browser->field('user_id', $bb_user);
# the empty second argument stops encode_base64 from appending a newline
$browser->field('encoded_pw', encode_base64($bb_passwd, ''));
$browser->click();
 
# go to dropbox
$browser->get($bb_dropbox_url);
 
# grab all the links
my @l = $browser->links;
 
my $i = 1;
 
foreach (@l)
{
	# the links to files have the word uploads in the uri
	if($_->url() =~ m/uploads/)
	{	
		print $i . "--" . substr($_->url(), rindex($_->url(), '/')+1) . "\n";
		$browser->get($_->url(), ":content_file" => $folder . $i . "--" . substr($_->url(), rindex($_->url(), '/')+1));
		$i++;
	}
}

I could do this in fewer than 10 lines probably, but we are not playing perl-golf here. You will need to hard code the appropriate URLs. I hope you can figure out how to get them. If not, you should not be fucking around with Perl. In fact, step away from the computer right now. These things can explode if you press the wrong key combination!

I guess the important bit of information here is the regex inside of the loop. The whole thing works because the URLs for the files inside the dropbox (at least in the version of BB we use) look like this:

http://school.blackboard.com/courses/1/COURSE_ID/uploads/_SOMENUMBER_1/somefile.doc

None of the other links on the page have the word uploads anywhere in the URL so this is how we can isolate the download links regardless of what type of files they are. The get() method of WWW::Mechanize can fetch the specified URI into a file instead of memory if you pass in “:content_file” => “desired_file_name_and_path” as the second argument.

I’m prepending a number to the output file name because it is not uncommon to find 2 files with the same name (for example homework1.zip or something like that). The get method will overwrite files without warning, so the sequential numbers prevent that from happening. They also help to match the files up with the Blackboard page. If your students do not include their name inside of the file, you can look back on the dropbox page and figure out which file it was rather easily.
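The file naming can also be pulled out into a small function, which makes it easier to see what ends up on disk. This is just a sketch: numbered_path is a hypothetical helper name, and it uses File::Spec to join the folder and the file name so the folder does not need a trailing separator. The file name extraction is the same substr/rindex trick the script uses.

```perl
use strict;
use warnings;
use File::Spec;

# Build "<folder>/<n>--<last URL segment>" for a download link.
# (numbered_path is a hypothetical helper, not part of WWW::Mechanize.)
sub numbered_path {
    my ($folder, $n, $url) = @_;
    # everything after the last slash, same extraction as in the script
    my $name = substr($url, rindex($url, '/') + 1);
    return File::Spec->catfile($folder, "$n--$name");
}

# Example with a made-up file name in the URL format shown earlier
print numbered_path(
    'downloads', 1,
    'http://school.blackboard.com/courses/1/COURSE_ID/uploads/_SOMENUMBER_1/homework1.zip'
), "\n";
```

The result could then be passed as the “:content_file” value in the get() call.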

To delete all the files from the dropbox, simply change the loop to this:

foreach (@l)
{
	if($_->url() =~ m/=REMOVE/)
	{	
		$browser->get(substr($_->url(), rindex($_->url(), '/')+1));
	}
}

Here we are grabbing the links that have the word REMOVE in the link. So now you can fetch the whole dropbox and delete everything from it for a clean start.

I’m not sure if anyone will find this useful, but it really makes working with BB a tiny little bit less annoying for me.

Batch Upload Images to ImageShack using Perl

Wednesday, June 27th, 2007

Someone asked about this so I felt compelled to deliver. The question was: “how to batch upload bunch of images to some free image hosting service?” Here is the answer. I picked ImageShack because you don’t need to register, and from what I remember they had lax rules about allowed content. Anyway, here is the Perl code. You will need WWW::Mechanize from CPAN. The script takes a list of images to be uploaded as arguments:

#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
 
# suppress warnings about malformed forms
$SIG{__WARN__} = sub {} ;
 
my $url = "http://www.imageshack.us/";
 
my $mech = WWW::Mechanize->new();
 
foreach (@ARGV)
{
    $mech->get($url);
 
    $mech->form_number(2);
    $mech->field('fileupload' => $_);
    $mech->submit();
 
    # follow the link to see the image
    $mech->follow_link( text => 'Show', n => 1 );
    my @im = $mech->images();
 
    # display the URL of the uploaded image
    print $im[0]->url() . "\n";
}

To run it just do something like:

upload.pl ~/img/image1.jpg ~/img/image2.jpg

Alternatively to upload all the files in the current directory you can do:

ls | xargs upload.pl

The output of the script is the list of URLs of the uploaded images, appearing in the order in which you specify them in the arguments - this way you don’t lose track of your images.
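If you would rather not depend on the shell to build the argument list, the script could pick the files itself. The following is only a sketch under my own assumptions: image_files is a hypothetical helper, and the list of extensions is a guess at what you would want to upload.

```perl
use strict;
use warnings;

# Keep only names that look like images.
# (image_files is a hypothetical helper; the extension list is an assumption.)
sub image_files {
    my @names = @_;
    return grep { /\.(?:jpe?g|png|gif|bmp)$/i } @names;
}

# Use the command-line arguments if any were given,
# otherwise fall back to every plain file in the current directory.
my @candidates = @ARGV ? @ARGV : grep { -f } glob('*');

print "$_\n" for image_files(@candidates);
```

In the upload script, the resulting list would take the place of @ARGV in the foreach loop.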

One thing to watch for is the form that you specify in the line:

 $mech->form_number(2);

When I started writing this script, I was using 1 instead of 2 and it was working fine. Then I went to eat something, and when I came back, it no longer worked. Not sure what happened, but looking at the amount of JavaScript on the ImageShack website it’s possible that they sometimes move the search box around on the page - possibly to prevent exactly what I’m showing you here. :)

Enjoy.

Parsing Excel Files with Perl

Wednesday, June 27th, 2007

My company likes to store tons of useful information locked away in Excel files. I understand that not everyone understands how databases work, or how to use them. I have no clue how this happened, but at some point Excel became the de facto standard for your everyday data storage needs - despite the fact that flat text files are often much better for this.

For example, given a simple tabulated list in a plain text file, I can grep through it, sort it, rearrange it or analyze it using a myriad of mature and time-tested text parsing tools. I can also import it into just about any kind of software, and easily write scripts against it. Excel, on the other hand, is much less flexible. But there is something about the neat rows and columns of a spreadsheet that draws people to it.

I much prefer to issue a quick command in bash than to open a bulky office application to get some basic info about an employee, or a client. So I decided to write a quick Perl script to parse through Excel files located on a network share (this probably is a common scenario in most offices). Surprisingly, it was very easy.

First you will need to mount the network share on your box. Next you will need the Spreadsheet::ParseExcel package from CPAN.

Here is the script I hacked up to extract data from a sheet which has unique (searchable) identifiers (here people’s names) in column D, and relevant data in columns E, F, and H. Column numbers in the script are zero-based, so column D is 3:

#!/usr/bin/perl -w
use strict;
use Spreadsheet::ParseExcel;
 
my $FILE = "/path/to/File.xls";
my $SHEETNAME = "Sheet1";
 
# the column that contains searchable key
my $KEY_COLUMN = 3;
 
my $searchstring = $ARGV[0];
die "Usage: $0 search_string\n" unless defined $searchstring;
 
my $excel = Spreadsheet::ParseExcel::Workbook->Parse($FILE);
my $sheet = $excel->Worksheet($SHEETNAME);
 
foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow})
{
  my $key	= $sheet->Cell($row,$KEY_COLUMN);
 
  if($key)
  {
    my $f1	= $sheet->Cell($row,4);
    my $f2	= $sheet->Cell($row,5);
    my $f3	= $sheet->Cell($row,7);
 
    if($key->Value() =~ m/$searchstring/)
    {
      print "\n\n";
      print "Key: " . $key->Value() . "\n";
      print "Field 1: " . $f1->Value() . "\n" if($f1);
      print "Field 2: " . $f2->Value() . "\n" if($f2);
      print "Field 3: " . $f3->Value() . "\n" if($f3);
      print "\n\n";
    }
  }
}

This is of course not the most efficient script since I’m looping through all the rows in the spreadsheet. Can you say O(n)? But it’s good enough for what I need it to do.
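One thing to keep in mind is that the search string goes straight into a regex, so a name containing characters like parentheses or dots is treated as a pattern. If you want a literal match instead, something like the sketch below might do. cell_matches is a hypothetical helper, and making the match case-insensitive is my own assumption; the script above is case-sensitive.

```perl
use strict;
use warnings;

# True if $value contains $search as a literal substring, ignoring case.
# \Q...\E escapes regex metacharacters in the user-supplied string.
# (cell_matches is a hypothetical helper; the /i flag is an assumption.)
sub cell_matches {
    my ($value, $search) = @_;
    return $value =~ /\Q$search\E/i ? 1 : 0;
}

# The parentheses and the dot are matched literally here
print cell_matches('Smith, J. (Jr.)', '(jr.)') ? "match\n" : "no match\n";
```

Inside the loop this would replace the m/$searchstring/ test, called with $key->Value() and $searchstring.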