Parsing Excel Files with Perl

My company likes to store tons of useful information locked away in excel files. I understand that not everyone understands how databases work, or how to use them. I have no clue how this happened but at some point Excel became the de-facto standard for your every day data storage needs – despite the fact that flat text files are often much better for this.

For example, given a simple tabulated list in a plain text file, I can grep through it, sort it, re arrange it or analyze it using a myriad of mature and time tested text parsing tools. I can also import it into just about any kind of software, and easily write scripts against it. Excel on the other hand, is much less flexible. But there is something about the neat rows and columns of a spreadsheet that draws people to it.

I much prefer to issue a quick command in bash than to open a bulky office application to get some basic info about an employee, or a client. So I decided to write a quick Perl script to parse through Excel files located on a network share (this probably is a common scenario in most offices). Surprisingly, it was very easy.

First you will need to mount the network share on your box. Next you will need the Spreadsheet::ParseExcel package from CPAN.

Here is the script I hacked up to extract data from a sheet which has a unique (searchable) identifiers (here people’s names) in column D, and relevant data in columns E, F, and H:

#!/usr/bin/perl -w
use strict;
use Spreadsheet::ParseExcel;
 
my $FILE = "/path/to/File.xls";
my $SHEETNAME = "Sheet1";
 
# the column that contains searchable key
my $KEY_COLUMN = 3;
 
my $searchstring = $ARGV[0];
 
my $excel = Spreadsheet::ParseExcel::Workbook->Parse($FILE);
my $sheet = $excel->Worksheet($SHEETNAME);
 
foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow})
{
  my $key	= $sheet->Cell($row,$KEY_COLUMN);
 
  if($key)
  {
    my $f1	= $sheet->Cell($row,4);
    my $f2	= $sheet->Cell($row,5);
    my $f3	= $sheet->Cell($row,7);
 
    if($key->Value() =~ m/$searchstring/)
    {
      print "\n\n";
      print "Key: " . $key->Value() . "\n";
      print "Field 1: " . $f1->Value() . "\n" if($f1);
      print "Field 2: " . $f2->Value() . "\n" if($f2);
      print "Field 3: " . $f3->Value() . "\n" if($f3);
      print "\n\n";
    }
  }
}

This is of course not the most efficient script since I’m looping through all the rows in the spreadsheet. Can you say O(n)? But it’s good enough for what I need it to do.

[tags]perl, Spreadsheet::ParseExclel, parsing, excel, scripting, programming[/tags]

This entry was posted in programming and tagged , . Bookmark the permalink.



4 Responses to Parsing Excel Files with Perl

  1. Pingback: University Update - Linux - Parsing Excel Files with Perl UNITED STATES

  2. Anil UNITED STATES Mozilla Firefox Windows says:

    Hi,
    Can we generate xlsx files (Excel 2007) using Perl?
    If yes, can you provide me an example and which module to use..

    Thanks,
    don

    Reply  |  Quote
  3. Luke Maciak UNITED STATES Mozilla Firefox Linux Terminalist says:

    @ Anil:

    Yes we can. Check here for more information. :)

    Good luck!

    Reply  |  Quote
  4. Anil UNITED STATES Mozilla Firefox Windows says:

    Hi Luke,
    Thanks for the wonderful link :@
    I have not found the information which I needed.
    Let me know if there is any possible way to write xlsx using Perl.

    Thanks,
    Don

    Reply  |  Quote

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>