python – Terminally Incoherent

Python: Increase Your Zen, Maximize Your Happiness

Mon, 13 May 2013 14:01:05 +0000

The philosophy of Python can be summed up in a single line:

python -m this

When I first discovered Python it still had that ugly, pixelated green snake logo all over its website, and the documentation was all like “Monty Python guise, LOL, amirite?”. Over the years it has grown up a lot and become so mature, solid and reliable that people no longer even care about the white space thing as much. For years now Python has been ordinary and almost boring – all the rock stars, novelty chasers and cool kids left it for Ruby a long time ago (and then recently left Ruby for things like Haskell and Node) and so serious shit can get done.

But despite being a stable and mature environment, Python is still a joy to work with. The beauty of the language is its enduring quality, baked right into the syntax and grammar, and deeply rooted in the community. But as with all languages that have been around for a while, Python is not impervious to cruft, baggage and just plain poor coding. Despite the design of the language it is actually not that difficult to write ugly Python code. It is also not that hard to write pretty good code, but in a way that is less than optimal.

So I wanted to share some tips, tricks and best practices for writing Python code that I have picked up from various sources over the years. This is definitely not an exhaustive or all-inclusive list. Rather it is a random assortment of suggestions and ideas that can help you structure your code better, and tools that will help you to do more with less.

Use Virtualenv

Installing packages globally is not always a good idea, especially if you use easy_install, which can do funky things such as partial installs caused by network connectivity issues. It is better to keep your global install relatively clean and only install random packages on a project-by-project basis. Then, even if you mess up your installation you can just wipe the slate clean and start over again without much hassle.

Python doesn’t have native support for installing packages and modules this way, but you can force it to do so using the virtualenv package. The idea behind it is rather simple: you run a command line script and it copies your python environment into a directory of your choice, and then modifies your path for the current session so that all calls to python search the local folders first.

Here is an example of how you would create a virtual environment for your project:

# Copy a clean install of python with no packages to ./Foo
# It will create ./Foo if it doesn't exist
virtualenv --no-site-packages Foo
cd Foo

# Update path for current session
source bin/activate

This gives you a clean slate instance of Python with no baggage, which you can now build up with only the packages that are required for your project. Granted, this method litters your project folder with a bunch of standard library files, which can be annoying.

I usually recommend using virtualenvwrapper alongside the virtualenv package. It provides a set of wrapper scripts that aim to simplify your life. Once you install it, you can define the $WORKON_HOME environment variable (usually ~/.virtualenvs), which is going to be the default directory for installing virtual environments. It also simplifies the syntax a little bit, and modifies your prompt to reflect the virtualenv you are working in:

# Create virtualenv in $WORKON_HOME/Foo
mkvirtualenv Foo

# Activate virtualenv
workon Foo

The great thing about this approach is that you can have two separate environments (say ENV1 and ENV2) for the same project. For example if you want to investigate a bug that only happens when your code is running on a legacy version of Django, you can easily set it up.

Use pip instead of easy_install

One could argue this is a matter of personal preference, but I personally think that using pip should be considered one of the fundamental best practices for Python. As package managers go, it is simply superior in almost every conceivable way. It has meaningful error messages, it doesn’t leave your system in a weird state if an install fails, it can talk to source control repositories and it provides an uninstall command which lets you easily remove old or deprecated packages you no longer use.

It also allows you to create and use simple text files to track dependencies for your project. You simply create a file (let’s call it requirements.txt) like this, listing the names and versions of packages that need to be installed on the system in order for your code to work:

MyProject
Markdown==2.3.1

Save it with your code, commit it to the repository and forget about it. Later, when you need to deploy it somewhere else you can simply tell pip to fetch all the dependencies like this:

pip install -r requirements.txt

If you are using virtualenv, you can easily test that this works by creating an environment with the --no-site-packages switch. Or, if you already have a virtualenv custom-tailored to your project you can run the following command to auto-generate a requirements file for yourself:

pip freeze > requirements.txt

Note that this typically dumps all the installed packages into the file, so if you run it globally instead of in a virtualenv you can end up with a really long list of packages that may or may not be related to your project.

Standard Libraries are not always best

Python code is beautiful, except when it isn’t. Most of the standard libraries follow the simple philosophy of Python Zen: they facilitate writing beautiful, simple, flat and sparse code. That said, the standard library has a few ugly warts: like urllib2 for example.

Take the following code snippet for example:

import urllib2

url = 'https://api.github.com'
 
req = urllib2.Request(url)
 
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, url, 'user', 'pass')
 
auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)
 
urllib2.install_opener(opener)
handler = urllib2.urlopen(req)
 
print handler.getcode()
print handler.headers.getheader('content-type')

Is this beautiful? Is it clean? To me, the entire library has a very heavy handed Java feel to it. Compare it to the same functionality implemented using the third party requests library by Kenneth Reitz:

import requests
 
r = requests.get('https://api.github.com', auth=('user', 'pass'))
 
print r.status_code
print r.headers['content-type']

I think that most of us would agree that the latter is cleaner, clearer, more readable and arguably more Pythonic than the former. While it may seem counter-intuitive at first, sometimes picking a few good external packages can make the difference between an unmaintainable mess and clean, beautiful code.

If at any point you find yourself writing Python code that seems ugly, dirty and cumbersome, chances are there is a better way. You just have to look around a bit.

Use Context Managers

When you are writing code, chances are that sooner or later you will run into a situation where you need to open, lock or reserve some resource and then release it afterwards. The most trivial example would probably be writing to a file:

f = open('myfile','w')
f.write('Hello world!\n')
f.close()

The unfortunate truth about code like that is that most of us habitually forget about line #3. And even if we don’t forget it, an exception or an error might prevent it from being run. This may not happen so often when doing simple I/O, but the same pattern is used when connecting to databases or manipulating sockets, where errors and exceptions are to be expected as a fact of life, and closing and releasing resources is doubly important. You could wrap your code in a try-finally block, but as of Python 2.5 (where you need from __future__ import with_statement) and higher the cleaner and simpler way of writing the above is:

with open('myfile','w') as f:
	f.write('Hello world!\n')

The cleanup is implicit, automatic and triggered whenever you leave the context of the block. This works great with objects, classes and functions that support it, but how do you bake it into your own custom code? It’s relatively simple: you put your set-up code in an __enter__ method and your tear-down code in an __exit__ method, and they will be auto-magically called whenever an instance of your class is used in a with statement:

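# set it up
class DBConnect:
    def __enter__(self):
        # establish db connection here and keep it in self.dbconnection
        return self.dbconnection

    def __exit__(self, exc_type, exc_value, traceback):
        # called on the way out, even if the block raised an exception
        self.dbconnection.close()

# use it like this
with DBConnect() as db:
    pass  # do stuff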

Defining these methods for your custom classes not only makes the code more robust, but also more beautiful, allowing you to avoid writing the unseemly try-finally blocks all over the place.
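To see that the tear-down really does run even when the block blows up, here is a minimal sketch. The Resource class and its log list are made up for illustration; they stand in for a real file, socket or connection object.

```python
class Resource(object):
    """Hypothetical stand-in for a file, socket or db connection."""

    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append('open')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and on exceptions alike.
        self.log.append('close')
        # Returning False lets any exception propagate to the caller.
        return False


res = Resource()
try:
    with res as r:
        r.log.append('work')
        raise ValueError('something went wrong')
except ValueError:
    pass

print(res.log)  # ['open', 'work', 'close']
```

Note the three extra arguments on __exit__: Python always passes the exception details (or three Nones), so the method has to accept them.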

Create Objects that are Transparent

The Python REPL can be an extremely useful tool. I use it all the time to quickly test ideas or poke my code around. Unfortunately it is not always easy to glimpse inside your custom classes and objects this way unless you take steps to make them transparent. For example, let’s say you have a class like this:

class Foo:
    def __init__(self, x=0, y=0):
        self.x=x
        self.y=y

The problem you may run into is that objects initialized from this class do not display any useful information when invoked directly or printed in the REPL:

>>> f = Foo()
>>> f
<__main__.Foo instance at 0x02C82940>

>>> print(f)
<__main__.Foo instance at 0x02C82940>

A lot of people know that you can simply override the __str__ method to get a nice string representation of your object that can be printed directly, or concatenated to some other text. It is however far less common to see the __repr__ method implemented in a class. It is a pity, because it controls what is displayed in REPL when the object is invoked directly and as such can be tailored to display developer/debug friendly information about the internal state. For example:

class Foo:
    def __init__(self, x=0, y=0):
        self.x=x
        self.y=y

    def __str__(self):
        return '({},{})'.format(self.x, self.y)

    def __repr__(self):
        return '{}(x={},y={})'.format(
            self.__class__.__name__,
            self.x,
            self.y)

Will behave like this in the REPL:

>>> f = Foo(5,7)
>>> f
Foo(x=5,y=7)

>>> print(f)
(5,7)

Implementing the __repr__ method may seem unimportant and insignificant, but it saves lives. Or at least man hours of debugging.
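One more reason to bother with __repr__: containers use it for their elements. Here is a quick sketch, with a made-up Point class (same idea as Foo) so that it runs on its own:

```python
class Point(object):
    """Hypothetical example class, same idea as Foo."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return '({},{})'.format(self.x, self.y)

    def __repr__(self):
        return '{}(x={},y={})'.format(
            self.__class__.__name__, self.x, self.y)


points = [Point(1, 2), Point(3, 4)]

# str() of a list calls repr() on each element, not str()
print(str(points))     # [Point(x=1,y=2), Point(x=3,y=4)]
print(str(points[0]))  # (1,2)
```

So without __repr__, printing a list of your objects gets you a row of useless memory addresses even if __str__ is defined.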

Lint Your Code

I’m a firm believer in linting your code. It is a good practice that tends to yield saner, more beautiful and readable code most of the time. A lot of people don’t like very aggressive linters because they tend to complain about trivial things such as tabs vs. spaces, lines longer than 80 characters and so on. You should keep in mind however that every dumb linter rule is there for a reason, and adjusting your coding style a bit to conform to said rules is rarely a bad idea. Granted, if you are working with an existing code base that has never been linted, you might be shocked how much refactoring it would take to get it in shape. But when you start a new project, a rigorous linter is a great thing. It will not only help you structure the code, but it will also catch minor bugs that could otherwise be overlooked.

Personally I’m a fan of PyLint which tends to be super anal and very, very harsh on you. It actually scores your code, runs statistical analysis on it, and keeps track of last few runs so that you can see how much your changes have improved the overall quality of code.

Obviously PyLint is an external tool, but you can also get in-editor, as-you-type linting tools. Sublime has the excellent Sublime Linter which does more than just Python. If you run Vim there is a plethora of plugins that do just that, with the phenomenal Syntastic leading the pack. If you use Emacs then… Then ask Chriss – he will tell you what to use.

I highly recommend using PyLint in addition to any in-editor tool for the aggressive harshness and neat statistics it provides.

Always Be Testing

I don’t think I have to stress the importance of testing your code. At the very least you should be writing unit tests for all your classes. Python fortunately has a built in unit testing framework: the unittest module, also known as PyUnit. It is pretty much the industry standard and it follows the same conventions as JUnit and PHPUnit. If you have used these unit testing frameworks, PyUnit code should immediately look familiar and understandable:

import unittest
from foo import Foo  # assumes the Foo class lives in a module named foo

class SimpleTestCase(unittest.TestCase):

    def setUp(self):
        """Call before every test case."""
        self.foo = Foo()
        self.file = open("blah", "r")

    def tearDown(self):
        """Call after every test case."""
        self.file.close()

    def testA(self):
        """Test case A."""
        assert self.foo.bar() == 543, "bar() not calculating values correctly"

    def testB(self):
        """Test case B."""
        assert self.foo + self.foo == 34, "can't add Foo instances"
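A small aside: bare assert works, but the TestCase helper methods such as assertEqual give you nicer failure messages. Here is a self-contained sketch of that style. The Tally class is invented purely so there is something to test, and run_tests is just a convenience wrapper I added.

```python
import unittest


class Tally(object):
    """Hypothetical class under test."""

    def __init__(self):
        self.count = 0

    def bump(self):
        self.count += 1
        return self.count


class TallyTestCase(unittest.TestCase):

    def setUp(self):
        # fresh object for every test method
        self.tally = Tally()

    def test_starts_at_zero(self):
        self.assertEqual(self.tally.count, 0)

    def test_bump_increments(self):
        self.tally.bump()
        self.assertEqual(self.tally.bump(), 2)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TallyTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests().wasSuccessful() tells you whether everything passed. Normally you would just put unittest.main() at the bottom of the file instead.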

If you want to do something along the lines of acceptance testing, then Splinter is probably the way to go. It has a really concise and clean API and lets you create complex tests for web applications fairly easily. The only downside is that it is not really a full-fledged test framework like Codeception, for example. There is no way to assert or fail tests in Splinter – it is meant to be used as an abstraction layer or a component. So you basically just write PyUnit test cases which use Splinter, which works quite well as you end up with all your tests running on the same framework in the same place.

Use a build tool

As you may or may not know, I am a big fan of Rake as a simple build tool. That said, it doesn’t really seem all that kosher (or practical) to use a Ruby build tool for Python projects. Fortunately there is Paver, which is essentially the Rake of the Python world, though perhaps not nearly as popular.

I believe that originally it was created to streamline the creation of distribution bundles and packages, but you can confidently use it to run any number of arbitrary tasks. Like with Rake, your build file is an executable script – you don’t configure your build, you code it. This makes it endlessly extensible and perfectly suited for our purposes.

To start using Paver you just pip install paver and create a pavement.py file somewhere in your project directory. An example file could look something like this:

from paver.easy import *

@task
def lint():
    print "Linting..."
    # run linter 

@task
def test():
    print "Testing..."
    # run your unit tests

@task
@needs(['lint', 'test'])
def default():
    pass

You can run it like rake from the command line:

# run lint task
paver lint

# run the default task
paver

It is usually a good idea to automate running of your linter and your unit tests with Paver. It allows you to make some changes, then immediately have them analyzed for errors and bad code smells, and run through all your test cases.

Suggestions?

I hope this was helpful to at least a few people out there. I am by no means an authority on Python but I feel like some of these tips can be useful for novices and seasoned programmers alike. I generally enjoy reading this sort of write-up because even if I already know most of the stuff in the article, there is always a chance that I will learn something new. What are your favorite Python tips? Have you ever stumbled upon a Python related article or a video which made you go “Wow, this changes everything!”? If so, please share it in the comments.


Student Webspace in the Cloud: Google App Engine

Mon, 28 Mar 2011 14:42:24 +0000

Do you ever feel that siren call of code that needs to be written? Sometimes I get an idea into my head, and then spend the next few days thinking about little else. I’m thinking about the code in the shower, on the toilet, in bed before sleep, while I sleep. Half the time it’s not even that interesting of a project… But it is a project, and I want to get it done. This is what happened to me this weekend.

My university used to allow students to create personal websites via the Novell NetDrive service. It had a rather clunky, but perfectly usable web interface that allowed anyone to log in and manage files in their PUBLIC_HTML directory from anywhere in the universe (provided they can get an internet connection). I used that service extensively for the HTML lab and final project assignments. But alas, the OIT decided to phase out all the Novell stuff and replace it with something much more difficult to use for the average student.

The new system requires you to mount the networked drive using WebDAV, which is already a hurdle much too high for most of my students. But to add insult to injury, the difficulty is compounded by two additional issues:

For some strange reason you must change password of your campus wide ID for this new system to even bother talking to you
Computers in the lab are locked down so tightly that without an admin password students can’t mount shit

I figured that I might be able to skirt around these issues somehow, but despite my best attempts I haven’t been able to set the damned drive on my system for like 3 days. So having filed a tech support ticket into a black hole that is the OIT support system, I got an idea: I could write a bare bones NetDrive replacement over the weekend.

I’m not sure how I settled on using Google App Engine for this. I think I just didn’t want students to put their filthy files on the same server as my blog, I didn’t want to run a home server seeing how I don’t have a spare box, and I didn’t want to pay for the pleasure. So somewhere along the way I decided that App Engine is a great idea, even though it does not actually have a real file system. But App Engine is free, and you can easily save files in its Blobstore.

So the idea is simple: the user comes along and uploads a file to the Blobstore. We save his username, the file name and the Blobstore reference into the Datastore. Then we allow the user to retrieve his file using a neat url that looks something like:

http://example.com/pub/username/foo.html

How do we do that? First let’s set up our handlers like this:

def main():
    
    application = webapp.WSGIApplication(
          [('/', MainHandler),
           ('/upload', UploadHandler),
           ('/list', ListHandler),
           ('/pub/([^/]+)/([^/]+)?', ServeHandler),
          ], debug=True)
    run_wsgi_app(application)

So the upload form will sit at /, the upload action is gonna happen at /upload and we will be serving the files at /pub/username/filename.ext. So far so good.
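If the regexp in the last route looks cryptic: each parenthesised group becomes an argument to the handler's get method. Here is a rough sketch of that capture using plain re, outside of App Engine. The parse_path helper is made up, and I'm anchoring the pattern with ^ and $ by hand on the assumption that the framework matches the whole path.

```python
import re

# Same pattern as the ServeHandler route, anchored by hand
ROUTE = re.compile(r'^/pub/([^/]+)/([^/]+)?$')


def parse_path(path):
    """Return (username, filename) for a /pub/ url, or None."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return match.groups()


print(parse_path('/pub/username/foo.html'))  # ('username', 'foo.html')
print(parse_path('/pub/username/'))          # ('username', None)
print(parse_path('/list'))                   # None
```

Because the second group is optional, a url with no file name still matches, and the handler gets None for the file.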

This is how we are going to store our file information:

class MyBlobFile(db.Model):
    userName = db.StringProperty()
    fileName = db.StringProperty()
    blobstoreKey = blobstore.BlobReferenceProperty()

Apparently you have to use the BlobReferenceProperty to store the unique blob key in the Datastore. Initially I set this field as a StringProperty but App Engine was complaining like a little bitch. So I changed it.

Setting up a form is easy, but you need to remember the silly create_upload_url call. Make sure you include it. Otherwise it won’t work.

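class MainHandler(webapp.RequestHandler):
    def get(self):
        username = 'student'  # placeholder: get this from your session/login handler

        upload_url = blobstore.create_upload_url('/upload') # don't forget this

        # a plain multipart upload form; 'username' and 'file' are the fields UploadHandler reads
        self.response.out.write('<form action="%s" method="POST" enctype="multipart/form-data">' % upload_url)
        self.response.out.write('<input type="hidden" name="username" value="%s">' % username)
        self.response.out.write("""Upload File: <input type="file" name="file"><br>
            <input type="submit" name="submit" value="Submit"></form>""")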

Here is the actual upload code:

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_uploads('file')  # 'file' is file upload field in the form
        blob_info = upload_files[0] # that's it, we're done

        # save in Datastore
        f = MyBlobFile()
        f.userName = self.request.get("username")
        f.fileName = str(blob_info.filename)
        f.blobstoreKey = blob_info.key()
        f.put()

Finally, this is how we serve the file. It’s also very, very simple:

class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, ffolder, ffile):

        # look up the blob key in the Datastore
        q = db.GqlQuery("SELECT * FROM MyBlobFile WHERE userName = :1 AND fileName = :2", str(ffolder), str(ffile))
        results = q.get()

        if results is None:
            # no such user/file combination
            self.error(404)
            return

        resource = results.blobstoreKey
        self.send_blob(resource)

If you are perceptive, you probably noticed a flaw in my logic here. What if the user uploads two files with the same name? The Blobstore and Datastore won’t care. They will simply assign a new random key to the new entry and call it a day. This is indeed an issue, but I got around it by simply checking whether or not the file exists, running the same query when the file is uploaded. If there already is a Datastore entry that matches this username and filename then I delete it, and the associated Blobstore entry. This mirrors what would normally happen in a filesystem – the file would get overwritten.
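The overwrite logic is easier to see without the App Engine plumbing. The sketch below fakes the Datastore and Blobstore with two plain dicts, so every name in it is made up for illustration; the real code runs the GqlQuery and deletes the matching entity and blob instead.

```python
import uuid

datastore = {}   # (username, filename) -> blob key
blobstore = {}   # blob key -> file contents


def save_file(username, filename, contents):
    """Store a file, replacing any older copy with the same name."""
    old_key = datastore.get((username, filename))
    if old_key is not None:
        # same user, same file name: drop the stale blob first
        del blobstore[old_key]

    new_key = uuid.uuid4().hex   # stand-in for the random blob key
    blobstore[new_key] = contents
    datastore[(username, filename)] = new_key
    return new_key


save_file('student', 'index.html', 'first draft')
save_file('student', 'index.html', 'second draft')

print(len(blobstore))  # 1
```

After the second upload only one blob is left, and the lookup entry points at the new one.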

This, ladies and gentlebirds is how you do it.

In retrospect, I probably did not need Blobstore for this. You see, Blobstore is a “billing only” feature of App Engine. I did not know that when I started this project but it bears mentioning: you will need to enable billing in order to use it. So if you can get away with it, it is probably a better idea to store your files in the Datastore using BlobProperty. But it is nowhere near as nice – you essentially have to implement the send_blob() function yourself, send the correct mimetype headers, etc.
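For what it's worth, the mimetype part does not have to be painful, because the standard library can guess the type from the file name. A minimal sketch, where content_type_for is a made-up helper name and the fallback type is my own choice:

```python
import mimetypes


def content_type_for(filename):
    """Guess a Content-Type header value from a file name."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # fall back to a generic binary type when nothing can be guessed
    return guessed or 'application/octet-stream'


print(content_type_for('foo.html'))     # text/html
print(content_type_for('noextension'))  # application/octet-stream
```

In a handler you would set the response's Content-Type header to this value before writing out the stored bytes.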

I’m actually considering rewriting my code to do it that way. For the time being, I enabled billing for my app, but set the daily budget to $0 which should keep me in the free quota range of 1GB of storage space. Since I’m only going to expose this app to 26 users I’m hoping this will be enough. It will be interesting to see if they will blow through the bandwidth and concurrent access quota during the lab session. They shouldn’t but then again, you never know.

Quick note: you can’t set your quota to $0 when you first enable billing. You have to set it to $1 first, and give them your credit card information (nothing is charged up front though). Then you wait 15-20 minutes till their system makes up its mind about the whole billing change, and go back in to change it down to $0. At this point it will accept 0 as a valid input value.

Thursday will be a genuine test by fire for my code. I will let you know the damage on Friday or next week.

In the meantime, if you want to mess around with the code, I have it up on GitHub.

Please note that I opted not to use the built in user/session handling mechanisms and instead bolted on custom session handling with GAE Sessions. If you wanted to use my code for your project, this might be something you would want to change. My justification for doing this is that Google account registration is a pain in the ass sometimes. I actually don’t know how many of my students have Gmail accounts and I don’t feel like walking 20 people through Google registration pages during the lab. So I went with something quick, easy and hackish.

And yes, I’m storing passwords as unsalted md5 hashes. That’s like 4 WTF’s all rolled up into one right there. Sue me. I just didn’t care enough to write something more robust. If Thursday is not a complete disaster, and I decide to continue using this tool, I will probably fix this.
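For the record, doing this properly is not much code either. Here is a rough sketch of salted, deliberately slow hashing with hashlib.pbkdf2_hmac, which needs Python 2.7.8 or newer (or Python 3.4 and up). The function names and the iteration count are my own picks for illustration, not what is in the repo.

```python
import hashlib
import hmac
import os

ITERATIONS = 100000  # assumed work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for a password given as a text string."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        'sha256', password.encode('utf-8'), salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, expected):
    """Re-hash with the stored salt and compare in constant time."""
    _salt, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password('hunter2')
print(check_password('hunter2', salt, stored))  # True
print(check_password('wrong', salt, stored))    # False
```

You store both the salt and the digest per user. Two users with the same password end up with different digests, which is exactly what plain md5 does not give you.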

Also, can someone explain to me why almost all non-programmers assume that it is a great idea to try to chit-chat with you when they see you have code on your computer screen? It’s like “Oh, I see you are busy writing code. Let me interrupt you by telling you about my day, and asking irrelevant questions about the weather forecast for Monday.” That shit is getting notorious lately.


Python: Tips and Tricks

Mon, 14 Jun 2010 14:35:55 +0000

As you may have noticed, I have been messing around with Python quite a bit lately. I remember trying it out back in college and using it on a few small projects and then abandoning it for a while. Then I started working with Google App Engine (back when it was Python only) and I got sucked into it once again. Since then it has become one of my go-to languages. In fact I’m amazed how much the language grew up and matured since I first heard about it. So I figured I might as well devote a post to neat little tricks and quirks of this language.

IDLE Shell:

If you are going to be using Python, you should definitely check out IDLE. This IDE is bundled with most Windows Python binaries, and you can get it as a separate package under Linux. The editor itself is very basic, and you are probably better off using Vim or Emacs instead. Its killer feature however is the neat Python shell:

The IDLE Shell

It does everything a regular interactive Python shell would do, but it offers a number of improvements such as syntax highlighting and easier code editing. Basically when you start typing in a block of code such as a function or a conditional, IDLE evaluates it only after you are done – not line by line like the regular shell. This allows you to use the up arrow and, for example, fix the previous line in the block before the whole thing is evaluated. Also, since the shell runs inside a text-editor like environment, it makes it much easier to copy and paste lines of code between your text editor and your shell.

Whenever you are using IDLE keep these things in mind:

Alt+P – previous history command
Alt+N – next history command
Put cursor on any line + hit Enter – copies the line down

It’s a great little tool. Use it.

Enumerate

Python does not actually have a traditional for loop. It has a foreach loop, but that works because most of the built in data structures in the language can be iterated over. If you need a traditional indexed loop, you just fudge it by doing something like this:

for i in xrange(10):
    print i

If you want to iterate over a list, you just do it. But what if you want to iterate over a list, but also keep track of the index of each element? Well, you could do something like this:

>>> foo = ('a', 'b', 'c')
>>> for i in xrange(len(foo)):
	print i, foo[i]

0 a
1 b
2 c

This is however not very “pythonic” and can be done much cleaner using the enumerate function. Observe:

>>> for i,j in enumerate(foo):
	print i,j
	
0 a
1 b
2 c

Same result, cleaner code. This function is there because the scenario in which you iterate over a list, while maintaining an index number is incredibly common.
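One extra trick: enumerate takes an optional start argument (Python 2.6 and up), which is handy when you want human-friendly numbering that begins at 1. A short sketch:

```python
foo = ('a', 'b', 'c')

# start=1 shifts the counter, the elements stay the same
numbered = ['{}. {}'.format(i, item) for i, item in enumerate(foo, start=1)]

print(numbered)  # ['1. a', '2. b', '3. c']
```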

Any and All

Here is another very common scenario: check if any element of the list is true. Or, better yet, check if all of them are true. There are two functions out there that do just that. Any returns true if at least one element in the list is true. All returns true only if every element is true.

>>> any(i>5 for i in xrange(10))
True

>>> all(i>5 for i in xrange(10))
False

This will usually save you a loop or two.
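Both functions also stop as soon as the answer is known, which matters when the check is expensive. A small sketch; the checked list and the is_big helper are made up to record which values actually got tested, and I'm using range so it runs the same on Python 2 and 3:

```python
checked = []


def is_big(n):
    """Record that n was examined, then test it."""
    checked.append(n)
    return n > 5


# any() stops at the first True, so 7, 8 and 9 are never examined
found = any(is_big(i) for i in range(10))

print(found)    # True
print(checked)  # [0, 1, 2, 3, 4, 5, 6]
```

This only works with a generator expression like the one above. If you build a full list first, every element gets evaluated before any or all ever sees it.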

Putting if and else into lambda functions

Lambdas are great, but sadly are much more limited than closures. Essentially they need to be one-liner expressions that evaluate to something. You can’t for example stick a traditional if/else block inside of a lambda. It just does not work, syntax wise. But you can do something like this:

>>> bar = range(5)
>>> map(lambda f: True if f>3 else False, bar)
[False, False, False, False, True]

This is the alternate syntax of the if/else block that turns it into a one-liner. You put the results on opposite ends, and stick the test between the if and else keywords as shown above. This syntax has many applications but I find its greatest impact is on lambda functions.
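The one-liner form can return any values, not just True and False. For example, here is a quick sketch that labels numbers, wrapped in list() so it prints the same on Python 2 and 3:

```python
numbers = range(5)

# value_if_true if condition else value_if_false
label = lambda n: 'even' if n % 2 == 0 else 'odd'

labels = list(map(label, numbers))
print(labels)  # ['even', 'odd', 'even', 'odd', 'even']
```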

Default Values

Here is something you should watch for in your code. Default values are only evaluated once when the function is first created. Observe:

>>> def t(a, b=[]):
	b.append(a)
	print b

>>> t(1)
[1]
>>> t(2)
[1, 2]
>>> t(3)
[1, 2, 3]

The function above takes two arguments: a value and a list, and then appends that value to that list. If no list is passed as an argument, it ought to use an empty one. Only that it does not. The optional blank list is actually created just once, when the def statement is executed, and not each time the function is called. Every call that omits the second argument reuses that same list – which is probably not the intended effect of that code.
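The usual way around this is to default to None and build the list inside the function body. A minimal sketch (the function name is made up, and it returns the list rather than printing it):

```python
def append_to(a, b=None):
    """Append a to b, creating a fresh list when none is given."""
    if b is None:
        b = []  # evaluated on every call, not once at definition
    b.append(a)
    return b


print(append_to(1))       # [1]
print(append_to(2))       # [2]
print(append_to(3, [0]))  # [0, 3]
```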

Decorators

A fairly recent addition to Python are decorators – or as I call them, wrap-around functions. Let me briefly explain them for you. For example, let’s take these two functions:

>>> def foo(func):
	return lambda: func() + 1

>>> def bar():
	return 1

The first one takes a function func as an argument. It returns another function which will run func and add 1 to its result. The second one on the other hand always returns 1. What will happen when we do this:

>>> bar = foo(bar)

When you call bar, it will now return 2:

>>> bar()
2

Why? Because the function foo wraps around it and modifies the result every time. Python nowadays contains syntactic sugar that makes creating these wrap-around functions cleaner and easier. I can redefine our function bar like this:

>>> @foo
def bar():
	return 1

>>> bar()
2

Putting @foo above the function definition is equivalent to the bar=foo(bar) line. Decorators are a great way to attach additional functionality to certain functions or methods. For example, they are used internally to implement:

Static and Class Methods

Unlike many other languages Python does not have special scope keywords, so you can’t really declare a method as static. But you can fudge it using decorators. In fact they were introduced mostly for this very purpose. Observe:

class Foo:
   @classmethod
   def a_class_method(cls):
      # cls is the class itself, so use its name to build the string
      print "OH HAI THAR! I'm " + cls.__name__

   @staticmethod
   def a_static_method():
      print "Some stuff"

Class methods know which class they belong to – they take the class reference as the first argument. Static methods do no such thing – they take no implicit first argument at all.
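A common real-world use for class methods is an alternative constructor. Here is a sketch with a made-up Point class, plus a static method for contrast:

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_string(cls, text):
        """Build a Point from a string like '3,4'."""
        x, y = text.split(',')
        # cls is whatever class the method was called on,
        # so subclasses get instances of themselves
        return cls(int(x), int(y))

    @staticmethod
    def origin_distance(x, y):
        """No self or cls needed: just a function in a namespace."""
        return (x ** 2 + y ** 2) ** 0.5


class Point3D(Point):
    pass


p = Point.from_string('3,4')
print(Point.origin_distance(p.x, p.y))            # 5.0
print(type(Point3D.from_string('1,2')).__name__)  # Point3D
```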

Another neat use for decorators is:

Properties

By default all the members of a class are public. You can create getters or setters for private variables, but they can be accessed directly at any time. You can sort of hide them if you precede their names with double underscore – but that only mangles their name, and does not really provide total encapsulation. If you want more control you can utilize the @property decorator. Essentially you just declare your getter and setter methods like this:

class Foo(object):
    def __init__(self):
        self.__x = None

    @property
    def x(self):
        return self.__x

    @x.setter
    def x(self, value):
        if value <= 10:
            self.__x = value
        else:
            self.__x = 10

From now on, anyone trying to access the variable x, will be forced to go through these methods. It works exactly like properties in C# for example:

>>> a = Foo()
>>> a.x = 15
>>> a.x
10

The example above does not show this, but the getter and setter methods can be as complex, and include as much code as you want to. So you can implement input validation or format the output in a certain way, that is independent of the internal data type of the member.
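For instance, here is a sketch of a setter that rejects bad input instead of clamping it, and a read-only property that formats the output. The Temperature class is made up for illustration.

```python
class Temperature(object):
    def __init__(self, celsius=0.0):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # input validation: refuse anything below absolute zero
        if value < -273.15:
            raise ValueError('below absolute zero')
        self._celsius = float(value)

    @property
    def label(self):
        # formatted output, independent of the internal float
        return '{:.1f} C'.format(self._celsius)


t = Temperature(21)
print(t.label)  # 21.0 C

try:
    t.celsius = -300
except ValueError as err:
    print(err)  # below absolute zero
```

Since label has no setter, trying to assign to it raises an AttributeError, which gives you a read-only attribute for free.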

As usual, I'm putting these things here mostly for my own reference. This way I can find these things in a few months after I completely forgot about them. But I found that frequently other people find these posts just as useful. So there you go.


Python: Open the Most Recent Log File

Mon, 17 May 2010 14:11:52 +0000

Lately I have been on a Python kick. You know, just in case you haven’t noticed it based on the new outcrop of Python centric posts around these parts. In addition to Google App Engine related stuff I’ve been doing, the big P sort of became my new go-to language for random-ass scripting needs. Perl used to be that language to me, but… Well, let me tell you a story.

Recently I needed to update an ancient Perl script that I wrote a few years ago. At the time I was actually taking a Bioinformatics course, where we used Perl to process genetic sequences. Needless to say, the language and the regexp syntax were fresh in my mind. Fast forward a few years of me working almost exclusively in Java and PHP and most of that 1337 knowledge got filed away in some distant corner of my brain. Then it was covered up by intellectual mold, cobwebs and someone hung a sign saying “Beware of the Leopard” on the door. Imagine me happily opening a file and going, “Hey, I just need to change a couple of… Oh… Wait… What? What… I don’t even… What the fuck in hell was I thinking?”

It turns out that former me was a dick, and he was fond of playing Perl Golf. You know, as in “I bet I can do this in three fucking lines or less”. Yeah, I hate that guy. I eventually figured it out, but I decided that I should tell myself to:

Comment more
Fight the urge to write incomprehensible spaghetti code
Oh, and try not to hard-code so many things

So I figured that I might as well take some of my old Perl scripts and rewrite them in Python where needed. For example, I had a script that would periodically check a certain log file, searching for error codes that I wanted to be notified about. Recently however the app that generates said log files has changed its behavior. Now, when a log file reaches a certain size it starts a new one, giving it a name like “log_file_12345” where 12345 is some semi random garbage probably based on the timestamp or what not. My old Perl script of course had the log file hard coded.

I replaced it with a python script that did the same thing, and added the following logic to find the most recently modified file in the directory:

#ct: 801527ff-462f-4291-9925-8bcbe7ac9e0a
import os

def find_most_recent(directory, partial_file_name):

    # list all the files in the directory
    files = os.listdir(directory)

    # keep only the file names that contain the partial_file_name string
    files = filter(lambda x: x.find(partial_file_name) > -1, files)

    # create a dict that maps each file name to its modification timestamp
    # (os.path.join works whether or not directory ends with a separator)
    name_n_timestamp = dict([(x, os.stat(os.path.join(directory, x)).st_mtime) for x in files])

    # return the name of the file with the latest timestamp
    return max(name_n_timestamp, key=lambda k: name_n_timestamp.get(k))

You guys know python lambdas, right? They are basically inline functions. They take a list of arguments and a one-liner expression. Whatever that expression evaluates to is returned as a result. So as you can see above, I pass a lambda that will check whether a file name matches partial_file_name into the filter function.

The last line is interesting because it reveals a particular Python quirk. When you run the max function on a dictionary, it will return the highest key without ever looking at the values. If you want to get the key of the highest value however, you have to do the above.
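
Here is a tiny sketch of that quirk in isolation. The file names and timestamps are made up for the example:

```python
# Hypothetical file names mapped to made-up modification timestamps
timestamps = {"log_file_3": 100.0, "log_file_1": 300.0, "log_file_2": 200.0}

# Plain max() only compares the keys
highest_key = max(timestamps)

# With key=..., max() still walks over the keys but ranks them by their values
newest = max(timestamps, key=timestamps.get)

print(highest_key)  # prints log_file_3
print(newest)       # prints log_file_1
```

Passing timestamps.get directly does the same job as the lambda, since the lambda only forwards its argument to get.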

Have you ever been in this situation? Have you ever wanted to travel back in time and smack yourself upside the head for writing overly cryptic code? Or you know – stupid code? In a production environment? Yeah, I have monumentally stupid code in production right now. How about you? The script above was an easy fix because it was running just for my benefit and nothing else depended on it. Some of the production stuff… Not that easy to fix.


Generating Random Pronounceable Passwords

Mon, 10 May 2010 14:07:08 +0000

Here is an interesting problem: how to generate sufficiently random but semi-pronounceable, and easy to remember passwords. I mean, putting together a random password generator is easy – just pick a bunch of random characters from a pool of printable symbols and you are done. The problem with this method is that more often than not you end up with something like Bz9hFtT[mp1 which, while reasonably strong, is virtually impossible to remember. I don’t know about you, but my brain is just not very good at holding on to random sequences of characters and/or numbers.

To me the best passwords are ones that are not dictionary words, but retain word-like qualities. You can sort of sound them out in your head. How do you accomplish that though? The simplest way is to alternate vowels and consonants. The algorithm goes something like this:

Pick a random character
If the last character was a consonant, next pick a vowel
If the last character was a vowel, pick a consonant/number/symbol
Consonants should be common, numbers/symbols should be rare

You could improve this a bit by allowing strings of 2-3 consonants to happen every once in a while. If you really wanted to be fancy you could attach weight to each character based on the statistical frequency in which it tends to appear in natural language. But the above works surprisingly well – especially if you choose to break up your passwords into 4 or 6 character syllables. Here are some sample passwords I generated using the above method:

tExU-1EXi
JAdy-9INI-1u
vacy-6Ate
WiRU-xOZi
RoFu-RYKy
TANA-He8O
Y1U-MA5E-fU

Pretty decent for something this simple, eh? I think that RoFu-RYKy and WiRU-xOZi are my favorites from this list.

Here is the sample Python code I used to accomplish this magic:

# ct: dd66d04b-ce35-420a-9b64-63817bd43fa9
import random
import string

def next_char(last_char, use_symbols=False):
   """ Return a randomly generated character """

   vowels = ['a', 'e', 'i', 'o', 'u', 'y', 'A', 'E', 'I', 'O', 'U', 'Y']
   # ascii_letters exists in both Python 2 and 3 and ignores the locale
   consonants = [i for i in string.ascii_letters if i not in vowels]

   # Using a reduced set of symbols for clarity
   symbols = ["@", "#", "$", "%", "&"]

   # first character: any letter will do
   if not last_char:
      return random.choice(string.ascii_letters)

   # after a consonant, digit or symbol always pick a vowel
   if last_char in consonants or last_char in string.digits or last_char in symbols:
      return random.choice(vowels)

   # after a vowel: mostly consonants, sometimes a digit, rarely a symbol
   pct = random.randint(0, 100)

   if pct < 60:
      return random.choice(consonants)
   elif pct < 90:
      return random.choice(string.digits)
   elif use_symbols:
      return random.choice(symbols)
   else:
      return random.choice(consonants)


def gen_password(length=8):
   """ Generate random password given length """

   passwd = ""
   last = ""

   for i in range(length):

      last = next_char(last)

      passwd += last

      # insert a dash after every 4th character, except at the very end
      if i % 4 == 3 and i != length - 1:
         passwd += "-"

   return passwd

In case you were wondering, I have a working demo here. The code could probably be improved, but feel free to use it if you want to. See if you can retain the GUID number up top. I decided to follow Jeff Atwood's advice and start tagging the code snippets with unique IDs. This way if they for some strange reason become part of your code base you will know where they came from and you can always google that ID to check for updates. This is probably not that important in a trivial piece of code like this one, but I'm doing it to get in the habit.
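
The frequency weighting idea mentioned earlier is not part of my code, but it could be sketched roughly like this. The weighted_vowel function is a hypothetical helper, the weights are approximate, rounded figures for English text that you should treat as illustrative rather than authoritative, and random.choices requires Python 3.6 or newer:

```python
import random

# Approximate frequencies (percent) of these letters in English text.
# Rough, rounded figures used purely for illustration.
VOWEL_WEIGHTS = {"a": 8.2, "e": 12.7, "i": 7.0, "o": 7.5, "u": 2.8, "y": 2.0}

def weighted_vowel(rng=random):
    """Pick a vowel, favoring the ones that are common in English."""
    letters = list(VOWEL_WEIGHTS)
    weights = [VOWEL_WEIGHTS[c] for c in letters]
    # random.choices (Python 3.6+) returns a list of k picks
    return rng.choices(letters, weights=weights, k=1)[0]

print("".join(weighted_vowel() for _ in range(10)))
```

The same approach would work for consonants with their own weight table. Whether the result actually sounds more natural is something you would have to judge by ear.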


Character mapping must return integer, None or unicode

Thu, 06 May 2010 14:27:19 +0000

The other day I implemented a simple rot13 function in python like this:

from string import ascii_uppercase as uc, ascii_lowercase as lc, maketrans

def rot13(text):
    # rotate each half of the alphabet by 13 positions
    rot13_alphabet = uc[13:]+uc[:13] + lc[13:]+lc[:13]
    rot13_transform = maketrans(uc+lc, rot13_alphabet)
    return text.translate(rot13_transform)

Yes, I know there is a one liner function for that. But I was just messing around doing it the hard way to remember my way around python. This seemed like a nice alternative to the more direct method in which you iterate over the whole message by character, and then increase its ASCII code by 13. If you can’t tell how it works, let me explain – maketrans returns a translation table. It takes two parameters: the before and after list of symbols. When you use this table as a parameter in the translate function it does the following: whenever it sees a symbol from the before list in your message, it replaces it with a corresponding symbol from the after list. So I created a rotated alphabet version and used it as my after parameter.

This thing ran fine in tests – when I passed in a string manually it worked flawlessly. Then I plugged it into a bigger piece of code that involved web forms and Django templates and it blew up with the following message:

TypeError: character mapping must return integer, None or unicode

Not very descriptive if you ask me. It turns out that for some reason the web form was returning a unicode string, but my translate function only works with ASCII. That’s what this error is all about.

My lazy half-assed solution?

text = text.encode('ascii')

Or you know, I could do it the old fashioned way – one character at a time and what not. Actually, that’s a lie – there is another solution that still uses the translate function. Essentially you need to do what the maketrans function does, but manually, like this:

rot13_alphabet = unicode(uc[13:]+uc[:13] + lc[13:]+lc[:13])
alphabet = unicode(uc+lc)
rot13_transform = dict(zip(map(ord, alphabet), rot13_alphabet))

Yes it is a bit convoluted so let me explain:

Ord is a function which returns the Unicode code point for the character you pass in:
```
>>> ord('a')
97
```
Map takes a function and a list, and then applies said function to every member of the list.
```
>>> map(ord, 'abc')
[97, 98, 99]
```

Zip takes two lists and returns a list of tuples that contain pairs of elements from each list.

>>> zip('abc', 'nop')
[('a', 'n'), ('b', 'o'), ('c', 'p')]

>>> zip(map(ord, u'abc'), u'nop')
[(97, u'n'), (98, u'o'), (99, u'p')]

Finally, dict creates a dictionary out of a list of tuples just like the one created by zip.

>>> dict([('a', 'n'), ('b', 'o')])
{'a': 'n', 'b': 'o'}
>>> dict(zip(map(ord, u'abc'), u'nop'))
{97: u'n', 98: u'o', 99: u'p'}

You can apparently use this format in the translate function instead of the table generated by maketrans. This way you can fully support unicode translations but… Well, it won’t really do anything for localized characters such as ąężćó etc. because they are not part of my rotation table anyway. If you wanted to rot13 Polish or German for example you would have to add the extra characters at the correct positions. Or use string.letters (which becomes localized when a locale is applied), split it in half and then rot-13 each part.
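
Putting the pieces together, the dictionary based version could look something like the sketch below. Note the assumption: this is written for Python 3, where every string is already unicode, so the unicode() calls are not needed. It is not the code I had running in Django at the time.

```python
from string import ascii_uppercase as uc, ascii_lowercase as lc

alphabet = uc + lc
rot13_alphabet = uc[13:] + uc[:13] + lc[13:] + lc[:13]

# Map the code point of every letter to its rotated counterpart
rot13_transform = dict(zip(map(ord, alphabet), rot13_alphabet))

def rot13(text):
    # Characters missing from the table (digits, punctuation, ąężćó) pass through
    return text.translate(rot13_transform)

print(rot13("Hello, World!"))  # prints Uryyb, Jbeyq!
```

Applying the function twice gets you back the original text, which makes for a quick sanity check.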

And of course, this whole discussion is moot because the way you should do rot13 is this:

# in python < 3.x
'text'.encode('rot13')

# in python >= 3.x
import codecs
codecs.encode('text', 'rot13')

On the upside, you have just learned about maketrans, dict, map, zip and ord functions and the pitfalls of character mapping and unicode.


Exchanging Files Over the Network the Easy Way

Thu, 08 Apr 2010 14:23:57 +0000

The age old problem: how do I send you these files with the least amount of effort? Have you ever been in this situation? You need to send a bunch of files to another person sitting 3 feet away from you. Neither one of you is carrying any portable media. No thumb drives, no blank CDs or anything like that. What you do have is network connectivity.

The first thing people do in a situation like this is to reach for their email. Unfortunately email was never designed to be a file sharing medium. If you have a large number of big files to send over, email will not work. You could use one of the online file sharing services such as SendSpace or Rapidshare but they too have arbitrary file size limitations. Not only that, but your data will now be stored somewhere on the interwebs – in many cases this is unacceptable.

What do you do then?

A while ago I wrote about one possible solution to this problem – a mini HTTP server. That was a windows centric solution that used a proprietary freeware tool. Recently I discovered a much nicer solution that requires nothing but a standard Python installation.

Here is what you do. On the command line, navigate to the folder you want to share over the network, and issue the following command:

python -m SimpleHTTPServer 80

Boom! Instant HTTP server. All you need to do now is to instruct the other person to open a web browser and type in your current IP into the address bar. Easy, painless and awesome. Here is me running the server and accessing it at the same time:

SimpleHTTPServer in Action

As you can see, the server will actually show you information about every access attempt so you can have some evidence that no one but the authorized person accessed it while it was running.
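
A note for anyone on Python 3: the SimpleHTTPServer module was folded into http.server there, so the equivalent one-liner is python -m http.server 8000. If you would rather start it from a script, a minimal sketch could look like this. The make_server name and its defaults are my own invention, and the directory argument needs Python 3.7 or newer:

```python
import functools
import http.server

def make_server(directory=".", port=8000):
    """Build (but do not start) an HTTP server that serves `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    # An empty host string means "listen on all interfaces"
    return http.server.HTTPServer(("", port), handler)

# Usage: serve the current directory until interrupted with Ctrl+C
# server = make_server(".", 8000)
# server.serve_forever()
```

I used port 8000 here because on most POSIX systems binding to port 80 requires root privileges.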

Still, sharing the whole folder is not always desirable. Sometimes you want to transfer just a single big file. If you recall, I have already shown you how to do this with Netcat. I’m going to continue with the Python centric theme and introduce you to a neat little script called woof.

Using it is very simple. Just do this:

python woof.py MyFileToBeShared.zip

That’s it. It creates a server on port 8080 and serves that particular file and nothing else. One disadvantage of this script is that it seems to rely on something POSIX-y to work. In other words, it fails miserably in a Windows environment but it should work in a real operating system.

Of course, sometimes you are in a reverse situation. Instead of having to send files, you need to receive them. Running a miniature HTTP server won’t help you. Or maybe it will. What you need is Droopy – another Python mini server. Instead of serving files however, it creates a file upload page that looks like this:

Droopy in action.

You can customize how this page looks by displaying a custom message and even a picture below the upload form. But this is how it works. It is quite brilliant actually. Run it, tell the other person to navigate to your IP on port 8000 and upload the file. Very simple and very effective.

Keep these things in mind during your travels. I’d recommend making a Python install the first thing you do whenever you get a new laptop.

What are your favorite python tricks? Or network file transfer tricks in general?


Google AppEngine: URL Rewriting

Mon, 11 May 2009 14:32:47 +0000

Back in February I showed you how to create a nice looking, dynamic home page on Google AppEngine. This implementation had a major flaw – namely ugly URLs. I had set it up so that the name of the page to be loaded was passed in as a variable in a GET request. So the URLs looked a bit like this:

http://yourdomain.com/?p=pagename

This is obviously less than desirable and looks a bit unprofessional. My whole idea was to make a professional, spiffy looking website that you could put on a business card or give out to prospective employers. The page was supposed to underline your competence, and professionalism. Ideally, you would want to have nice looking and easy to remember URL’s like this:

http://yourdomain.com/pagename

No question marks, equal signs, dots or file extensions. Just the domain name, and a keyword. In most cases you would achieve this effect by using the URL rewrite feature of Apache. All it takes to set something like this up is one or two lines that need to be added to your .htaccess file.

Unfortunately, Google AppEngine does not use Apache and uploading .htaccess file won’t do you any good. To achieve the effect I described above you need to hack the code in your controller class.

Here is the modified version of the main.py file from the previous article:

import os
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.ext.webapp import template

class MainPage(webapp.RequestHandler):
   def get(self, p):
      if p:
         page = p + ".html"
      else:
         p = "main"
         page = p + ".html"

      if not os.path.exists(page):
         page = "404.html"

      template_values = {
            "page" : page,
            p: "first", 
      }

      path = os.path.join(os.path.dirname(__file__), 'index.html')
      self.response.out.write(template.render(path, template_values))

application = webapp.WSGIApplication([(r'/(.*)', MainPage)],debug=True)

def main():
   run_wsgi_app(application)

if __name__ == "__main__":
   main()

I will save you doing a mental diff between the two files, and simply tell you what changed. You simply need to modify two lines. First, you need to change the WSGIApplication invocation to include this pattern matching code:

application = webapp.WSGIApplication([(r'/(.*)', MainPage)],debug=True)

Next, you need to change the method definition in your handler to include a variable like this:

def get(self, p):

You will also need to get rid of the line which checks the p variable in the GET request. After you add this code, everything found after the / in your URL will be passed to your handler as the value of the argument you just defined.
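
If you want to see what the handler will receive without deploying anything, you can play with the pattern in plain Python. This is only a sketch that assumes the framework matches the pattern against the whole request path the way the re module does; captured_page is a made-up helper, not part of AppEngine:

```python
import re

# The same pattern that is passed to WSGIApplication, anchored at the end
pattern = re.compile(r'/(.*)$')

def captured_page(url_path):
    """Return what the (.*) group would hand to get() as p."""
    match = pattern.match(url_path)
    return match.group(1) if match else None

print(captured_page("/pagename"))    # prints pagename
print(repr(captured_page("/")))      # prints ''
```

The empty string for the bare / is why the handler checks if p and falls back to "main".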

It is very simple, very elegant and allows you to have very clean looking and easy to remember urls.


Running a Blog on Google Appengine

Thu, 05 Mar 2009 15:28:54 +0000

Last week, I talked about setting up a dynamic personal web page on Google AppEngine. I mentioned that it would be possible to grow and expand your page beyond the simplistic informational page I have shown you. More specifically, I mentioned running blogs and forums out of it. I already described how to set up an AppEngine based forum so there is no need to reiterate that here. Today I wanted to talk about running your own blog on top of the service.

No, I don’t expect you to write your own blog software, just like you didn’t have to write the forum software. Believe it or not, blogging engines are being ported to or designed for this platform. One such engine is Bloog, written and maintained by Bill Katz. As far as I can tell, it is one of the first pieces of blog software that runs on AppEngine but there will be more to come – I’m sure of it.

Bloog may not be the most sophisticated blogging platform, or the most configurable one – it’s not WordPress. But it is fully functional and it has everything you may need for day to day blogging. It supports reader comments, tagging, has a dynamic contact form, dynamic archives by year, it automatically generates an Atom feed and the template is pre-configured to allow you to include vertical ad banners in the sidebar. No bells and whistles mind you, but solid functionality.

Bloog is on GitHub and you are probably best off pulling the source code from there. If you are a bit old fashioned like me, and you haven’t caught the git bug yet, GitHub has a wonderful feature that will allow you to download all of the source code from the repository with a single mouse click as either a zip file or a tarball. Unfortunately if you do that, you will get an incomplete version of the code.

I haven’t really used git, so maybe someone with more experience can verify if what I’m saying is correct. From what I’ve seen, if you are using git, you can set up something akin to symlinks that point to other projects on GitHub. When you check the code out, git will notice the dependency, and follow the link to download the latest and greatest code for it from its own repository and put it in the correct directory in your project. This is very neat, but unfortunately it breaks down when you use the “download code as a tarball” feature. The “foreign” code is not fetched, so you end up with empty directories where it was supposed to be.

Bloog depends on Firepython which should be in the utils/external/firepython/ directory. But it’s not, and you will probably spend a few minutes scratching your head to figure it out. Fortunately I stumbled upon this discussion to help me out.

Essentially, what you have to do is grab Firepython as a tarball, and extract it into utils/external/firepython/ in your Bloog directory. After you do that, you are ready to go. Here is how my Bloog turned out:

tux-mentat.appspot.com

The administration UI is vestigial, so to change things such as the blog title, author and the sidebar links you need to manually edit _config.py in the root directory of your project. It’s actually very straightforward – just a long associative list. There is nothing to it, and even someone with no knowledge of Python should be able to do it.

The sidebar banner (I changed it to that liquid drip thing just to show you it’s possible) is defined in a static HTML file located in \views\default\bloog\ads.html. Just delete whatever was there and insert your own personalized vertical banner or ad block.

Once you do that, you can just deploy it to Google AppEngine and enjoy your own little blog. That’s really all there is to it. They could make it easier to install, but really the Firepython thing is my fault. If I used git, it would probably never happen.

There you go – yet another awesome thing you can do with Google Apps. You can blog on it, run discussion forums, create personal pages – possibilities are endless. When it first came out, a lot of people discarded it as a mere toy but it’s not. It is a glimpse of what cloud computing can do for you these days! It is a glimpse of how a lot of hosting might be done in the future.

Of course, standard disclaimers apply – you are hosting your shit on Google’s cloud for free which means you don’t really own it. And since it’s Google, it will probably harvest your blog for data which it will use maliciously in its quest to achieve sentience. So you have to factor that into the equation when you are deciding whether or not you want to use this service.

Arguably, in this day and age there are dozens of free blogging platforms out there. Most of them will let you use your own domain name too. So running a blog on AppEngine is probably not the most efficient way to go about blogging. But it is kind of awesome – and if you can hack Python code and you are not afraid to get your hands dirty, you do get much more control over how your blog looks and what it does than with one of these free services.

I probably wouldn’t recommend running your “mission critical” blog on this platform. But for AppEngine enthusiasts like me, Bloog is yet another fun app to mess around with.


Python, Telnet and GUI-fying Legacy Apps

Wed, 12 Nov 2008 16:31:24 +0000

Today I want to talk about telnet. Yes, telnet is stupid and you should use ssh. Sometimes you can’t though. For example if there is some old legacy app that is sitting on some remote server that is not even yours and the only way to interact with it is via telnet session. We have one of these things at work – it is a little app running on a mail server that allows people to set up an auto-reply message when they go on vacation.

In theory this is something that every user should be able to do themselves. The instructions are simple enough. Go to Start, Run, paste this telnet command into the box, hit enter, follow prompts on the screen. The interface is easy to use, if a little clunky. I never had any issues with it. And yet, no one ever wants to deal with it. That black window with the dreaded blinking cursor is incredibly scary it seems. Most people call the help desk and request a message to be set up for them instead. This is not the end of the world, but it’s backwards. This system does not have a main administration panel where we could set these things up from above. So when someone calls in with a request, the helpdesk must actually telnet to the system and use their username and password.

So I started thinking if we could dumb-down this process enough to make it accessible to our average luser. The idea was to have a GUI layer shielding the user from the scary telnet stuff. I hacked up a tiny little prototype app in Python and it looked a bit like this:

It is very simplistic – kinda ugly actually. It’s Python using the Tkinter widgets for GUI-fication. It is less than 30 lines of code, once again demonstrating that Python can be a very terse and concise language despite its strict whitespace regime – even when coding GUI apps. Observe:

from Tkinter import *
class AutoReply:
   def __init__(self):
      self.root = Tk()
      self.root.title("DG Auto Reply Tool")
      self.root.resizable(0,0)

      self.text = ""
      self.email = StringVar()
      self.onoff = StringVar()

      self.reply = Text(self.root)
      self.reply.grid(row=0, column=0, columnspan=5)

      eml = Entry(self.root, textvariable=self.email)
      eml.grid(row=1, column=0)

      status = Label(self.root, textvariable=self.onoff)
      status.grid(row=1, column=3)

      button = Button(self.root, text="Update", command=self.update)
      button.grid(row=1, column=1)

      toggle = Button(self.root, text="Auto Reply:", command=self.toggle)
      toggle.grid(row=1, column=2)

      self.root.mainloop()

   def update(self):
      pass

   def toggle(self):
      pass

if __name__ == "__main__":
   app = AutoReply()

This is what Python is good at – creating usable software really fast. The next step was to figure out how this GUI layer would communicate with the telnet application. I started thinking about sockets and streams and all that fun stuff. That was my first instinct – just open a socket and dump messages to it and try to see if I can get anything back. I did something like this before, and it was a lot of tedious coding just to get the communication between the client and server working. Of course I was using Java to do it, which means there was a lot of tedious coding there in general. I didn’t want to do all of that.

Python on the other hand ships with a built in telnet library (aptly named telnetlib) and all you really need to do to connect to a server and start sending messages is this:

import telnetlib

HOST = "my.remote.host"
PORT = 1337
user = "my-username"
password = "my-password"

tn = telnetlib.Telnet(HOST, PORT)

tn.read_until("login: ")
tn.write(user + "\n")
tn.read_until("Password: ")
tn.write(password + "\n")

You just follow this pattern for all your other interactions with the server. You read_until and then you write, and so on. I abstracted this into a tiny self contained class, and made my GUI initialize it in the constructor and then call it every time it needed to send or receive data. I kept it simple – each transaction was self contained: connecting to the server, logging in, interfacing with the app and then disconnecting. This way I didn’t have to worry about keeping the connection alive, timeouts or disconnects, and could pretty much rely on the app running on the server being in a predictable state when I connect to it.

The rest was just a bit of creative screen scraping. The read_until method returns a string, which I could break into lines, and then parse each line to extract the information I needed. It was pretty easy to isolate the important data and discard everything else. I’m not going to post the code here because it is really customized to the particular application, and thus probably useless to most people.
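
Just to give you the general flavor, here is a sketch of that kind of line by line parsing. The screen text below is entirely made up and so is the parse_screen helper. The real application prints something different, and my actual code is tailored to it:

```python
# Entirely made-up screen text; the real application's output looks different
screen = (
    "Auto Reply Tool v1.0\r\n"
    "Status: ON\r\n"
    "Forward to: someone@example.com\r\n"
    "Command> "
)

def parse_screen(text):
    """Collect 'Label: value' lines into a dict and ignore everything else."""
    fields = {}
    for line in text.splitlines():
        if ": " in line:
            label, value = line.split(": ", 1)
            fields[label.strip()] = value.strip()
    return fields

info = parse_screen(screen)
print(info["Status"])  # prints ON
```

One thing to keep in mind: in Python 3 telnetlib hands back bytes rather than a string, so you would need to decode the result of read_until before parsing it this way.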

The GUI works pretty well, but it is synchronous – it will block until the read/write transactions are accomplished which is probably not the worst thing that could happen. I’m pretty sure I could do it asynchronously but it would take more effort and make no huge impact on usability. The blocking as it is right now actually works like a feedback mechanism that shows the user when sending and receiving is taking place.

My app is actually more user friendly because it allows you to edit the message as a whole in a familiar editor window. The telnet app forces you to type the message line by line. The only way to change a line already in place is to delete it, append new line to the end of the message and then move it up one line at a time until it is in the right place. Oh, and pressing backspace inserts the lovely ^H instead of actually deleting the character. So the GUI is much friendlier and I deal with the craziness by simply nuking the message and then re-inserting it every time the user clicks on update.

I haven’t unleashed it on unsuspecting users yet – it’s still buggy and unfinished. I’m just throwing it out here to show how easy it is to hack things like that together with just a little bit of Python. The complete app is tiny (only slightly over 100 lines of code – including white space). The only problem is that most of my users are not running Python. This of course could be solved by using py2exe but then I’m basically converting 3KB of python into 30 MB of crap. Not a perfect solution there. I could deploy python, ship the app as a huge native package, or rewrite it in C#. What would you do?
