Python: Increase Your Zen, Maximize Your Happiness

The philosophy of Python can be summed up in a single line:

python -m this

When I first discovered Python, it still had that ugly, pixelated green snake logo all over its website, and the documentation was all like “Monty Python, guys, LOL, amirite?”. Over the years it has grown up a lot and become so mature, solid and reliable that people no longer even care about the whitespace thing as much. For years now Python has been ordinary and almost boring: all the rock stars, novelty chasers and cool kids left it for Ruby a long time ago (and have recently left Ruby for things like Haskell and Node), so serious shit can get done.

But despite having become a stable and mature environment, Python is still a joy to work with. The beauty of the language is its enduring quality, baked right into the syntax and grammar and deeply rooted in the community. But as with all languages that have been around for a while, Python is not impervious to cruft, baggage and just plain poor coding. Despite the design of the language, it is actually not that difficult to write ugly Python code. It is also not that hard to write code that is pretty good but still less than optimal.

So I wanted to share some tips, tricks and best practices for writing Python code that I have picked up from various sources over the years. This is definitely not an exhaustive or all-inclusive list. Rather it is a random assortment of suggestions and ideas that can help you structure your code better, and tools that will help you to do more with less.

Use Virtualenv

Installing packages globally is not always a good idea, especially if you use easy_install, which can do funky things such as leaving partial installs behind when your network connection drops. It is better to keep your global install relatively clean and only install random packages on a per-project basis. Then, even if you mess up an installation, you can just wipe the slate clean and start over without much hassle.

Python 2 doesn’t have native support for installing packages and modules this way (Python 3.3 and later ship a built-in venv module), but you can get it with the virtualenv package. The idea behind it is rather simple: you run a command-line script that copies your Python environment into a new directory, and then an activation script modifies your path for the current session so that all calls to python search the local folders first.

Here is an example of how you would create a virtual environment for your project:

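# Copy a clean install of python with no packages to ./Foo
# It will create ./Foo if it doesn't exist
# (isolation is the default since virtualenv 1.7, and virtualenv 20+
# no longer accepts --no-site-packages, so drop the flag there)
virtualenv --no-site-packages Foo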
cd Foo
 
# Update path for current session
source bin/activate

This gives you a clean-slate instance of Python with no baggage, which you can now build up with only the packages your project requires. Granted, this method litters your project folder with a bunch of standard library files, which can be annoying.
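On Python 3.3 and later, the standard library's venv module does the same job as virtualenv. The minimal sketch below (Python 3) creates an isolated environment that ignores global site-packages, and checks whether the running interpreter is itself inside one. The helper names make_clean_env and in_virtualenv are made up for illustration; this is not a replacement for the virtualenv command shown above.

```python
import os
import sys
import venv


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # venv (and virtualenv 20+) point sys.prefix at the environment while
    # sys.base_prefix keeps the original install; legacy virtualenv set sys.real_prefix
    return sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")


def make_clean_env(path):
    """Create an isolated environment at `path`; return its pyvenv.cfg path."""
    # system_site_packages=False is the equivalent of --no-site-packages;
    # with_pip=False keeps creation fast and offline (pass True to bootstrap pip)
    venv.create(path, system_site_packages=False, with_pip=False)
    return os.path.join(path, "pyvenv.cfg")
```

The generated pyvenv.cfg records include-system-site-packages = false, which is what keeps the global packages out; activating the environment afterwards works the same way (source bin/activate).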

I usually recommend using virtualenvwrapper alongside the virtualenv package. It provides a set of wrapper scripts that aim to simplify your life. Once you install it, you can define the $WORKON_HOME environment variable (usually ~/.virtualenvs), which becomes the default directory for your virtual environments. It also simplifies the syntax a little bit and modifies your prompt to reflect the virtualenv you are working in:

# Create virtualenv in $WORKON_HOME/Foo
mkvirtualenv Foo
 
# Activate virtualenv
workon Foo

The great thing about this approach is that you can have two separate environments (say ENV1 and ENV2) for the same project. For example if you want to investigate a bug that only happens when your code is running on a legacy version of Django, you can easily set it up.

Use pip instead of easy_install

One could argue this is a matter of personal preference, but I personally think that using pip should be considered one of the fundamental best practices for Python. As package managers go, it is simply superior in almost every conceivable way. It has meaningful error messages, it doesn’t leave your system in a weird state if an install fails, it can talk to source control repositories and it provides an uninstall command which lets you easily remove old or deprecated packages you no longer use.

It also allows you to create and use simple text files to track dependencies for your project. You simply create a file (let’s call it requirements.txt) like this, listing the names and versions of the packages that need to be installed on the system in order for your code to work:

MyProject
Markdown==2.3.1

Save it with your code, commit it to the repository and forget about it. Later, when you need to deploy it somewhere else you can simply tell pip to fetch all the dependencies like this:

pip install -r requirements.txt

If you are using virtualenv, you can easily test that this works by creating a fresh environment with the --no-site-packages switch (newer virtualenv releases isolate environments by default and no longer accept the flag). Or, if you already have a virtualenv custom-tailored to your project, you can run the following command to auto-generate a requirements file:

pip freeze > requirements.txt

Note that this typically dumps all the installed packages into the file, so if you run it globally instead of in a virtualenv you can end up with a really long list of packages that may or may not be related to your project.
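To make the requirements format concrete, here is a rough sketch (Python 3.8+, since it relies on importlib.metadata) that reads plain name and name==version lines and compares them with what is installed in the current environment. parse_requirements and check_requirements are hypothetical helpers, not part of pip; real pip understands far more syntax (version ranges, extras, URLs, -r includes) that this deliberately ignores.

```python
import re
from importlib import metadata

# Matches "name" or "name==version" only; other specifiers are out of scope here
_REQ_LINE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:==\s*([^\s#;]+))?")


def parse_requirements(text):
    """Return [(name, pinned_version_or_None), ...] from requirements.txt text."""
    reqs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = _REQ_LINE.match(line)
        if match:
            reqs.append((match.group(1), match.group(2)))
    return reqs


def check_requirements(reqs):
    """Map each requirement to 'ok', 'missing', or a version-mismatch note."""
    report = {}
    for name, wanted in reqs:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if wanted is None or installed == wanted:
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, wanted {wanted}"
    return report
```

Run inside the project's virtualenv, a report full of "ok" entries is a quick sanity check that pip install -r requirements.txt did its job.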

Standard Libraries are not always best

Python code is beautiful, except when it isn’t. Most of the standard library follows the simple philosophy of the Zen of Python: it facilitates writing beautiful, simple, flat and sparse code. That said, the standard library has a few ugly warts, urllib2 for example.

Take the following code snippet for example:

import urllib2
 
url = 'https://api.github.com'
 
req = urllib2.Request(url)
 
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, url, 'user', 'pass')
 
auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)
 
urllib2.install_opener(opener)
handler = urllib2.urlopen(req)
 
print handler.getcode()
print handler.headers.getheader('content-type')

Is this beautiful? Is it clean? To me, the entire library has a very heavy-handed Java feel to it. Compare it to the same functionality implemented with the third-party requests library by Kenneth Reitz:

import requests
 
r = requests.get('https://api.github.com', auth=('user', 'pass'))
 
print r.status_code
print r.headers['content-type']

I think that most of us would agree that the latter is cleaner, clearer, more readable and arguably more Pythonic than the former. While it may seem counter-intuitive at first, picking a few good external packages can sometimes make the difference between an unmaintainable mess and clean, beautiful code.

If at any point you find yourself writing Python code that seems ugly, dirty and cumbersome, chances are there is a better way. You just have to look around a bit.
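Python 3 readers won't find urllib2 at all: its contents moved into urllib.request. For comparison, here is a sketch of the same basic-auth setup spelled out in Python 3 (the function names build_auth_opener and fetch are made up for illustration). It is barely shorter than the Python 2 version, which rather supports the point above.

```python
import urllib.request


def build_auth_opener(url, user, password):
    """Python 3 spelling of the urllib2 snippet: an opener that does HTTP Basic auth."""
    password_manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_manager.add_password(None, url, user, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_manager)
    return urllib.request.build_opener(auth_handler)


def fetch(url, user, password):
    """Return (status code, content type) for url; this performs a real request."""
    opener = build_auth_opener(url, user, password)
    # Using the opener directly avoids install_opener()'s global side effect
    with opener.open(url) as response:
        return response.status, response.headers.get("content-type")
```

Calling fetch("https://api.github.com", "user", "pass") makes a live network request, so results depend on the remote server and your credentials.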

Use Context Managers

When you are writing code, chances are that sooner or later you will run into a situation where you need to open, lock or reserve some resource and then release it afterwards. The most trivial example would probably be writing to a file:

f = open('myfile','w')
f.write('Hello world!\n')
f.close()

The unfortunate truth about code like that is that most of us habitually forget the final close() call. And even if we don’t forget it, an exception or an error might prevent it from being run. This may not happen often with simple file I/O, but the same pattern is used to connect to databases or manipulate sockets, where errors and exceptions are a fact of life and closing and releasing resources is doubly important. You could wrap your code in a try-finally block, but as of Python 2.6 (or 2.5 with from __future__ import with_statement) the cleaner and simpler way of writing the above is:

with open('myfile','w') as f:
	f.write('Hello world!\n')

The cleanup is implicit, automatic and triggered whenever you leave the block, even if an exception was raised inside it. This works great with objects, classes and functions that support it, but how do you bake it into your own custom code? It’s relatively simple: you put your set-up code in an __enter__ method and your tear-down code in an __exit__ method (which receives the exception type, value and traceback, all None if the block finished normally), and they will be auto-magically called whenever an instance of your class is used in a with statement:

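import sqlite3  # any database driver works; sqlite3 is just a handy example

# set it up
class DBConnect:
    def __init__(self, path=':memory:'):
        self.path = path

    def __enter__(self):
        # establish db connection
        self.dbconnection = sqlite3.connect(self.path)
        return self.dbconnection

    def __exit__(self, exc_type, exc_value, traceback):
        # runs even if the block raised; returning None lets exceptions propagate
        self.dbconnection.close()

# use it like this
with DBConnect() as db:
    # do stuff
    db.execute('CREATE TABLE t (x INTEGER)')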

Defining these methods for your custom classes not only makes the code more robust, but also more beautiful, allowing you to avoid writing the unseemly try-finally blocks all over the place.
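For simple cases, the standard library's contextlib.contextmanager decorator offers a different technique: you write the set-up/tear-down pair as a single generator function. Code before yield plays the role of __enter__, and the finally block plays the role of __exit__. A minimal Python 3 sketch, using sqlite3 purely as an example resource and a made-up db_connect name:

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def db_connect(path=":memory:"):
    """Yield an open sqlite3 connection and always close it afterwards."""
    conn = sqlite3.connect(path)  # set-up: the __enter__ half
    try:
        yield conn                # the body of the with block runs here
    finally:
        conn.close()              # tear-down: the __exit__ half, runs even on errors


with db_connect() as db:
    db.execute("CREATE TABLE greetings (text TEXT)")
    db.execute("INSERT INTO greetings VALUES ('Hello world!')")
    print(db.execute("SELECT text FROM greetings").fetchone()[0])  # prints: Hello world!
```

The class-based version is still the better fit when the resource needs extra methods or state; the generator version shines for one-off set-up/tear-down pairs.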

Create Objects that are Transparent

The Python REPL can be an extremely useful tool. I use it all the time to quickly test ideas or poke around my code. Unfortunately, it is not always easy to glimpse inside your custom classes and objects this way unless you take steps to make them transparent. For example, let’s say you have a class like this:

class Foo:
    def __init__(self, x=0, y=0):
        self.x=x
        self.y=y

The problem you may run into is that objects created from this class do not display any useful information when evaluated directly or printed in the REPL:

>>> f = Foo()
>>> f
<__main__.Foo instance at 0x02C82940>

>>> print(f)
<__main__.Foo instance at 0x02C82940>

A lot of people know that you can simply override the __str__ method to get a nice string representation of your object that can be printed directly or concatenated to some other text. It is, however, far less common to see the __repr__ method implemented in a class. That is a pity, because __repr__ controls what the REPL displays when the object is evaluated directly, and as such it can be tailored to show developer- and debug-friendly information about the internal state. For example:

class Foo:
    def __init__(self, x=0, y=0):
        self.x=x
        self.y=y
 
    def __str__(self):
        return '({},{})'.format(self.x, self.y)
 
    def __repr__(self):
        return '{}(x={},y={})'.format(
            self.__class__.__name__,
            self.x,
            self.y)

This version behaves like this in the REPL:

>>> f = Foo(5,7)
>>> f
Foo(x=5,y=7)

>>> print(f)
(5,7)

Implementing the __repr__ method may seem unimportant and insignificant, but it saves lives. Or at least man-hours of debugging.
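In Python 3 the same class can be written a little more compactly with f-strings (3.6+), and the dataclasses module (3.7+) can generate __repr__ for you. The sketch below shows both; Point is a made-up name, and the output format uses ", " between fields, which differs slightly from the example above. The !r conversion applies repr() to each field, so string values show up quoted, which keeps the output unambiguous.

```python
from dataclasses import dataclass


class Foo:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        # Human-friendly form, used by print() and str()
        return f"({self.x},{self.y})"

    def __repr__(self):
        # Developer-friendly form shown by the REPL; !r quotes strings,
        # so the result reads like the code that would recreate the object
        return f"{type(self).__name__}(x={self.x!r}, y={self.y!r})"


@dataclass
class Point:
    # @dataclass generates __init__, __repr__ and __eq__ from these fields
    x: int = 0
    y: int = 0
```

In the REPL, Foo('a', 1) now displays as Foo(x='a', y=1), and Point(5, 7) as Point(x=5, y=7), without writing any __repr__ by hand for the latter.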

Lint Your Code

I’m a firm believer in linting your code. It is a good practice that tends to yield saner, more beautiful and more readable code most of the time. A lot of people don’t like very aggressive linters because they tend to complain about trivial things such as tabs vs. spaces, lines longer than 80 characters, and so on. Keep in mind, however, that every dumb linter rule is there for a reason, and adjusting your coding style a bit to conform to those rules is never really a bad idea. Granted, if you are working with an existing code base that has never been linted, you might be shocked at how much refactoring it would take to get it in shape. But when you start a new project, a rigorous linter is a great thing. It will not only help you structure the code, but also catch minor bugs that could otherwise be overlooked.

Personally, I’m a fan of PyLint, which tends to be super anal and very, very harsh on you. It actually scores your code, runs statistical analysis on it, and keeps track of the last few runs so that you can see how much your changes have improved the overall quality of the code.

Obviously PyLint is an external tool, but you can also get in-editor, as-you-type linting tools. Sublime has the excellent Sublime Linter which does more than just Python. If you run Vim there is a plethora of plugins that do just that, with the phenomenal Syntastic leading the pack. If you use Emacs then… Then ask Chriss – he will tell you what to use.

I highly recommend using PyLint in addition to any in-editor tool for the aggressive harshness and neat statistics it provides.
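To give a feel for what a linter actually checks, here is a toy sketch built on the standard library's ast module. It is not how PyLint works internally and covers only two illustrative rules: a textual one (lines longer than 79 characters, PEP 8's recommended limit) and a structural one (bare except: clauses, found by walking the syntax tree). toy_lint is a made-up name.

```python
import ast

MAX_LINE_LENGTH = 79  # PEP 8's recommended limit; adjust to taste


def toy_lint(source, max_len=MAX_LINE_LENGTH):
    """Return sorted (line_number, message) pairs for two simple checks."""
    problems = []
    # Check 1: purely textual, line by line
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_len:
            problems.append((lineno, f"line too long ({len(line)} > {max_len})"))
    # Check 2: structural, using the parsed syntax tree
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            problems.append((node.lineno, "bare 'except:' clause"))
    return sorted(problems)
```

Real linters work the same way in spirit, just with hundreds of rules, configurable severities and, in PyLint's case, the scoring described above.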

Always Be Testing

I don’t think I have to stress the importance of testing your code. At the very least you should be writing unit tests for all your classes. Fortunately, Python has a built-in unit testing framework: the unittest module, originally known as PyUnit. It is pretty much the industry standard, and it follows the same xUnit conventions as JUnit and PHPUnit. If you have used those frameworks, PyUnit code should immediately look familiar and understandable:

import unittest
from foo import Foo  # hypothetical module containing the class under test

class SimpleTestCase(unittest.TestCase):

    def setUp(self):
        """Called before every test method."""
        self.foo = Foo()
        self.file = open("blah", "r")  # assumes a file named "blah" exists

    def tearDown(self):
        """Called after every test method."""
        self.file.close()

    def testA(self):
        """Test case A."""
        self.assertEqual(self.foo.bar(), 543, "bar() not calculating values correctly")

    def testB(self):
        """Test case B."""
        self.assertEqual(self.foo + self.foo, 34, "can't add Foo instances")
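Because Foo above is only a placeholder, here is a self-contained Python 3 variant you can actually run. Point is a made-up stand-in for Foo, and a temporary file from the tempfile module replaces "blah", so that setUp and tearDown have a real resource to open and clean up.

```python
import os
import tempfile
import unittest


class Point:
    """A tiny 2D point standing in for the hypothetical Foo class."""

    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        # Makes assertEqual failure messages readable
        return f"Point(x={self.x!r}, y={self.y!r})"


class PointTestCase(unittest.TestCase):

    def setUp(self):
        """Runs before every test method, so each test gets fresh fixtures."""
        self.p = Point(1, 2)
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        self.file = open(self.path, "r")

    def tearDown(self):
        """Runs after every test method, even if the test failed."""
        self.file.close()
        os.remove(self.path)

    def test_add(self):
        self.assertEqual(self.p + self.p, Point(2, 4))

    def test_file_starts_empty(self):
        self.assertEqual(self.file.read(), "")
```

Saved as, for example, test_point.py, it runs with python -m unittest test_point.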

If you want to do something along the lines of acceptance testing, then Splinter is probably the way to go. It has a really concise and clean API and lets you create complex tests for web applications fairly easily. The only downside is that it is not a full-fledged test framework like, for example, Codeception. There is no way to assert or fail tests in Splinter; it is meant to be used as an abstraction layer or a component. So you basically just write PyUnit test cases that use Splinter, which works quite well, as you end up with all your tests running on the same framework in the same place.

Use a build tool

As you may or may not know, I am a big fan of Rake as a simple build tool. That said, it doesn’t really seem all that kosher (or practical) to use a Ruby build tool for Python projects. Fortunately there is Paver, which is essentially the Rake of the Python world, though perhaps not nearly as popular.

I believe it was originally created to streamline building distribution bundles and packages, but you can confidently use it to run any number of arbitrary tasks. As with Rake, your build file is an executable script: you don’t configure your build, you code it. This makes it endlessly extensible and perfectly suited for our purposes.

To start using Paver, you just pip install paver and create a pavement.py file at the top level of your project directory, where you will run paver from. An example file could look something like this:

from paver.easy import *
 
@task
def lint():
    print "Linting..."
    # run linter 
 
@task
def test():
    print "Testing..."
    # run your unit tests
 
@task
@needs(['lint', 'test'])
def default():
    pass

You can run it like rake from the command line:

# run lint task
paver lint
 
# run the default task
paver

It is usually a good idea to automate running your linter and your unit tests with Paver. That way you can make some changes, then immediately have them analyzed for errors and bad code smells and run through all your test cases.
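Paver's real machinery does much more, but the core idea of a build that you code rather than configure, with declared prerequisites, can be sketched in a few lines of plain Python 3. The task, needs and run names below only loosely mirror Paver's decorators; this is a hypothetical toy, not Paver's API or implementation, and it has no cycle detection.

```python
# Toy task registry: maps task names to functions
TASKS = {}


def task(func):
    """Register func as a task under its own name (toy version of @task)."""
    func.needs = getattr(func, "needs", [])
    TASKS[func.__name__] = func
    return func


def needs(*names):
    """Record prerequisite task names on the function (toy version of @needs)."""
    def decorate(func):
        func.needs = list(names)
        return func
    return decorate


def run(name, done=None):
    """Run a task after its prerequisites, each at most once; return the run order."""
    if done is None:
        done = []
    if name in done:
        return done
    for prerequisite in TASKS[name].needs:
        run(prerequisite, done)
    TASKS[name]()
    done.append(name)
    return done


@task
def lint():
    print("Linting...")  # a real build would invoke the linter here


@task
def unittests():
    print("Testing...")  # ...and the test runner here


@task
@needs("lint", "unittests")
def default():
    pass
```

Calling run("default") prints Linting... and Testing... and returns the order in which tasks ran; the decorator order matters, since @needs must be applied before @task registers the function.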

Suggestions?

I hope this was helpful to at least a few people out there. I am by no means an authority on Python, but I feel like some of these tips can be useful for novices and seasoned programmers alike. I generally enjoy reading this sort of write-up, because even if I already know most of the material in an article, there is always a chance that I will learn something new. What are your favorite Python tips? Have you ever stumbled upon a Python-related article or video that made you go “Wow, this changes everything!”? If so, please share it in the comments.

This entry was posted in programming.



2 Responses to Python: Increase Your Zen, Maximize Your Happiness

  1. Warning: even though I was referenced as an authority on Emacs support for Python, I actually have little knowledge on this particular aspect of Emacs! Python is a language I haven’t studied at all so far. In fact, this article is now probably responsible for half my current Python knowledge.

  2. I really, really love fabric for making recipes of stuff that I (typically not so frequently) do more than once.

    It seems like it’s in a similar arena to paver, but can be used to run arbitrary commands on remote machines as well.

    I agree heartily with every section in this article. We use flake8 at work, but I will give pylint a try.

