Character mapping must return integer, None or unicode

The other day I implemented a simple rot13 function in Python like this:

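# Python 2: string.maketrans builds a translation table for str.translate
from string import ascii_uppercase as upper, ascii_lowercase as lower, maketrans

def rot13(text):
    # Rotated alphabets: N-Z then A-M, followed by n-z then a-m
    rot13_alphabet = upper[13:]+upper[:13] + lower[13:]+lower[:13]
    rot13_transform = maketrans(upper+lower, rot13_alphabet)
    return text.translate(rot13_transform)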

Yes, I know there is a one-liner for that, but I was messing around doing it the hard way to remember my way around Python. This seemed like a nice alternative to the more direct method, in which you iterate over the whole message character by character and increase each letter's ASCII code by 13 (wrapping around at the end of the alphabet). If you can't tell how it works, let me explain: maketrans takes two parameters, the before and after lists of symbols, and returns a translation table. When you pass this table to the translate function, it does the following: whenever it sees a symbol from the before list in your message, it replaces it with the corresponding symbol from the after list. So I created a rotated version of the alphabet and used it as my after parameter.
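To see the before/after replacement on a tiny input, here is a minimal sketch. It assumes Python 3, where the equivalent of Python 2's string.maketrans is str.maketrans; the sample strings are made up for illustration.

```python
# Python 3 sketch: str.maketrans plays the role of string.maketrans.
# 'abc' is the "before" list and 'nop' is the "after" list.
table = str.maketrans('abc', 'nop')   # a -> n, b -> o, c -> p

# Characters that are not in the "before" list pass through unchanged.
translated = 'a cab'.translate(table)
print(translated)  # n pno
```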

This thing ran fine in tests – when I passed in a string manually it worked flawlessly. Then I plugged it into a bigger piece of code that involved web forms and Django templates and it blew up with the following message:

TypeError: character mapping must return integer, None or unicode

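Not very descriptive if you ask me. It turns out that the web form was returning a unicode string, and the translate method of a unicode string does not accept the byte-string table that maketrans builds. It expects a mapping from Unicode code points to integers, None or unicode strings instead. That's what this error is all about.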
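The same class of failure can be sketched in Python 3, under the assumption that a mapping whose values are bytes is a fair stand-in for the Python 2 byte-string table. The table names here are hypothetical and only serve the illustration.

```python
# Python 3 sketch (the post itself is about Python 2).
# A mapping that returns bytes is not a valid character mapping for str.
bad_table = {ord('a'): b'n'}

try:
    'abc'.translate(bad_table)
    error_message = None
except TypeError as exc:
    # translate rejects values that are not an integer, None or text
    error_message = str(exc)

# A mapping from code points to text is accepted.
good_table = {ord('a'): 'n'}
result = 'abc'.translate(good_table)

print(error_message)
print(result)  # nbc
```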

My lazy half-assed solution?

text = text.encode('ascii')

Or, you know, I could do it the old-fashioned way, one character at a time and whatnot. Actually, that's a lie: there is another way to keep using the translate function. Essentially, you need to do what the maketrans function does, but manually, like this:

# upper and lower are the names imported from string earlier
rot13_alphabet = unicode(upper[13:]+upper[:13] + lower[13:]+lower[:13])
alphabet = unicode(upper+lower)
rot13_transform = dict(zip(map(ord, alphabet), rot13_alphabet))

Yes it is a bit convoluted so let me explain:

  1. Ord is a function which returns the Unicode code point of the character you pass in:
    >>> ord('a')
    97
  2. Map takes a function and a list, and then applies said function to every member of the list.
    >>> map(ord, 'abc')
    [97, 98, 99]
  3. Zip takes two lists and returns a list of tuples that contain pairs of elements from each list.
    >>> zip('abc', 'nop')
    [('a', 'n'), ('b', 'o'), ('c', 'p')]
     
    >>> zip(map(ord, u'abc'), u'nop')
    [(97, u'n'), (98, u'o'), (99, u'p')]
  4. Finally, dict creates a dictionary out of a list of tuples just like the one created by zip.
    >>> dict([('a', 'n'), ('b', 'o')])
    {'a': 'n', 'b': 'o'}
    >>> dict(zip(map(ord, u'abc'), u'nop'))
    {97: u'n', 98: u'o', 99: u'p'}

You can pass this dictionary to the translate function instead of the table generated by maketrans. This way you can fully support unicode input, but it won't do anything for localized characters such as ąężćó, because they are not part of my rotation table anyway. If you wanted to rot13 Polish or German, for example, you would have to add the extra characters at the correct positions. Or use string.letters (which becomes localized when a locale is applied), split it in half and then rot13 each part.
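Put together, the dictionary approach can be sketched as a complete function. This version assumes Python 3, where every str is already unicode, so the unicode() calls are dropped; ROT13_TABLE is a name introduced here for illustration.

```python
from string import ascii_uppercase as upper, ascii_lowercase as lower

# Code point -> replacement character, built with ord, map, zip and dict.
alphabet = upper + lower
rot13_alphabet = upper[13:] + upper[:13] + lower[13:] + lower[:13]
ROT13_TABLE = dict(zip(map(ord, alphabet), rot13_alphabet))

def rot13(text):
    """Rotate ASCII letters by 13 places and leave everything else alone."""
    return text.translate(ROT13_TABLE)

print(rot13('Hello'))    # Uryyb
# Letters outside the table, such as ż, ó, ł and ć, pass through unchanged.
print(rot13('zażółć'))   # mnżółć
```

Applying the function twice returns the original text, since 13 is half of the 26-letter alphabet.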

And of course, this whole discussion is moot because the way you should do rot13 is this:

# in python < 3.x
'text'.encode('rot13')
 
# in python >= 3.x
import codecs
codecs.encode('text', 'rot13')
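A quick round trip shows what the built-in codec does. This is a small sketch assuming Python 3 and a made-up sample message.

```python
import codecs

message = 'Hello, World!'

# rot13 is its own inverse, so encoding twice restores the input.
encoded = codecs.encode(message, 'rot13')
decoded = codecs.encode(encoded, 'rot13')

print(encoded)  # Uryyb, Jbeyq!
print(decoded)  # Hello, World!
```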

On the upside, you have just learned about maketrans, dict, map, zip and ord functions and the pitfalls of character mapping and unicode.




3 Responses to Character mapping must return integer, None or unicode

  1. Chris says:

    Thanks, you saved me some time–was confounded by this Django error. I implemented your same “lazy half-assed” solution and it is working great in my situation!

  2. Paulo says:

    Wow, i have been reading code from gdata for about 1o hours thinking that was the issue, all it took was to encode parameters :P, thanks a lot man :D

  3. x says:

    Great explanation. I had the same problem recently. But the plaintext.encode('rot13') does not work with unicode characters either (at least in 2.7.3).
