The Python Tourist

A series of articles on the Python programming language. Many of the articles so far deal with side-cases in the language itself that can bite you at unexpected times.

The Python Tourist #1: Passing Mutable Objects as Default Args

Many modern languages allow for default function arguments. Default arguments can be a great thing - they allow you to use sensible defaults for the common cases, without taking away the power to add more functionality as needed. You can arbitrarily expand a function definition without breaking existing code by providing sensible defaults for the new parameters.

Here is a fun example of how Python will bite you if you try to use a mutable object (like a list, dict, or object instance) as a default argument. The following is an example function I'm trying to write ...

# OK, now I am *INTENDING* to write a routine to do the following:
#
#    Append an item to a list, returning the list.
#    If no list is passed, a new list is created.
#
# Here is my first attempt at implementation ...
def add_item(item, the_list = []):    
    the_list.append(item)
    return the_list


At first glance this appears to do what is intended. If the user doesn't pass a list to append items to, then a new list is started. Or at least that appears to be what will happen. Let's try running it:

# I want to start a new list ...
word_list = add_item('First')

# Add some more words
word_list = add_item('Second', word_list)
word_list = add_item('Third', word_list)

# Start a new list of numbers ...
number_list = add_item(111)

# Add some more numbers
number_list = add_item(222, number_list)
number_list = add_item(333, number_list)

# Now let's see what happened ...
print "word_list ",word_list
print "number_list ",number_list


When you run this, you will get the following output:

word_list  ['First', 'Second', 'Third', 111, 222, 333]
number_list  ['First', 'Second', 'Third', 111, 222, 333]


Whoa! That was unexpected. When you use a mutable object as a default arg, Python creates a single object, and reuses that same object on every call. I find this non-intuitive, personally, but that's the way Python works: the default value is evaluated once, when the def statement runs, and is stored on the function object rather than being re-created for each call.
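A quick way to see the sharing is to look at where Python keeps the default. The sketch below assumes Python 2.6 or later (including Python 3), where the stored defaults are exposed as the function's __defaults__ attribute. It reuses the first add_item from above.

```python
def add_item(item, the_list=[]):
    the_list.append(item)
    return the_list

first = add_item('First')    # no list passed: the stored default is used
second = add_item(111)       # no list passed again: same stored default

# Both calls returned the one list object kept on the function itself.
stored_default = add_item.__defaults__[0]
print(first is second)             # True
print(first is stored_default)     # True
print(stored_default)              # ['First', 111]
```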

Anyways, here is the correct way to code the function to achieve the result I want:

# fixed version ...
def add_item(item, the_list = None): # declare default as None, then ...

    # NOW, I can set to a new list if no list is passed
    # (test "is None" so that a caller's empty list is still used)
    if the_list is None: the_list = []
    the_list.append(item)
    return the_list


Now when you run the program you will get the expected result:

word_list  ['First', 'Second', 'Third']
number_list  [111, 222, 333]


Note that not every instance of using a mutable type as a default arg will cause problems. In fact, you'll probably be OK most of the time. The danger is in those times that you aren't expecting trouble, and can't figure out why your code isn't working. I find it tends to happen more with recursive functions.

At any rate, my personal rule of thumb is to never use a mutable type as a default arg, and instead to always use None as shown above, and then set the variable inside the function body (where a new empty object will always be created).

It's simple, neat, takes one line, and can save you a marathon debugging session trying to figure out what is wrong (speaking from painful experience!)
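One detail worth sketching: testing the_list is None, rather than relying on the truth value of the argument, matters when the caller passes in a list that happens to be empty. An empty list is false, so a truthiness test would silently swap it for a brand-new list. The function name add_item_checked is made up for this illustration.

```python
def add_item_checked(item, the_list=None):
    # Only create a new list when the caller really passed nothing.
    if the_list is None:
        the_list = []
    the_list.append(item)
    return the_list

# The caller's empty list is filled in place, not replaced.
basket = []
add_item_checked('apple', basket)
print(basket)                      # ['apple']

# Separate calls without a list no longer share state.
print(add_item_checked('First'))   # ['First']
print(add_item_checked(111))       # [111]
```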


Written in WikklyText.

The Python Tourist #2: Taking Exception

Exception handling in Python is a tremendously useful feature. I grew up on C programming, where, to write a really "correct" program, it seems like you have to spend half your code on mundane error checking.
for(i=0; i<NR; ++i) {
    result = do_thing_1();
    if (result < 0) {
        if (result == IO_ERROR) {
             /* handle error */
        }
        else if (result == API_ERROR) {
            /* handle error */
        }
        else {
            /* handle unknown error */
        }
    }
    result = do_thing_2();
    
    /* sigh ... I have to code another huge error block ... */
    ...

/* finally, the loop ends! */    
}


True, C++ has exception handling, but if you are calling standard C-library functions, then you have to go back to the mundane way.

Rewriting the above mess into an equivalent Python block gives:
try:
    # uninterrupted program logic here, no need to break up
    # the natural flow with error checking
    for i in range(NR):
        do_thing_1()
        do_thing_2()
    
# errors can all be handled out-of-line    
except IOError:
    pass  # handle IO error
except API_ERROR:
    pass  # handle API error
except:
    pass  # handle unknown error
    # ** be careful here -- keep reading! **


Notice that I wrote "be careful here" under the last except clause. The "be careful" part is what I want to cover in this article. The topics I'm going to cover are:
  1. Globally catching unhandled errors.
  2. Catching exceptions, with details.
  3. Catching multiple exceptions.
  4. When it is bad/good to use "bare" exceptions.


Globally catching unhandled errors.

There is a special hook, sys.excepthook, that (in my experience) is one of the best debugging tools available in Python. I have a custom module that I always include in my projects that handles this automatically. The bulk of the work is done by a module (in the standard Python library) called cgitb. It is called "cgitb" because its original intent was to show a traceback of errors that occurred in CGI scripts, but it is just as easy to use in any other kind of program. Here is my custom error handling module, if you'd like to cut and paste:
Module errors.py
import sys, cgitb
from datetime import datetime

def catch_errors():
    sys.excepthook = my_except_hook
    
def my_except_hook(etype, evalue, etraceback):
    do_verbose_exception( (etype,evalue,etraceback) )
    
def do_verbose_exception(exc_info=None):
    if exc_info is None:
        exc_info = sys.exc_info()
        
    txt = cgitb.text(exc_info)
    
    d = datetime.now()
    p = (d.year, d.month, d.day, d.hour, d.minute, d.second)        
    filename = "ErrorDump-%d%02d%02d-%02d%02d%02d.txt" % p
                
    open(filename,'w').write(txt)
    print "** EXITING on unhandled exception - See %s" % filename  
    sys.exit(1)
Now, anytime you have a "thinko" error, it will automatically be caught and verbose debugging information will be saved to a file. Here is an example:
"Thinko" bug that will be caught by error handler.
# enable catching of unhandled exceptions
import errors
errors.catch_errors()

def do_thing_1(value):

    # just coding along, not expecting anything to go wrong ...
    for i in range(20):
        print "ratio is %d" % (value/(10-i))

do_thing_1(100)
Now let's see what happens when I run it ...
ratio is 10
ratio is 11
ratio is 12
ratio is 14
ratio is 16
ratio is 20
ratio is 25
ratio is 33
ratio is 50
ratio is 100
** EXITING on unhandled exception - See ErrorDump-20060305-110304.txt
The dumpfile shows what happened, with tons of detail. Notice how it shows the function parameters (do_thing_1(value=100)) as well as the value of the local variables when the error occurred. Normally, you'd have to rerun the test case and step through the loop to see what these values were. Now, immediately after the crash, you have a snapshot of the program state.
Contents of "ErrorDump-20060305-110304.txt"
ZeroDivisionError
Python 2.4.2: /usr/bin/python
Sun Mar  5 11:03:04 2006

A problem occurred in a Python script.  Here is the sequence of
function calls leading up to the error, in the order they occurred.

 /var/www/localhost/htdocs/python/tourist/t.py 
    9         print "ratio is %d" % (value/(10-i))
   10 
   11 do_thing_1(100)
   12 
do_thing_1 = <function do_thing_1>

 /var/www/localhost/htdocs/python/tourist/t.py in do_thing_1(value=100)
    7     # just coding along, not expecting anything to go wrong ...
    8     for i in range(20):
    9         print "ratio is %d" % (value/(10-i))
   10 
   11 do_thing_1(100)
value = 100
i = 10
ZeroDivisionError: integer division or modulo by zero
    __doc__ = 'Second argument to a division or modulo operation was zero.'
    __getitem__ = <bound method ZeroDivisionError.__getitem__ of <exceptions.ZeroDivisionError instance>>
    __init__ = <bound method ZeroDivisionError.__init__ of <exceptions.ZeroDivisionError instance>>
    __module__ = 'exceptions'
    __str__ = <bound method ZeroDivisionError.__str__ of <exceptions.ZeroDivisionError instance>>
    args = ('integer division or modulo by zero',)

The above is a description of an error in a Python program.  Here is
the original traceback:

Traceback (most recent call last):
  File "t.py", line 11, in ?
    do_thing_1(100)
  File "t.py", line 9, in do_thing_1
    print "ratio is %d" % (value/(10-i))
ZeroDivisionError: integer division or modulo by zero
Of course, in a final product, you wouldn't want the program to crash so abruptly. I have a more polished GUI version that I use in situations where an end user might see the error. I just gave a bare-bones version here so you can see the important parts, and can customize the error module to suit your application.
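For readers on newer interpreters, here is a rough sketch of the same idea built on the standard traceback module instead of cgitb (cgitb was removed from the standard library in Python 3.13). It writes a plain traceback rather than cgitb's dump of local variables. The names write_error_dump and install_hook, and the dump_dir parameter, are invented for this illustration.

```python
import os
import sys
import time
import traceback

def write_error_dump(exc_info, dump_dir="."):
    """Write the formatted traceback to a timestamped file; return its path."""
    etype, evalue, etraceback = exc_info
    text = "".join(traceback.format_exception(etype, evalue, etraceback))
    name = time.strftime("ErrorDump-%Y%m%d-%H%M%S.txt")
    path = os.path.join(dump_dir, name)
    f = open(path, "w")
    try:
        f.write(text)
    finally:
        f.close()
    return path

def install_hook(dump_dir="."):
    """Route every unhandled exception through write_error_dump()."""
    def hook(etype, evalue, etraceback):
        path = write_error_dump((etype, evalue, etraceback), dump_dir)
        sys.stderr.write("** unhandled exception - See %s\n" % path)
    sys.excepthook = hook
```

Calling install_hook() once at startup is enough. write_error_dump(sys.exc_info()) can also be called directly from inside an except block.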

As a beginning Python programmer, I was originally tempted to put try ... except clauses around everything. After a while, I realized that it was much better to only have try ... except clauses in places where there was some sort of state information that needed to be cleaned up or rolled back after the error. Other errors (like the "thinko" cases) are better left to the global handler. You can actually lose information by being too greedy with your except clauses.

That leads nicely into the next topic ...

Catching exceptions, with details.

When you catch an exception as shown below, you are only getting part of the available information:
try:
    ... do some stuff ...
    
except API_ERROR:
    #
    # OK, I know an API_ERROR occurred, but have no other details!
    #
Here is a little example of how to provide and use more information in your except clauses.
class ParseError(Exception):
    """
    Custom exception class to capture details on a
    parsing error:
    
          txt = Will be shown by default exception handler.
          filename,line,col = Where the error occurred.
    """
    def __init__(self, txt, filename, line, col):
        Exception.__init__(self, txt)
        self.filename = filename
        self.line = line
        self.col = col
        
def parsefile(filename):
    
    for text in open(filename,'r'):
        # ... parsing code would examine 'text' here ...
        
        # Say I find an error in line 20, column 10 ...
        line = 20
        col = 10
        raise ParseError('Parse Error, file=%s line=%d,col=%d' % \
                (filename,line,col), 
                filename, 20, 10)

# If I do nothing, Python will show the 'txt' as the error.
parsefile('t.py')


Output
Traceback (most recent call last):
  File "t.py", line 27, in ?
    parsefile('t.py')
  File "t.py", line 24, in parsefile
    filename, 20, 10)
__main__.ParseError: Parse Error, file=t.py line=20,col=10
As you can see, even without writing a try ... except clause, you are already getting more information from the txt string. Now let's see how to capture the exception and access all of its attributes.
# Catch it so I can access .filename, .line, and .col
try:
    parsefile('t.py')
    
except ParseError, info:
    # Now I can do whatever I want to with the detailed info
    print "CAUGHT! Parse error in %s at line=%d, column=%d" % \
        (info.filename, info.line, info.col)


Output
CAUGHT! Parse error in t.py at line=20, column=10
WARNING
The except clause must be exactly except ParseError, info. If you try to use except (ParseError,info) or except [ParseError,info] it will not work.
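The comma form shown above is Python 2 syntax. Python 2.6 added the spelling except ParseError as info, and Python 3 accepts only that form. A minimal sketch, reusing a trimmed-down ParseError (the file name and positions are made-up values):

```python
class ParseError(Exception):
    def __init__(self, txt, filename, line, col):
        Exception.__init__(self, txt)
        self.filename = filename
        self.line = line
        self.col = col

def parsefile(filename):
    # Pretend an error was found at line 20, column 10.
    raise ParseError("Parse Error, file=%s line=%d,col=%d" % (filename, 20, 10),
                     filename, 20, 10)

try:
    parsefile("t.py")
except ParseError as info:
    # Copy what is needed: Python 3 unbinds 'info' when the block ends.
    caught = (info.filename, info.line, info.col)
    message = str(info)

print(caught)    # ('t.py', 20, 10)
print(message)   # Parse Error, file=t.py line=20,col=10
```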


Conveniently, that leads us into the next topic ...

Catching multiple exceptions.

Sometimes it is convenient to be able to catch multiple exceptions with a single except clause. In the example below, I'm going to check for bad types being passed to a function, and raise a per-type exception if an error is detected.
"""
As in the previous example, I will place useful info into
the 'txt' parameter to the base Exception class. This way
the caller can see exactly what happened without having
to catch the exception and look at the 'info' parameter.
"""
class ErrNeedList(Exception):
    def __init__(self, parm):
        Exception.__init__(self, "Need a list for '%s'" % parm)
        self.parm = parm
        self.usage = "Need a list"
        
class ErrNeedDict(Exception):
    def __init__(self, parm):
        Exception.__init__(self, "Need a dictionary for '%s'" % parm)
        self.parm = parm
        self.usage = "Need a dictionary"

class ErrNeedString(Exception):
    def __init__(self, parm):
        Exception.__init__(self, "Need a string for '%s'" % parm)
        self.parm = parm
        self.usage = "Need a string"
        
def test_function(a_list, a_dict, a_string):
    # check for type errors
    if not isinstance(a_list, list):
        raise ErrNeedList('a_list')

    if not isinstance(a_dict, dict):
        raise ErrNeedDict('a_dict')

    if not isinstance(a_string, str):
        raise ErrNeedString('a_string')

# cause errors and catch them

try:
    test_function( 1,2,3)
#------------------------------------------------------
# here I can test for all errors at once - since each
# has a .parm and .usage attribute, I can treat them
# the same way
#------------------------------------------------------
except (ErrNeedList, ErrNeedDict, ErrNeedString), info:
    print "CAUGHT API ERROR in parameter: %s - %s" % (info.parm, info.usage)
    
try:
    test_function( [],2,3)
except (ErrNeedList, ErrNeedDict, ErrNeedString), info:
    print "CAUGHT API ERROR in parameter: %s - %s" % (info.parm, info.usage)

try:
    test_function( [],{},3)
except (ErrNeedList, ErrNeedDict, ErrNeedString), info:
    print "CAUGHT API ERROR in parameter: %s - %s" % (info.parm, info.usage)


Output
CAUGHT API ERROR in parameter: a_list - Need a list
CAUGHT API ERROR in parameter: a_dict - Need a dictionary
CAUGHT API ERROR in parameter: a_string - Need a string


In an example like this, where all exceptions have common attributes, it makes sense to derive all exceptions from a single base class. Rewriting the classes to derive from a common class APIError:
"Base class"
class APIError(Exception):
    def __init__(self, txt, parm, usage):
        Exception.__init__(self, txt)
        self.parm = parm
        self.usage = usage

class ErrNeedList(APIError):
    def __init__(self, parm):
        APIError.__init__(self, "Need a list for '%s'" % parm, 
                            parm, "Need a list")
        
class ErrNeedDict(APIError):
    def __init__(self, parm):
        APIError.__init__(self, "Need a dictionary for '%s'" % parm,
                            parm, "Need a dictionary")

class ErrNeedString(APIError):
    def __init__(self, parm):
        APIError.__init__(self, "Need a string for '%s'" % parm,
                            parm, "Need a string")
Now the exceptions can be caught in a more compact way:
try:
    test_function( [],{},3)
    
#    
# Now I can just catch the baseclass, and it will catch
# all subclasses as well!
#
except APIError, info:
    print "CAUGHT API ERROR in parameter: %s - %s" % (info.parm, info.usage)


WARNING
You must use a tuple when catching multiple exceptions, i.e. except (Err1,Err2,Err3). If you try to use a list, i.e. except [Err1,Err2,Err3] it will not catch the exception, and what's worse, Python will not flag it as a syntax error.
Hopefully it is clearer now why you cannot use except (IOError,info) as a substitute for except IOError, info. If you tried the first form, you would be trying to catch an exception of class info, which isn't what was intended (and if info is undefined, Python will raise another exception).
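With the as spelling (Python 2.6 and later) the two meanings are easier to keep apart, since the tuple of classes and the variable name can no longer be confused. A short sketch with two stripped-down exception classes; check() is a hypothetical helper used only to trigger them:

```python
class APIError(Exception):
    def __init__(self, parm, usage):
        Exception.__init__(self, "%s for '%s'" % (usage, parm))
        self.parm = parm
        self.usage = usage

class ErrNeedList(APIError):
    def __init__(self, parm):
        APIError.__init__(self, parm, "Need a list")

class ErrNeedDict(APIError):
    def __init__(self, parm):
        APIError.__init__(self, parm, "Need a dictionary")

def check(a_list, a_dict):
    if not isinstance(a_list, list):
        raise ErrNeedList('a_list')
    if not isinstance(a_dict, dict):
        raise ErrNeedDict('a_dict')

results = []

# A tuple of classes, then "as", then the variable name.
try:
    check(1, 2)
except (ErrNeedList, ErrNeedDict) as info:
    results.append("%s - %s" % (info.parm, info.usage))

# Catching the base class covers every subclass.
try:
    check([], 2)
except APIError as info:
    results.append("%s - %s" % (info.parm, info.usage))

print(results)   # ['a_list - Need a list', 'a_dict - Need a dictionary']
```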

When it is bad/good to use "bare" exceptions.

When I was first learning about exceptions, it seemed like a good idea to write code blocks like this:
try:
    ... do something ...

except:
    print "Got an error!"


This initially seemed robust to me because you are guaranteed to catch all errors in the block. There are three problems with this:
  1. No exception type is specified, so you have no idea what sort of error occurred.
  2. No info parameter is given, so you are throwing away any extra information that was present.
  3. You are catching not only the errors you expect to occur, but are in essence masking out the errors that you didn't expect to occur.

Here is a brief example to demonstrate:
try:    
    # I could get an IOError here if "t.py" doesn't exist, 
    # or is not readable.
    fin = open('t.py','r')
    
    # I could get an IOError here if I do not have write-permissions
    # in the current directory, or the disk is out of space.
    fout = open('fileout.txt','w')

    # I don't expect anything bad to happen here ...    
    for line in fin:
        # filter out comment lines
        if re.match('^\s*#$', line):
            continue
    
except:            
    # The only errors I *expect* are IOErrors, so obviously
    # I can say this ... or can I??
    print "A file error occurred."


Now, I run this example and get the following:
A file error occurred.


So I start debugging. Expecting only an IOError, I make a list of what could have happened:
  1. t.py does not exist
  2. t.py is not readable
  3. I cannot create fileout.txt because of insufficient privileges or the disk is out of space

I check and recheck, and there is "no" reason that the code should fail, yet it is telling me that there was a file error. The problem is that I was not specific enough in my except clause, and have (essentially) masked out the real failure.

Rewriting the except clause shows the real problem:
try:    
    # I could get an IOError if "t.py" doesn't exist, 
    # or is not readable.
    fin = open('t.py','r')
    
    # Same here, if I cannot create "fileout.txt"
    fout = open('fileout.txt','w')
    
    for line in fin:
        # filter out comment lines
        if re.match('^\s*#$', line):
            continue

# Only catch what I am PREPARED to handle!
except IOError:            
    print "A file error occurred."


Output
Traceback (most recent call last):
  File "t.py", line 11, in ?
    if re.match('^\s*#$', line):
NameError: name 're' is not defined
Ah! Of course, I forgot to import re before using it.

I think this highlights why you should only catch those specific errors you are prepared to handle, and let the others float up to the top level (where you can catch them, e.g. with the "global hook" described earlier).

Sometimes, unqualified "excepts" are okay!

Now, I'm going to immediately contradict myself and state that there are times when unqualified except clauses are okay, and even desired. One example I run across all the time is in GUI code, when I've set the mouse cursor to a "busy" state before starting a long operation. If you crash in the middle, you don't want to leave the cursor in the busy state forever since that would be confusing to the user. Here is the typical way I code that situation (using wxPython here):
#
# I'm about to perform a long operation, so set the
# cursor to the "busy" (hourglass) state
#
wx.BeginBusyCursor()

try:
    # A long operation begins here ...
    
    ...
    ...

    # when I'm finished, exit the busy state
    wx.EndBusyCursor()
    
except:
    # cancel the busy state - don't leave the user hanging!
    wx.EndBusyCursor()
    
    # re-raise original error to caller
    raise
That final line is critical: When you use a bare raise statement, it will re-raise the original exception, so the error will propagate back to the caller with no loss of information.
WARNING
You do not want to do it like this:
try:
    ... stuff ...
    
except Exception, exc:
    raise exc


As this will cause you to lose information from the original exception: the traceback will point at the raise exc line instead of at the code that actually failed.
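To convince yourself that a bare raise keeps the original traceback, you can format the traceback in the caller and look for the function where the error really happened. In this sketch fail() and cleanup_then_reraise() are invented names:

```python
import traceback

def fail():
    return 1 / 0            # the real source of the error

def cleanup_then_reraise(log):
    try:
        fail()
    except:
        log.append("cleanup ran")
        raise               # bare raise: same exception, same traceback

log = []
try:
    cleanup_then_reraise(log)
except ZeroDivisionError:
    report = traceback.format_exc()

print(log)                  # ['cleanup ran']
print("in fail" in report)  # True: the traceback still reaches fail()
```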
I've also used unqualified except clauses in situations involving database transactions:
# "pseudo-SQL" code, just to give the idea ...
try:
    # run entire transaction inside of "try"
    sql.run("begin transaction")
    
    sql.run("insert into ...")
    sql.run("insert into ...")
    sql.run("update ...")
    
    sql.run("commit transaction")
    
except:
    # undo any changes on error
    sql.run("rollback transaction")
    
    # propagate original error
    raise


This is very nice because the caller will know that an error has occurred, but doesn't have to worry about the database state because it has already been cleaned up.
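Here is the same pattern as a runnable sketch using the sqlite3 module from the standard library, with an in-memory database. The ledger table and the add_entries() helper are made up for the example; the second row violates a NOT NULL constraint on purpose.

```python
import sqlite3

def add_entries(conn, rows):
    try:
        for name, amount in rows:
            conn.execute("insert into ledger values (?, ?)", (name, amount))
        conn.commit()
    except:
        # undo any changes on error, then propagate the original error
        conn.rollback()
        raise

conn = sqlite3.connect(":memory:")
conn.execute("create table ledger (name text not null, amount integer not null)")
conn.commit()

failed = False
try:
    add_entries(conn, [("rent", 500), ("food", None)])
except sqlite3.IntegrityError:
    failed = True

count = conn.execute("select count(*) from ledger").fetchone()[0]
print(failed)  # True
print(count)   # 0: the first insert was rolled back along with the failure
```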

One final example of where I find unqualified excepts useful is in threaded programs where I'm locking around a set of global data:
from threading import Lock
DATA_LOCK = Lock()

try:
    DATA_LOCK.acquire()
    
    .. perform operations on global data ...
    
    DATA_LOCK.release()
    
except:
    # unlock on error
    DATA_LOCK.release()
    
    # propagate original error
    raise
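When the cleanup is identical on the success and failure paths, as with the lock, a try ... finally block says the same thing with less repetition: the finally clause runs whether or not an exception occurred, and the exception still propagates. A sketch, with update() and shared_data as made-up names. The acquire() call sits outside the try so that release() is only attempted on a lock that was actually taken.

```python
from threading import Lock

DATA_LOCK = Lock()
shared_data = []

def update(value):
    DATA_LOCK.acquire()
    try:
        if value < 0:
            raise ValueError("negative value: %r" % value)
        shared_data.append(value)
    finally:
        # runs on success and on error alike
        DATA_LOCK.release()

update(1)
try:
    update(-1)
except ValueError:
    pass

print(shared_data)           # [1]
print(DATA_LOCK.locked())    # False: released even though update(-1) failed
```

Later Python versions also let you write with DATA_LOCK: around the protected block for the same effect.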

The Python Tourist #3: Forgetting how cmp() works ... no problem!

The cmp() function is the basis for sorting in Python. It has a simple definition but I'm always forgetting the sense of the return values. For reference:

cmp(a, b) returns:
  -1 (a negative number) if a < b
   0 if a == b
   1 (a positive number) if a > b

Now, cmp() only knows how to compare built-in values, like integers, strings, etc. If you try to sort a list of user-defined objects, you'll find that Python has no idea how to sort them (though it will do something). What you are supposed to do is define a __cmp__ function for your custom classes. Then, when your objects are sorted, Python calls your __cmp__ to tell it how to order them.

Every time I create a class that needs a __cmp__, I'm tempted to go scrambling to the Python docs for a refresher on what the three return values of cmp mean. But really, there is no need to worry about this. You can simply use the cmp() function to do it for you.

I'll demonstrate with an example:
Ultra Verbose Way
#
# I'm going to make a list of people. For each person I will
# store their first and last name, and the state they live in.
#
# For sorting, I want to sort FIRST by state, SECOND by last name,
# and finally by first name.
#
class Person(object):
    def __init__(self, first, last, state):
        self.first = first
        self.last = last
        self.state = state

    # define __str__ so that 'print object' will look good        
    def __str__(self):
        return "%s: %s, %s" % (self.state, self.last, self.first)
        
    # naive __cmp__, where I have to remember what -1, 0, and 1 mean.
    def __cmp__(self, other):
        # sort first by state
        if self.state < other.state:
            return -1
        elif self.state > other.state:
            return 1
        else:
            # state is equal, - sort by last name
            if self.last < other.last:
                return -1
            elif self.last > other.last:
                return 1
            else:
                # state and last name are equal, 
                # sort by first name
                if self.first < other.first:
                    return -1
                elif self.first > other.first:
                    return 1
                else:
                    return 0
            
def show(people):
    for p in people:
        print p
        
people = [
    Person("Tom","Zeelman",'MN'),
    Person("Ozlo","Yannican",'AZ'),
    Person("Mike","Dodger",'AL'),
    Person("Greta","Abington",'CT'),
    Person("Ooolma", "Therrmon",'MS'),
    Person("Bob","Abington",'AL'),
    Person("Erma","Valencio",'AZ'),
    Person("Abe","Abington",'CT'),
    Person("Zeldo","Yannican",'TN'),
    
    ]

print "Original list:"    
print "--------------"
show(people)

people.sort()
print "\nSorted:"
print "--------"
show(people)
This works, but is extremely cumbersome and error prone. However, there is no need to go to all this trouble. All you need to do is to split your object into items that cmp() knows how to handle. In other words, cmp() knows perfectly well how to sort the .first, .last, and .state values, you just have to split them up and pass them in the correct order:
Much better __cmp__
#
# I'm going to make a list of people. For each person I will
# store their first and last name, and the state they live in.
#
# For sorting, I want to sort FIRST by state, SECOND by last name,
# and finally by first name.
#
class Person(object):
    def __init__(self, first, last, state):
        self.first = first
        self.last = last
        self.state = state
        
    # define __str__ so that 'print object' will look good
    def __str__(self):
        return "%s: %s, %s" % (self.state, self.last, self.first)

    # much better - I don't care what -1, 0 and 1 mean.
    # due to boolean short-circuit logic, the "or" sequence will
    # return the cmp() value of the first non-equal piece
    def __cmp__(self, other):
        return (cmp(self.state, other.state) or
                cmp(self.last, other.last) or
                cmp(self.first, other.first))
    
            
def show(people):
    for p in people:
        print p
        
people = [
    Person("Tom","Zeelman",'MN'),
    Person("Ozlo","Yannican",'AZ'),
    Person("Mike","Dodger",'AL'),
    Person("Greta","Abington",'CT'),
    Person("Ooolma", "Therrmon",'MS'),
    Person("Bob","Abington",'AL'),
    Person("Erma","Valencio",'AZ'),
    Person("Abe","Abington",'CT'),
    Person("Zeldo","Yannican",'TN'),
    
    ]

print "Original list:"    
show(people)

people.sort()
print "\nSorted:"
show(people)
True, the "or" short-circuit logic relies on the fact that cmp(a,b) == 0 when a == b, but that's well-defined and gives much cleaner code in comparison to the bloated mess of doing it yourself.
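A note for readers on Python 3, where cmp() and __cmp__ no longer exist: the same "compare the pieces in order" idea survives in two forms. A sketch, using a cut-down Person and a four-person list; the cmp() defined here is a small stand-in written for this example, not the old builtin.

```python
from functools import cmp_to_key

def cmp(a, b):
    # stand-in for the old builtin: -1, 0, or 1
    return (a > b) - (a < b)

class Person(object):
    def __init__(self, first, last, state):
        self.first = first
        self.last = last
        self.state = state

    def __str__(self):
        return "%s: %s, %s" % (self.state, self.last, self.first)

def compare_people(a, b):
    # the "or" chain again: the first non-zero result wins
    return (cmp(a.state, b.state) or
            cmp(a.last, b.last) or
            cmp(a.first, b.first))

people = [
    Person("Greta", "Abington", 'CT'),
    Person("Mike", "Dodger", 'AL'),
    Person("Abe", "Abington", 'CT'),
    Person("Bob", "Abington", 'AL'),
]

# Option 1: wrap the comparison function so sorted() can use it.
by_cmp = [str(p) for p in sorted(people, key=cmp_to_key(compare_people))]

# Option 2: sort on a tuple of the pieces; tuples compare item by item.
by_key = [str(p) for p in sorted(people, key=lambda p: (p.state, p.last, p.first))]

print(by_key)
# ['AL: Abington, Bob', 'AL: Dodger, Mike', 'CT: Abington, Abe', 'CT: Abington, Greta']
```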

The Python Tourist #4: None, empty, and nothing.

In learning Python, I had read in several places that you should always use a test like if obj is None if you wanted to check for the None value. For some reason, I tend to ignore blanket statements that are presented without supporting rationale. If the underlying rationale isn't stated, I generally assume it's some sort of esoteric thing that doesn't really matter. Only after it bites me and I can understand the logic will I pay attention.

Here are a couple of cases where not explicitly testing for None has gotten me into trouble. Maybe these can help someone else avoid the same headaches.

Look at this sample:
Simple parsing function
def parse_file(filename):
    """
    Parse a file, returning a list of tags.
    Returns None on error.
    """
    
    f = open(filename,'r')
    
    if not check_format(f):
        return None  # file is wrong format

    tags = []
    
    for line in f:
        tags.append( parse_line(line) )
        
    return tags
    
if parse_file(filename):
    print "Parsed OK!"
else:
    print "** ERROR **"
Looks correct enough. The if branch should run if I get a list, and the else branch should run if I get None. There is one little problem though. Look at the following snippet:
The boolean value of 'empty'
if []: print "True"
if not []: print "False"
This will always print "False". Coming from a C background, I want to think of None as "the absence of something", like a NULL pointer. Unfortunately, Python treats empty objects as False values. To me, an empty object is still something as opposed to None which (I think) should be nothing, so this is confusing.

I think the thing to do is recognize that this function has three exit states:
  1. None, indicating an error.
  2. An empty list, indicating an empty file.
  3. A non-empty list, holding tags.

The correct test is then:
Explicitly test for None
tags = parse_file(filename)
if tags is None:
    print "** ERROR **"
elif len(tags) == 0:
    print "Empty file"
else:
    print "OK!"
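Wrapping the three-way test in a small function makes the difference between the truth test and the identity test easy to check. describe() is a hypothetical helper, and the tag strings are placeholders:

```python
def describe(tags):
    if tags is None:
        return "** ERROR **"
    elif len(tags) == 0:
        return "Empty file"
    else:
        return "OK!"

print(describe(None))          # ** ERROR **
print(describe([]))            # Empty file
print(describe(['a', 'b']))    # OK!

# None and the empty list share a truth value, but they are different objects.
empty = []
print(bool(None) == bool(empty))   # True
print(empty is None)               # False
```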
We can make this worse and give it four exit states, with the same functionality:
Now with four exit states ...
def parse_file(filename):
    """
    Parse a file, returning a list of tags.
    Returns None on error.
    """
    
    f = open(filename,'r')
    
    if not check_format(f):
        return None  # file is wrong format

    tags = []
    
    for line in f:
        # look for special end-of-file tag
        if end_of_file(line):
            return tags
        else:
            tags.append( parse_line(line) )
Although it looks like the same logic, I've introduced a (sort of) hidden fourth state: If the "end-of-file" tag isn't found, the for loop will exit without returning a value. When you don't return a value, None is returned. For example:
Not returning a value == None
def foo():
    pass
    
print "The value is %s" % foo()

Prints "The value is None".
Of course, the code sample above is buggy; I shouldn't let it fall out of the loop. Once again, my C background gave me a false sense of security. A C compiler will tell you when you exit a routine in different ways (with and without a return value), so things like this won't happen if you pay attention to the compiler warnings. The dynamic nature of Python means that it really can't do that kind of checking, since it would be impractical to run through every branch inside the function to see if the return values match.

Anyways, disregarding the buggy code for the moment, recognize that the above function has four distinct exit states:
  1. None, indicating an error.
  2. An empty list, indicating an empty file.
  3. A non-empty list, holding tags.
  4. None, indicating no return value.

The first and last cases bother me a little bit. I don't like that None can have two meanings:
  1. The value None.
  2. The absence of a value.

In my "C thinking" of the first example, I was assuming None meant "the absence of a value", so was surprised to find that an empty list was (apparently) the same as nothing. Of course, that isn't the case, it's just that an empty list evaluates to the same boolean value as None.
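One way out of the double meaning, tying back to article #2, is to stop using None as an error signal and raise an exception instead. Then a list always means success, and an accidental fall-off-the-end None stands out as a bug. In this sketch FormatError, parse_lines(), and the '#tags' header convention are all invented for illustration:

```python
class FormatError(Exception):
    pass

def parse_lines(lines):
    """Return a list of tags; raise FormatError if the header is wrong."""
    if not lines or lines[0].strip() != "#tags":
        raise FormatError("missing '#tags' header")
    tags = []
    for line in lines[1:]:
        tags.append(line.strip())
    return tags          # every successful path returns the list

print(parse_lines(["#tags\n"]))                 # []
print(parse_lines(["#tags\n", "a\n", "b\n"]))   # ['a', 'b']

try:
    parse_lines(["something else\n"])
except FormatError as err:
    print("format error: %s" % err)   # format error: missing '#tags' header
```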

I appreciate that Python is a practical language. An impractical language could "fix" this by forcing you to only use (exactly) True or False in boolean expressions. Python tends to loosen the rules as much as practical, without going overboard. (Some languages like Perl go overboard in their coercion rules, which I think leads to even harder to understand code.) I wish that empty lists didn't evaluate to False, but that's the way it is, so you just have to keep it in mind.
NOTE
Normally, if you don't like the way an object behaves, you can subclass it and override the behavior you don't like. That works here too. If L is a list, the expression if L: ... falls back on L.__len__(), so an empty list returns 0, which is False. Overriding __len__ would break other list functionality, but there is no need to: Python first looks for a __nonzero__() method (renamed __bool__() in Python 3), and a subclass can define that to control its truth value directly. PEP 335: Overloadable Boolean Operators is a separate proposal, about customizing the and, or, and not operators themselves.
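Truth testing looks for a dedicated method before it falls back on __len__(): __nonzero__() in Python 2, renamed __bool__() in Python 3. A subclass can therefore change its truth value while leaving len() alone. A sketch, with AlwaysTrueList as a made-up class name:

```python
class AlwaysTrueList(list):
    """A list that is true even when empty."""
    def __bool__(self):          # Python 3 name
        return True
    __nonzero__ = __bool__       # Python 2 name

L = AlwaysTrueList()
print(len(L))        # 0
print(bool(L))       # True
print(not L)         # False

L.append('tag')      # ordinary list behavior is untouched
print(L)             # ['tag']
```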
One final note: The correct test for None is if obj is None, not if obj == None. The reason not to use == is that an object can define its own __eq__ function, and might implement __eq__ in a way that would cause it to be equal to (even if not the same as) None. The "is" operator means "the same object", so is the more correct test here.
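A tiny sketch of the difference, using a deliberately silly class (Agreeable is an invented name) whose __eq__ says yes to everything:

```python
class Agreeable(object):
    """Claims to be equal to anything at all."""
    def __eq__(self, other):
        return True

obj = Agreeable()
print(obj == None)   # True: __eq__ was asked, and it says yes
print(obj is None)   # False: obj is not the None object
```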

The Python Tourist #5: Replacing sys.version_info with pyconfig

If you've spent any time writing Python code that is meant to be portable across multiple versions of Python, you've most likely written a few statements like this:
Using sys.version_info
# am I running on Python 2.2 and up?
if sys.version_info[0] >= 2 and sys.version_info[1] >= 2:
    # do stuff for Python 2.2
else:
    # do stuff for earlier versions
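The problem I see with this is that it is ultra-verbose, and doesn't actually tell you what capability you require here. (It is also subtly wrong: a version such as 3.0 fails the second test even though it is newer than 2.2.) Although you can chop down the verbosity with a statement like: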
Better, but ...
if sys.version_info[:2] >= (2,2):
    ...
That doesn't take care of the fact that you are relying on a hardcoded version number, which brings downsides of its own. For the sake of argument, let's say that IronPython has a different feature set than CPython and Jython. Picture writing code like this:
Check for implementation as well as version
if is_CPython():
    if sys.version_info[:2] == (2,1):
        # do 2.1 stuff
    elif sys.version_info[:2] >= (2,2):
        # do 2.2 stuff
        
elif is_Jython():
    # do the Jython version checking ...
    
elif is_IronPython():
    # and more version checking ...
After a while, it begins to feel like writing C-style #ifdefs instead of Python.

The dynamic nature of Python means you can do all sorts of neat introspective things, including introspection of runtime capabilities. Want to know if the current Python understands a particular piece of code? Run it and see!
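To make "run it and see" concrete, here is a rough sketch of two kinds of runtime probe. It is only an illustration of the general idea; have_syntax() and have_builtin() are made-up names.

```python
try:
    import builtins                    # Python 3
except ImportError:
    import __builtin__ as builtins     # Python 2

def have_syntax(source):
    """True if this interpreter can compile the given source text."""
    try:
        compile(source, "<feature-test>", "exec")
        return True
    except SyntaxError:
        return False

def have_builtin(name):
    """True if a builtin with this name exists."""
    return hasattr(builtins, name)

# Does this interpreter understand generator expressions?
print(have_syntax("total = sum(i for i in range(3))"))

# Is enumerate() available?
print(have_builtin("enumerate"))

# Something no Python accepts, to show the negative case.
print(have_syntax("x = = 1"))          # False
```

Because compile() only parses the text and never runs it, a probe like have_syntax() is safe to call on code that would be an error in the running interpreter.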

There is a little module called pyconfig that I wrote while working on xml.pickle (part of Gnosis_Utils). It is sort of like an autoconf for Python, except it works at runtime. It is bundled with Gnosis_Utils (since it uses it internally), but can be used as a stand-alone module, as there are no external dependencies.

The pyconfig module provides a set of prewritten tests to let you check for capabilities of the Python interpreter, instead of relying on version numbers.

Compare the following two code segments:
Without pyconfig
# need generator expressions
if sys.version_info[:2] >= (2,4):
    # do something with generator expressions ...

# are True/False builtin?
if not (sys.version_info[:2] >= (2,2)):
    # define my own True/False

# is 'enumerate()' available?
if sys.version_info[:2] >= (2,3):
    # do something with enumerate()    
Compare to the pyconfig-based code:
With pyconfig
from gnosis.pyconfig import pyconfig

# need generator expressions
if pyconfig.Have_GeneratorExpressions():
    # do something with generator expressions ...

# are True/False builtin?
if not pyconfig.Have_TrueFalse():
    # define my own True/False

# is 'enumerate()' available?
if pyconfig.Have_Enumerate():
    # do something with enumerate()    
In the second case, it is clear exactly what capability is needed. Also, the code is now robust across any nonstandard versions of Python that it might be running on.

If you import pyconfig as from pyconfig import pyconfig, then all test results will be automatically cached. This allows you to use the tests inline with a minimum speed penalty.
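Caching of this sort is easy to picture. The sketch below is not taken from pyconfig; cached_test() and have_enumerate() are hypothetical names showing how a probe can run once and answer from memory afterwards.

```python
def cached_test(func):
    """Run func() the first time only; reuse the stored answer afterwards."""
    cache = {}
    def wrapper():
        if 'result' not in cache:
            cache['result'] = func()
        return cache['result']
    return wrapper

calls = []

@cached_test
def have_enumerate():
    calls.append(1)                  # record each real run of the probe
    try:
        enumerate
        return True
    except NameError:
        return False

answers = [have_enumerate() for _ in range(3)]
print(answers)       # [True, True, True]
print(len(calls))    # 1: the probe itself ran only once
```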

pyconfig is written as a set of small tests, with the reusable parts modularized. This makes it very easy to write any new tests you need. If nothing else, the source code to pyconfig is an interesting historical reference of the various PEPs that have been included over the evolution of Python.

Getting pyconfig

As mentioned, pyconfig comes bundled with Gnosis_Utils: Gnosis_Utils

Or if you prefer, you can grab it as a separate module: pyconfig.py
NOTE
The version bundled with Gnosis_Utils is the "stable" version. The standalone is the latest snapshot I've uploaded here, and may have newer features and/or bugs.