The Python Tourist #3: Forgetting how cmp() works ... no problem!

The cmp() function is the basis for sorting in Python. It has a simple definition but I'm always forgetting the sense of the return values. For reference:

cmp(a, b) returns:
  • -1, if a < b
  • 0 if a == b
  • 1 if a > b

Now, cmp() only knows how to sort built-in values, like integers, strings, etc. If you try to sort a list of user-defined objects, you'll find that Python has no idea how to sort them (though it will do something). What you are supposed to do is define a __cmp__ function for your custom classes. Then, when your objects are sorted, Python calls your __cmp__ to tell it how to order them.

Every time I create a class that needs a __cmp__, I'm tempted to go scrambling to the Python docs for a refresher on what the three return values of cmp mean. But really, there is no need to worry about this. You can simply use the cmp() function to do it for you.

I'll demonstrate with an example:
Ultra Verbose Way
#
# I'm going to make a list of people. For each person I will
# store their first and last name, and the state they live in.
#
# For sorting, I want to sort FIRST by state, SECOND by last name,
# and finally by first name.
#
class Person(object):
    def __init__(self, first, last, state):
        self.first = first
        self.last = last
        self.state = state

    # define __str__ so that 'print object' will look good        
    def __str__(self):
        return "%s: %s, %s" % (self.state, self.last, self.first)
        
    # naive __cmp__, where I have to remember what -1, 0, and 1 mean.
    def __cmp__(self, other):
        # sort first by state
        if self.state < other.state:
            return -1
        elif self.state > other.state:
            return 1
        else:
            # state is equal, - sort by last name
            if self.last < other.last:
                return -1
            elif self.last > other.last:
                return 1
            else:
                # state and last name are equal, 
                # sort by first name
                if self.first < other.first:
                    return -1
                elif self.first > other.first:
                    return 1
                else:
                    return 0
            
def show(people):
    for p in people:
        print p
        
people = [
    Person("Tom","Zeelman",'MN'),
    Person("Ozlo","Yannican",'AZ'),
    Person("Mike","Dodger",'AL'),
    Person("Greta","Abington",'CT'),
    Person("Ooolma", "Therrmon",'MS'),
    Person("Bob","Abington",'AL'),
    Person("Erma","Valencio",'AZ'),
    Person("Abe","Abington",'CT'),
    Person("Zeldo","Yannican",'TN'),
    
    ]

print "Original list:"    
print "--------------"
show(people)

people.sort()
print "
Sorted:"
print "--------"
show(people)
This works, but is extremely cumbersome and error prone. However, there is no need to go to all this trouble. All you need to do is to split your object into items that cmp() knows how to handle. In other words, cmp() knows perfectly well how to sort the .first, .last, and .state values, you just have to split them up and pass them in the correct order:
Much better __cmp__
#
# I'm going to make a list of people. For each person I will
# store their first and last name, and the state they live in.
#
# For sorting, I want to sort FIRST by state, SECOND by last name,
# and finally by first name.
#
class Person(object):
    def __init__(self, first, last, state):
        self.first = first
        self.last = last
        self.state = state
        
    # define __str__ so that 'print object' will look good
    def __str__(self):
        return "%s: %s, %s" % (self.state, self.last, self.first)

    # much better - I don't care what -1, 0 and 1 mean.
    # due to boolean short-circuit logic, the "or" sequence will
    # return the cmp() value of the first non-equal piece
    def __cmp__(self, other):
        return cmp(self.state, other.state,) or 
                cmp(self.last,other.last) or 
                cmp(self.first,other.first)
    
            
def show(people):
    for p in people:
        print p
        
people = [
    Person("Tom","Zeelman",'MN'),
    Person("Ozlo","Yannican",'AZ'),
    Person("Mike","Dodger",'AL'),
    Person("Greta","Abington",'CT'),
    Person("Ooolma", "Therrmon",'MS'),
    Person("Bob","Abington",'AL'),
    Person("Erma","Valencio",'AZ'),
    Person("Abe","Abington",'CT'),
    Person("Zeldo","Yannican",'TN'),
    
    ]

print "Original list:"    
show(people)

people.sort()
print "
Sorted:"
show(people)
True, the "or" short-circuit logic relies on the fact that cmp(a,b) == 0 when a == b, but that's well-defined and gives much cleaner code in comparison to the bloated mess of doing it yourself.
Written in WikklyText.

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

return cmp(self.state,

        return cmp(self.state, other.state,) or 
                cmp(self.last,other.last) or 
                cmp(self.first,other.first)
could be written as
        return cmp((self.state, self.last, self.first), 
                 (other.state, other.last, other.first))
without relying on "or trickery".

Good point, thanks. For some

Good point, thanks. For some reason I've never liked comparing tuples ... my C/C++ upbringing is holding me back again :-)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Post new comment

The content of this field is kept private and will not be shown publicly.