classification
Title: Remove cmp parameter to list.sort() and builtin.sorted()
Type: Stage:
Components: Interpreter Core Versions: Python 3.0
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: LeWiemann, dtorp, mark.dickinson, metal, r.david.murray, rhettinger, tixxit
Priority: normal Keywords:

Created on 2008-01-09 00:02 by gvanrossum, last changed 2010-07-20 20:34 by metal. This issue is now closed.

Files
File name Uploaded Description Edit
nocmp.diff rhettinger, 2008-01-14 00:22 Remove cmp from C files.
Messages (26)
msg59575 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-09 00:02
We should either change the API so you can pass in a '<' comparator, or
get rid of it completely (since key=... takes care of all use cases I
can think of).
msg59578 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-01-09 00:37
I support removal;  however, there is an uncommon corner-case that is
well served by __cmp__ where a primary key is sorted ascending and a
secondary key is sorted descending -- that case is a PITA with the key=
function because you need a way to invert the sense of the reversed
input (there are tricks for this but they are obscure).
msg59579 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-01-09 00:42
Forgot to mention, the easy work-around is to do two consecutive sorts
and take advantage of the guaranteed stability:

  l.sort(key=secondary, reverse=True)
  l.sort(key=primary)
msg59867 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-01-13 22:30
Let's do this.  It is a nice simplification and makes the sort tools
easier to learn and use.
msg59877 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-01-14 00:22
Patch attached for the C files making them much cleaner and a bit
faster.   Still needs the related tests to be deleted and the docs
updated.
msg59885 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-14 02:33
Cool!  Doesn't it feel good to rip out stuff? :-)

I do hope that you'll make sure all tests pass (-uall) before submitting
this.
msg59937 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-01-14 23:24
Yes, it does feel great.  The code is cleaner and faster.  The API is
simple and matches all the other key= functions in
min/max/nsmallest/nlargest/groupby.

After more thought, I would like to make one more change and require the
arguments to be keywords such as sort(key=str.lower) but not
sort(str.lower).

The issue is that the cmp= interface has been around so long that it is
ingrained into our thinking and in our code.  Having to write-out the
keyword makes the intent explicit and will avoid accidently passing in a
cmp= function when a key= function was intended.  In Py3.1, the
restriction could be relaxed and l.sort(f) could be accepted for
l.sort(key=f).

For the 2-to-3 tool, I wrote a converter that automatically transitions
code currently using a custom compare function:

2.6 code:  s.sort(cmp=lambda p, q: cmp(p.lower(), q.lower()))
3.0 code:  s.sort(key=CmpToKey(lambda p, q: cmp(p.lower(), q.lower())))

Ideally, the automatcic conversion would be accompanied by a suggestion
to manually rewrite to something like:

3.0 code:   s.sort(key=str.lower)

--- converter code ---

def CmpToKey(mycmp):
    'Convert a cmp= function into a key= function'
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __cmp__(self, other):
            return mycmp(self.obj, other.obj)
    return K
msg59939 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-14 23:35
> After more thought, I would like to make one more change and require the
> arguments to be keywords such as sort(key=str.lower) but not
> sort(str.lower).

Works for me. (To be honest I thought key was already required to be a
keyword. :-)
msg61877 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-01-30 20:15
Checked-in rev 60453.
msg62476 - (view) Author: Lea Wiemann (LeWiemann) Date: 2008-02-17 01:45
Is this really necessary?

I see that the sorting code gets a little simpler, but I believe that 
there are valid use cases for cmp out there, where using a key would at 
least be cumbersome.  So why remove it when it's useful?

For instance, if you have an algorithm to determine the order in which 
any two elements should occur (and for some reason the algorithm 
satisfies transitivity) then it's usable as the cmp function, but 
turning it into a key function may be complicated (and adversly affect 
performance); you might end up having to map each element to a number 
or tuple describing the ordering properties of the element, which can 
be non-trivial.  Besides, it can also be non-obvious.
msg62477 - (view) Author: Lea Wiemann (LeWiemann) Date: 2008-02-17 01:49
"Non-obvious" to the novice that this technique can be used, and non-
obvious to the reader of the program.

I'm envisioning key functions that return long sequences of -1/0/1 
integers, or numbers between 0 and 2**50...  Not good.

Instead of removing cmp, I'd suggest simply placing a note in the 
documentation saying that key is preferred over cmp, to encourage 
readability.
msg62480 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-02-17 05:27
FWIW, an object with a complex element-to-element comparison can define 
an __lt__() method and have it work with sort, min, max, etc.
msg62481 - (view) Author: Lea Wiemann (LeWiemann) Date: 2008-02-17 05:33
I know, but I don't always want to define the comparison on the object 
itself, because it's not an intrinsic feature of the object.  It's just 
the way I happen to sort it right now.  (That's especially likely if 
the ordering algorithm is complex.)
msg95973 - (view) Author: Tom Switzer (tixxit) Date: 2009-12-04 22:26
I am not sure I understand the reasoning behind removing the cmp
parameter (and agree with Lea Wiemann). Trying to wedge a proper
comparison into the key parameter is clumsy and unreadable (as can be
seen in the 2to3 example above). The intrinsic ordering on objects does
not necessarily match up with the way you want to sort them. For
example, a natural intrinsic order on 2 points in 2d is lexicographical,
however you often want to sort by angular order relative to some other
point instead. Clearly this can never be put in __cmp__ or __lt__,
because the sorted order is relative to some other unknown point. Trying
to do this with the key function doesn't make sense; it would not be
clear you are sorting by angular order and you'd have to instantiate a
bunch of wrapper objects just to do basic sorting. Another quick example
would be sorting hyperplanes by intersection on a ray. Sorting points
along a direction given by a vector.

I understand removing redundant features from a language, but I just
can't see how key replaces this functionality in a readable or efficient
way. This highlights an important class of cases (since it was mentioned
that none could be thought of) in which we wish to make comparisons
between values where a comparison (<, > or ==) is more numerically
sound, more efficient, or the only option (perhaps the ordering is
defined explicitly) then computing the exact values (eg. angle). As far
as it seems, the only way to do this with key is by following the
example given and creating a class solely to wrap each object that
overrides __cmp__, which is certainly non-obvious (ie. there is no one,
obvious way to do it).
msg95975 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-12-04 22:49
Can someone provide a code sample to make this argument more 
understandable for those of us who don't compare points by angular order 
for a living... :-)

I'm not sure what the 2to3 example (I presume you mean msg59937) shows 
except that conversion from a cmp function to a key function may require 
you to actually think...

Also, for all of you asking for cmp back, I hope you realize that 
sorting N values using a custom cmp function makes about N log N calls 
calls to cmp, whereas using a custom key calls the key function only N 
times.  This means that even if your cmp function is faster than the 
best key function you can write, the advantage is lost as N increases 
(which is just where you'd like it to matter most :-).
msg95982 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-12-05 07:04
FWIW, we had a long discussion on comp.lang.python and the net result
was that no use cases were found that required a cmp function.  One
complex case (sorting recursive tree structures) at first appeared to
need a cmp-function but was found to be simpler and faster using a
key-function.  The net result of the conversation was the feeling that
people who have grown-up using cmp-functions in either Python, C or some
other language feel like they've lost something but really haven't.  In
contrast, people who use SQL or spreadsheet database tools find that key
functions come naturally since neither supports cmp-functions, instead
preferring the user to specify primary and secondary key functions.   

Also, it was pointed-out the removal of cmp-functions in sorted() and
list.sort() was part of a larger effort to remove all forms of cmp from
the whole language (i.e. the builtin cmp function is gone and so it the
__cmp__ magic method).  Rich comparisons have completely supplanted all
uses of cmp-functions in the language as a whole -- having multiple ways
to do it was confusing.

In converting code from 2-to-3, we have found two sticky cases.

The first occurs when an API had exposed cmp functions to the end-user
(for example, unittest.getTestCaseNames() and unittest.makeSuite() have
an optional sortUsing parameter that allows the user to specify a
cmp-function).  To support that use case (so that end-user API's would
not have to be changed), we added a CmpToKey() tool which automatically
converts cmp-functions to key functions.  This tool is referenced in the
docs and it could be added to the 2-to-3 converter.

The second case occurs when a primary key is sorted ascending and a
secondary key is sorted descending.  The technique for that is to take
advantage of sort stability and do two sorts:

   s.sort(key=secondary, reverse=True)
   s.sort(key=primary)   
   # now sorted by primary ascending, secondary descending

That technique is going to be documented in an update of the sorting
how-to.  It doesn't seem to arise much in practice and the cmp function
equivalent seems to be harder for beginners to write (though at least it
can be done with a single cmp-function and a single sort).
msg96024 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-12-06 11:57
Tom, I think I'm missing your point:  all three of the examples you give 
seem like perfect candidates for a key-based sort rather than a 
comparison-based one.  For the first example, couldn't you do something 
like:

def direction(pt1, pt2):
    """angle of line segment from point 1 to point 2"""
    return atan2(pt2.y - pt1.y, pt2.x - pt1.x)

my_points.sort(key=lambda pt: direction(reference_pt, pt))

? How would having a cmp keyword argument make this any easier or 
simpler?


Here's the best example I can think of for which key-based sorting is 
problematic:  imagine that the Decimal type doesn't exist, and that you 
have triples (sign, coefficient_string, exponent) representing 
arbitrary-precision base 10 floating-point numbers.  It's fairly tricky 
to come up with a key function that maps these triples to some existing 
ordered type, so that they can be sorted in natural numerical order.  
The problem lies in the way that the sort order for the coefficient 
string and exponent depends on the value of the sign (one way for 
positive numbers, reversed for negative numbers). But it's not a big 
deal to define a wrapper for cases like this.
msg96026 - (view) Author: Tom Switzer (tixxit) Date: 2009-12-06 16:24
Mark: I think your example actually helps illustrate my point. My point was
that computing the angle directly is less efficient or not as nice
numerically. For instance, if you are sorting points by angle relative to an
extreme point you could do something like this (a first step in the Graham
Scan) - be prepared for non-Python 3.0 code ;)

from functools import partial
from random import random

def turn(p, q, r):
    """Return -1, 0, or 1 if p,q,r forms a right, straight, or left turn
respectively."""
    return cmp((q[0] - p[0])*(r[1] - p[1]) - (r[0] - p[0])*(q[1] - p[1]), 0)

pts = [(random(), random()) for i in xrange(10)]
i = min(xrange(len(pts)), key=lambda i: pts[i])
p = pts.pop(i)
pts.sort(cmp=partial(turn, p))

Here our comparison function requires only 2 multiplications and 5
additions/subtractions. This function is nice especially if you are using
arbitrary precision or rational numbers as it is exact.

On Sun, Dec 6, 2009 at 6:57 AM, Mark Dickinson <report@bugs.python.org>wrote:

>
> Mark Dickinson <dickinsm@gmail.com> added the comment:
>
> Tom, I think I'm missing your point:  all three of the examples you give
> seem like perfect candidates for a key-based sort rather than a
> comparison-based one.  For the first example, couldn't you do something
> like:
>
> def direction(pt1, pt2):
>    """angle of line segment from point 1 to point 2"""
>    return atan2(pt2.y - pt1.y, pt2.x - pt1.x)
>
> my_points.sort(key=lambda pt: direction(reference_pt, pt))
>
> ? How would having a cmp keyword argument make this any easier or
> simpler?
>
>
> Here's the best example I can think of for which key-based sorting is
> problematic:  imagine that the Decimal type doesn't exist, and that you
> have triples (sign, coefficient_string, exponent) representing
> arbitrary-precision base 10 floating-point numbers.  It's fairly tricky
> to come up with a key function that maps these triples to some existing
> ordered type, so that they can be sorted in natural numerical order.
> The problem lies in the way that the sort order for the coefficient
> string and exponent depends on the value of the sign (one way for
> positive numbers, reversed for negative numbers). But it's not a big
> deal to define a wrapper for cases like this.
>
> ----------
> nosy: +mark.dickinson
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1771>
> _______________________________________
>
msg96034 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-12-06 17:27
Ah.  Thanks for the explanation;  I see your point.  I guess you do just 
have to use a CmpToKey-type wrapper for this sort of comparison.

Though for this *particular* example, it seems to me that you could still use a key function

lambda q: (q[0] - p[0])/(q[1]-p[1]),

which would be even more efficient.  This is assuming that your extreme 
point p has minimal second coordinate amongst points being sorted, which 
as I understand it is how the Graham Scan typically starts.  There's the 
minor difficulty of dealing with points with the *same* second coordinate 
as p, where I guess the key value should be some form of +Infinity.  I can 
also see that this might be problematic if you're working with a type 
that's exact for addition and multiplication but not for division (e.g., 
Decimal with unbounded precision).

It would be interesting to see some relative timings for the sort stage of 
the Graham scan (on a million random points, say): key function versus cmp 
function.
msg96058 - (view) Author: Tom Switzer (tixxit) Date: 2009-12-07 15:46
If the equal min y-coords are handled, I think it'd be quicker too. As Guido
noted, O(n) function calls is better then O(n log n) =] Though the general
case is still unhandled. And, though it doesn't help my case, the Graham
Scan can also be performed on points sorted lexicographically too, by
constructing the upper & lower hull separately, hehe.

Now, I understand cmp on the whole was removed from the language. Using
__lt__, __eq__, etc. really is more natural. However, having an explicit cmp
function for sorting makes sense to me. At the very least, it is more
obvious and natural for some problems - though I respect that using a key
func. is often faster. In some rare (though "rare" is very subjective) cases
it is required; packing a cmp function into __cmp__ in a wrapper object is
really just a hard-to-read cmp function and highlights the need for cmp. I
would actually love to see it added for min/max too actually, since I find I
often use a simple reduce function in place of a min(lst, cmp=...).
Enforcing proper comparisons (<, >, ==, etc) makes sense, but would having
the cmp function live, so to speak, in sorting really be that bad? Just
inform the user in the docs that key is preferred and often faster.
msg96062 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-12-07 18:10
Python's sort implementation is carefully written to only use the "<" 
comparison, ever.  So a cmp really isn't the most natural way to specify 
a comparison.  (This should really be documented somewhere -- I know know 
it because Tim Peters & I shared an office while he did most of the work 
on Python's sort.)
msg102019 - (view) Author: METAL XXX (metal) Date: 2010-03-31 16:36
I have a tree:

   A
  / \
 B   C
/ \
D E

which is implemented as a dict

tree = {
  'A': set(['B', 'C']),
  'B': set(['D', 'E']), 
  'C': set(),
  'D': set(),
  'E': set(),
}

I want to sort the nodes.

and I don't know how to write a key function for sort() in this situation

so I write a cmp function:

sorted(tree, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0)

and it gets ['A', 'C', 'B', 'E', 'D'].

how to convert cmp to key really confused me and it surely need more typing time.

so I disagree the removal
msg102022 - (view) Author: David Albert Torpey (dtorp) Date: 2010-03-31 17:40
> sorted(tree, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0)
>
> and it gets ['A', 'C', 'B', 'E', 'D'].

That cmp function is nonsense and isn't even close to being correct:

>>> from random import shuffle
>>> for i in range(10):
... 	t = list(tree)
... 	shuffle(t)
... 	print sorted(t, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0)
	
['E', 'C', 'B', 'D', 'A']
['A', 'D', 'C', 'B', 'E']
['C', 'B', 'E', 'D', 'A']
['E', 'D', 'A', 'C', 'B']
['A', 'B', 'D', 'E', 'C']
['D', 'A', 'E', 'C', 'B']
['C', 'D', 'A', 'B', 'E']
['A', 'C', 'B', 'D', 'E']
['A', 'C', 'B', 'E', 'D']
['A', 'C', 'B', 'D', 'E']

> how to convert cmp to key really confused
> me and it surely need more typing time.

Just cut and paste the recipe.  Simple.
msg102030 - (view) Author: METAL XXX (metal) Date: 2010-03-31 19:14
Sorry I ripped the code from a mess and I forget the tree is "leaflized" as

tree = {
  'A': set(['B', 'C', 'D', 'E']),
  'B': set(['D', 'E']), 
  'C': set(),
  'D': set(),
  'E': set(),
}

I don't want to talk about the actual problem. I think sometime 
it's hard to give an "absolute" weight value as key, for this example,
is the relationship in graph. Then user have to "Copy and paste the recipe" to get a cmp function which should be already there. We are
all adults here, why don't SIMPLELY tell the newbie don't use cmp() use key() unless you know what you are doing.


Thanks for reply.
msg102031 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-03-31 20:57
cmp is gone.  It's chances of coming back are close enough to zero that an assertAlmostEqual test will pass :).  The rest of the discussion should move to one of the general python lists.
msg110969 - (view) Author: METAL XXX (metal) Date: 2010-07-20 20:34
Shame on me, after a long time I realized the problem referenced in my old post (http://bugs.python.org/msg102019) was actually topological sorting. It can't be done by Python's sort(), which doesn't support partial order. Trying to use cmp parameter is completely wrong. And cmp would mislead people like me to sort a partial order, evil! Now I'm absolutely agree with gone of cmp, viva Raymond Hettinger!
History
Date User Action Args
2010-07-20 20:34:50metalsetmessages: + msg110969
2010-03-31 20:57:43r.david.murraysetnosy: + r.david.murray
messages: + msg102031
2010-03-31 19:14:28metalsetmessages: + msg102030
2010-03-31 18:47:30gvanrossumsetnosy: - gvanrossum
2010-03-31 17:40:05dtorpsetnosy: + dtorp
messages: + msg102022
2010-03-31 16:36:58metalsetnosy: + metal
messages: + msg102019
2009-12-07 18:10:28gvanrossumsetmessages: + msg96062
2009-12-07 18:06:08gvanrossumsetfiles: - unnamed
2009-12-07 18:06:04gvanrossumsetfiles: - unnamed
2009-12-07 15:46:20tixxitsetfiles: + unnamed

messages: + msg96058
2009-12-06 17:27:30mark.dickinsonsetmessages: + msg96034
2009-12-06 16:24:41tixxitsetfiles: + unnamed

messages: + msg96026
2009-12-06 11:57:29mark.dickinsonsetnosy: + mark.dickinson
messages: + msg96024
2009-12-05 07:04:48rhettingersetmessages: + msg95982
2009-12-04 22:49:38gvanrossumsetmessages: + msg95975
2009-12-04 22:26:56tixxitsetnosy: + tixxit
messages: + msg95973
2008-02-17 05:33:26LeWiemannsetmessages: + msg62481
2008-02-17 05:27:28rhettingersetmessages: + msg62480
2008-02-17 01:49:36LeWiemannsetmessages: + msg62477
2008-02-17 01:45:56LeWiemannsetnosy: + LeWiemann
messages: + msg62476
2008-01-30 20:16:11rhettingersetstatus: open -> closed
2008-01-30 20:15:44rhettingersetmessages: + msg61877
2008-01-14 23:35:45gvanrossumsetmessages: + msg59939
2008-01-14 23:24:37rhettingersetmessages: + msg59937
2008-01-14 02:33:54gvanrossumsetmessages: + msg59885
2008-01-14 00:22:26rhettingersetfiles: + nocmp.diff
messages: + msg59877
2008-01-13 22:30:36rhettingersetassignee: rhettinger
resolution: accepted
messages: + msg59867
2008-01-09 00:42:20rhettingersetmessages: + msg59579
2008-01-09 00:37:52rhettingersetnosy: + rhettinger
messages: + msg59578
2008-01-09 00:02:16gvanrossumsetcomponents: + Interpreter Core
2008-01-09 00:02:05gvanrossumcreate