Issue 1771: Remove cmp parameter to list.sort() and builtin.sorted()

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/46104

classification

Title:	Remove cmp parameter to list.sort() and builtin.sorted()
Type:		Stage:
Components:	Interpreter Core	Versions:	Python 3.0

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	LeWiemann, dtorp, mark.dickinson, metal, r.david.murray, rhettinger, tixxit
Priority:	normal	Keywords:

Created on 2008-01-09 00:02 by gvanrossum, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
nocmp.diff	rhettinger, 2008-01-14 00:22	Remove cmp from C files.

Messages (26)
msg59575 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2008-01-09 00:02
We should either change the API so you can pass in a '<' comparator, or get rid of it completely (since key=... takes care of all use cases I can think of).
msg59578 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-01-09 00:37
I support removal; however, there is an uncommon corner-case that is well served by __cmp__ where a primary key is sorted ascending and a secondary key is sorted descending -- that case is a PITA with the key= function because you need a way to invert the sense of the reversed input (there are tricks for this but they are obscure).
msg59579 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-01-09 00:42
Forgot to mention, the easy work-around is to do two consecutive sorts and take advantage of the guaranteed stability: l.sort(key=secondary, reverse=True) l.sort(key=primary)
msg59867 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-01-13 22:30
Let's do this. It is a nice simplification and makes the sort tools easier to learn and use.
msg59877 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-01-14 00:22
Patch attached for the C files making them much cleaner and a bit faster. Still needs the related tests to be deleted and the docs updated.
msg59885 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2008-01-14 02:33
Cool! Doesn't it feel good to rip out stuff? :-) I do hope that you'll make sure all tests pass (-uall) before submitting this.
msg59937 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-01-14 23:24
Yes, it does feel great. The code is cleaner and faster. The API is simple and matches all the other key= functions in min/max/nsmallest/nlargest/groupby. After more thought, I would like to make one more change and require the arguments to be keywords such as sort(key=str.lower) but not sort(str.lower). The issue is that the cmp= interface has been around so long that it is ingrained into our thinking and in our code. Having to write-out the keyword makes the intent explicit and will avoid accidently passing in a cmp= function when a key= function was intended. In Py3.1, the restriction could be relaxed and l.sort(f) could be accepted for l.sort(key=f). For the 2-to-3 tool, I wrote a converter that automatically transitions code currently using a custom compare function: 2.6 code: s.sort(cmp=lambda p, q: cmp(p.lower(), q.lower())) 3.0 code: s.sort(key=CmpToKey(lambda p, q: cmp(p.lower(), q.lower()))) Ideally, the automatcic conversion would be accompanied by a suggestion to manually rewrite to something like: 3.0 code: s.sort(key=str.lower) --- converter code --- def CmpToKey(mycmp): 'Convert a cmp= function into a key= function' class K(object): def __init__(self, obj, *args): self.obj = obj def __cmp__(self, other): return mycmp(self.obj, other.obj) return K
msg59939 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2008-01-14 23:35
> After more thought, I would like to make one more change and require the > arguments to be keywords such as sort(key=str.lower) but not > sort(str.lower). Works for me. (To be honest I thought key was already required to be a keyword. :-)
msg61877 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-01-30 20:15
Checked-in rev 60453.
msg62476 - (view)	Author: Lea Wiemann (LeWiemann)	Date: 2008-02-17 01:45
Is this really necessary? I see that the sorting code gets a little simpler, but I believe that there are valid use cases for cmp out there, where using a key would at least be cumbersome. So why remove it when it's useful? For instance, if you have an algorithm to determine the order in which any two elements should occur (and for some reason the algorithm satisfies transitivity) then it's usable as the cmp function, but turning it into a key function may be complicated (and adversly affect performance); you might end up having to map each element to a number or tuple describing the ordering properties of the element, which can be non-trivial. Besides, it can also be non-obvious.
msg62477 - (view)	Author: Lea Wiemann (LeWiemann)	Date: 2008-02-17 01:49
"Non-obvious" to the novice that this technique can be used, and non- obvious to the reader of the program. I'm envisioning key functions that return long sequences of -1/0/1 integers, or numbers between 0 and 2**50... Not good. Instead of removing cmp, I'd suggest simply placing a note in the documentation saying that key is preferred over cmp, to encourage readability.
msg62480 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-02-17 05:27
FWIW, an object with a complex element-to-element comparison can define an __lt__() method and have it work with sort, min, max, etc.
msg62481 - (view)	Author: Lea Wiemann (LeWiemann)	Date: 2008-02-17 05:33
I know, but I don't always want to define the comparison on the object itself, because it's not an intrinsic feature of the object. It's just the way I happen to sort it right now. (That's especially likely if the ordering algorithm is complex.)
msg95973 - (view)	Author: Tom Switzer (tixxit)	Date: 2009-12-04 22:26
I am not sure I understand the reasoning behind removing the cmp parameter (and agree with Lea Wiemann). Trying to wedge a proper comparison into the key parameter is clumsy and unreadable (as can be seen in the 2to3 example above). The intrinsic ordering on objects does not necessarily match up with the way you want to sort them. For example, a natural intrinsic order on 2 points in 2d is lexicographical, however you often want to sort by angular order relative to some other point instead. Clearly this can never be put in __cmp__ or __lt__, because the sorted order is relative to some other unknown point. Trying to do this with the key function doesn't make sense; it would not be clear you are sorting by angular order and you'd have to instantiate a bunch of wrapper objects just to do basic sorting. Another quick example would be sorting hyperplanes by intersection on a ray. Sorting points along a direction given by a vector. I understand removing redundant features from a language, but I just can't see how key replaces this functionality in a readable or efficient way. This highlights an important class of cases (since it was mentioned that none could be thought of) in which we wish to make comparisons between values where a comparison (<, > or ==) is more numerically sound, more efficient, or the only option (perhaps the ordering is defined explicitly) then computing the exact values (eg. angle). As far as it seems, the only way to do this with key is by following the example given and creating a class solely to wrap each object that overrides __cmp__, which is certainly non-obvious (ie. there is no one, obvious way to do it).
msg95975 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-12-04 22:49
Can someone provide a code sample to make this argument more understandable for those of us who don't compare points by angular order for a living... :-) I'm not sure what the 2to3 example (I presume you mean msg59937) shows except that conversion from a cmp function to a key function may require you to actually think... Also, for all of you asking for cmp back, I hope you realize that sorting N values using a custom cmp function makes about N log N calls calls to cmp, whereas using a custom key calls the key function only N times. This means that even if your cmp function is faster than the best key function you can write, the advantage is lost as N increases (which is just where you'd like it to matter most :-).
msg95982 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2009-12-05 07:04
FWIW, we had a long discussion on comp.lang.python and the net result was that no use cases were found that required a cmp function. One complex case (sorting recursive tree structures) at first appeared to need a cmp-function but was found to be simpler and faster using a key-function. The net result of the conversation was the feeling that people who have grown-up using cmp-functions in either Python, C or some other language feel like they've lost something but really haven't. In contrast, people who use SQL or spreadsheet database tools find that key functions come naturally since neither supports cmp-functions, instead preferring the user to specify primary and secondary key functions. Also, it was pointed-out the removal of cmp-functions in sorted() and list.sort() was part of a larger effort to remove all forms of cmp from the whole language (i.e. the builtin cmp function is gone and so it the __cmp__ magic method). Rich comparisons have completely supplanted all uses of cmp-functions in the language as a whole -- having multiple ways to do it was confusing. In converting code from 2-to-3, we have found two sticky cases. The first occurs when an API had exposed cmp functions to the end-user (for example, unittest.getTestCaseNames() and unittest.makeSuite() have an optional sortUsing parameter that allows the user to specify a cmp-function). To support that use case (so that end-user API's would not have to be changed), we added a CmpToKey() tool which automatically converts cmp-functions to key functions. This tool is referenced in the docs and it could be added to the 2-to-3 converter. The second case occurs when a primary key is sorted ascending and a secondary key is sorted descending. The technique for that is to take advantage of sort stability and do two sorts: s.sort(key=secondary, reverse=True) s.sort(key=primary) # now sorted by primary ascending, secondary descending That technique is going to be documented in an update of the sorting how-to. It doesn't seem to arise much in practice and the cmp function equivalent seems to be harder for beginners to write (though at least it can be done with a single cmp-function and a single sort).
msg96024 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-12-06 11:57
Tom, I think I'm missing your point: all three of the examples you give seem like perfect candidates for a key-based sort rather than a comparison-based one. For the first example, couldn't you do something like: def direction(pt1, pt2): """angle of line segment from point 1 to point 2""" return atan2(pt2.y - pt1.y, pt2.x - pt1.x) my_points.sort(key=lambda pt: direction(reference_pt, pt)) ? How would having a cmp keyword argument make this any easier or simpler? Here's the best example I can think of for which key-based sorting is problematic: imagine that the Decimal type doesn't exist, and that you have triples (sign, coefficient_string, exponent) representing arbitrary-precision base 10 floating-point numbers. It's fairly tricky to come up with a key function that maps these triples to some existing ordered type, so that they can be sorted in natural numerical order. The problem lies in the way that the sort order for the coefficient string and exponent depends on the value of the sign (one way for positive numbers, reversed for negative numbers). But it's not a big deal to define a wrapper for cases like this.
msg96026 - (view)	Author: Tom Switzer (tixxit)	Date: 2009-12-06 16:24
Mark: I think your example actually helps illustrate my point. My point was that computing the angle directly is less efficient or not as nice numerically. For instance, if you are sorting points by angle relative to an extreme point you could do something like this (a first step in the Graham Scan) - be prepared for non-Python 3.0 code ;) from functools import partial from random import random def turn(p, q, r): """Return -1, 0, or 1 if p,q,r forms a right, straight, or left turn respectively.""" return cmp((q[0] - p[0])(r[1] - p[1]) - (r[0] - p[0])(q[1] - p[1]), 0) pts = [(random(), random()) for i in xrange(10)] i = min(xrange(len(pts)), key=lambda i: pts[i]) p = pts.pop(i) pts.sort(cmp=partial(turn, p)) Here our comparison function requires only 2 multiplications and 5 additions/subtractions. This function is nice especially if you are using arbitrary precision or rational numbers as it is exact. On Sun, Dec 6, 2009 at 6:57 AM, Mark Dickinson <report@bugs.python.org>wrote: > > Mark Dickinson <dickinsm@gmail.com> added the comment: > > Tom, I think I'm missing your point: all three of the examples you give > seem like perfect candidates for a key-based sort rather than a > comparison-based one. For the first example, couldn't you do something > like: > > def direction(pt1, pt2): > """angle of line segment from point 1 to point 2""" > return atan2(pt2.y - pt1.y, pt2.x - pt1.x) > > my_points.sort(key=lambda pt: direction(reference_pt, pt)) > > ? How would having a cmp keyword argument make this any easier or > simpler? > > > Here's the best example I can think of for which key-based sorting is > problematic: imagine that the Decimal type doesn't exist, and that you > have triples (sign, coefficient_string, exponent) representing > arbitrary-precision base 10 floating-point numbers. It's fairly tricky > to come up with a key function that maps these triples to some existing > ordered type, so that they can be sorted in natural numerical order. > The problem lies in the way that the sort order for the coefficient > string and exponent depends on the value of the sign (one way for > positive numbers, reversed for negative numbers). But it's not a big > deal to define a wrapper for cases like this. > > ---------- > nosy: +mark.dickinson > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue1771> > _______________________________________ >
msg96034 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2009-12-06 17:27
Ah. Thanks for the explanation; I see your point. I guess you do just have to use a CmpToKey-type wrapper for this sort of comparison. Though for this particular example, it seems to me that you could still use a key function lambda q: (q[0] - p[0])/(q[1]-p[1]), which would be even more efficient. This is assuming that your extreme point p has minimal second coordinate amongst points being sorted, which as I understand it is how the Graham Scan typically starts. There's the minor difficulty of dealing with points with the same second coordinate as p, where I guess the key value should be some form of +Infinity. I can also see that this might be problematic if you're working with a type that's exact for addition and multiplication but not for division (e.g., Decimal with unbounded precision). It would be interesting to see some relative timings for the sort stage of the Graham scan (on a million random points, say): key function versus cmp function.
msg96058 - (view)	Author: Tom Switzer (tixxit)	Date: 2009-12-07 15:46
If the equal min y-coords are handled, I think it'd be quicker too. As Guido noted, O(n) function calls is better then O(n log n) =] Though the general case is still unhandled. And, though it doesn't help my case, the Graham Scan can also be performed on points sorted lexicographically too, by constructing the upper & lower hull separately, hehe. Now, I understand cmp on the whole was removed from the language. Using __lt__, __eq__, etc. really is more natural. However, having an explicit cmp function for sorting makes sense to me. At the very least, it is more obvious and natural for some problems - though I respect that using a key func. is often faster. In some rare (though "rare" is very subjective) cases it is required; packing a cmp function into __cmp__ in a wrapper object is really just a hard-to-read cmp function and highlights the need for cmp. I would actually love to see it added for min/max too actually, since I find I often use a simple reduce function in place of a min(lst, cmp=...). Enforcing proper comparisons (<, >, ==, etc) makes sense, but would having the cmp function live, so to speak, in sorting really be that bad? Just inform the user in the docs that key is preferred and often faster.
msg96062 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-12-07 18:10
Python's sort implementation is carefully written to only use the "<" comparison, ever. So a cmp really isn't the most natural way to specify a comparison. (This should really be documented somewhere -- I know know it because Tim Peters & I shared an office while he did most of the work on Python's sort.)
msg102019 - (view)	Author: METAL XXX (metal)	Date: 2010-03-31 16:36
I have a tree: A / \ B C / \ D E which is implemented as a dict tree = { 'A': set(['B', 'C']), 'B': set(['D', 'E']), 'C': set(), 'D': set(), 'E': set(), } I want to sort the nodes. and I don't know how to write a key function for sort() in this situation so I write a cmp function: sorted(tree, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0) and it gets ['A', 'C', 'B', 'E', 'D']. how to convert cmp to key really confused me and it surely need more typing time. so I disagree the removal
msg102022 - (view)	Author: David Albert Torpey (dtorp)	Date: 2010-03-31 17:40
> sorted(tree, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0) > > and it gets ['A', 'C', 'B', 'E', 'D']. That cmp function is nonsense and isn't even close to being correct: >>> from random import shuffle >>> for i in range(10): ... t = list(tree) ... shuffle(t) ... print sorted(t, cmp=lambda x, y: 1 if x in tree[y] else -1 if y in tree[x] else 0) ['E', 'C', 'B', 'D', 'A'] ['A', 'D', 'C', 'B', 'E'] ['C', 'B', 'E', 'D', 'A'] ['E', 'D', 'A', 'C', 'B'] ['A', 'B', 'D', 'E', 'C'] ['D', 'A', 'E', 'C', 'B'] ['C', 'D', 'A', 'B', 'E'] ['A', 'C', 'B', 'D', 'E'] ['A', 'C', 'B', 'E', 'D'] ['A', 'C', 'B', 'D', 'E'] > how to convert cmp to key really confused > me and it surely need more typing time. Just cut and paste the recipe. Simple.
msg102030 - (view)	Author: METAL XXX (metal)	Date: 2010-03-31 19:14
Sorry I ripped the code from a mess and I forget the tree is "leaflized" as tree = { 'A': set(['B', 'C', 'D', 'E']), 'B': set(['D', 'E']), 'C': set(), 'D': set(), 'E': set(), } I don't want to talk about the actual problem. I think sometime it's hard to give an "absolute" weight value as key, for this example, is the relationship in graph. Then user have to "Copy and paste the recipe" to get a cmp function which should be already there. We are all adults here, why don't SIMPLELY tell the newbie don't use cmp() use key() unless you know what you are doing. Thanks for reply.
msg102031 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-03-31 20:57
cmp is gone. It's chances of coming back are close enough to zero that an assertAlmostEqual test will pass :). The rest of the discussion should move to one of the general python lists.
msg110969 - (view)	Author: METAL XXX (metal)	Date: 2010-07-20 20:34
Shame on me, after a long time I realized the problem referenced in my old post (http://bugs.python.org/msg102019) was actually topological sorting. It can't be done by Python's sort(), which doesn't support partial order. Trying to use cmp parameter is completely wrong. And cmp would mislead people like me to sort a partial order, evil! Now I'm absolutely agree with gone of cmp, viva Raymond Hettinger!

History
Date	User	Action	Args
2022-04-11 14:56:29	admin	set	github: 46104
2010-07-20 20:34:50	metal	set	messages: + msg110969
2010-03-31 20:57:43	r.david.murray	set	nosy: + r.david.murray messages: + msg102031
2010-03-31 19:14:28	metal	set	messages: + msg102030
2010-03-31 18:47:30	gvanrossum	set	nosy: - gvanrossum
2010-03-31 17:40:05	dtorp	set	nosy: + dtorp messages: + msg102022
2010-03-31 16:36:58	metal	set	nosy: + metal messages: + msg102019
2009-12-07 18:10:28	gvanrossum	set	messages: + msg96062
2009-12-07 18:06:08	gvanrossum	set	files: - unnamed
2009-12-07 18:06:04	gvanrossum	set	files: - unnamed
2009-12-07 15:46:20	tixxit	set	files: + unnamed messages: + msg96058
2009-12-06 17:27:30	mark.dickinson	set	messages: + msg96034
2009-12-06 16:24:41	tixxit	set	files: + unnamed messages: + msg96026
2009-12-06 11:57:29	mark.dickinson	set	nosy: + mark.dickinson messages: + msg96024
2009-12-05 07:04:48	rhettinger	set	messages: + msg95982
2009-12-04 22:49:38	gvanrossum	set	messages: + msg95975
2009-12-04 22:26:56	tixxit	set	nosy: + tixxit messages: + msg95973
2008-02-17 05:33:26	LeWiemann	set	messages: + msg62481
2008-02-17 05:27:28	rhettinger	set	messages: + msg62480
2008-02-17 01:49:36	LeWiemann	set	messages: + msg62477
2008-02-17 01:45:56	LeWiemann	set	nosy: + LeWiemann messages: + msg62476
2008-01-30 20:16:11	rhettinger	set	status: open -> closed
2008-01-30 20:15:44	rhettinger	set	messages: + msg61877
2008-01-14 23:35:45	gvanrossum	set	messages: + msg59939
2008-01-14 23:24:37	rhettinger	set	messages: + msg59937
2008-01-14 02:33:54	gvanrossum	set	messages: + msg59885
2008-01-14 00:22:26	rhettinger	set	files: + nocmp.diff messages: + msg59877
2008-01-13 22:30:36	rhettinger	set	assignee: rhettinger resolution: accepted messages: + msg59867
2008-01-09 00:42:20	rhettinger	set	messages: + msg59579
2008-01-09 00:37:52	rhettinger	set	nosy: + rhettinger messages: + msg59578
2008-01-09 00:02:16	gvanrossum	set	components: + Interpreter Core
2008-01-09 00:02:05	gvanrossum	create