This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Adapt heapq push/pop/replace to allow passing a comparator.
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: BreamoreBoy, Prashant.Sharma, gdr@garethrees.org, mark.dickinson, mrolle, rhettinger
Priority: low Keywords: patch

Created on 2014-03-13 08:18 by Prashant.Sharma, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
heapq_com.patch Prashant.Sharma, 2014-03-13 08:18 review
Messages (10)
msg213361 - (view) Author: Prashant Sharma (Prashant.Sharma) * Date: 2014-03-13 08:18
It would be more convenient to extend heapq to support user defined comparators. 

Default can be cmp_lt in which case they behave as they do now. 

Caveat:
What happens if uses switches comparator between calls to push or pop. The expected behavior can be unpredictable and should be obvious to the user of the API. It might also be good to state this obvious, if people here agree. 

P.S. I am really new to python world, forgive me for any kind of stupidity. Criticism and comments is what I am looking for here !
msg213363 - (view) Author: Prashant Sharma (Prashant.Sharma) * Date: 2014-03-13 08:25
Signed the PSF CLA.
msg213365 - (view) Author: Gareth Rees (gdr@garethrees.org) * (Python triager) Date: 2014-03-13 08:59
It would be better to accept a key function instead of a comparison
function (cf. heapq.nlargest and heapq.nsmallest).

But note that this has been proposed before and rejected: see
issue1904 where Raymond Hettinger provides this rationale:

    Use cases aside, there is another design issue in that the
    key-function approach doesn't work well with the heap functions on
    regular lists. Successive calls to heap functions will of
    necessity call the key- function multiple times for any given
    element. This contrasts with sort () where the whole purpose of
    the key function was to encapsulate the decorate-sort-undecorate
    pattern which was desirable because the key- function called
    exactly once per element.

However, in the case of the bisect module (where requests for a key
function are also common), Guido was recently persuaded that there was
a valid use case. See issue4356, and this thread on the Python-ideas
mailing list:
<https://mail.python.org/pipermail/python-ideas/2012-February/thread.html#13650>
where Arnaud Delobelle points out that:

    Also, in Python 3 one can't assume that values will be comparable so
    the (key, value) tuple trick won't work: comparing the tuples may well
    throw a TypeError.

and Guido responds:

    Bingo. That clinches it. We need to add key=.
msg213373 - (view) Author: Prashant Sharma (Prashant.Sharma) * Date: 2014-03-13 09:45
Hey Gareth,

"add Key=" approach also solves the purpose. The purpose is to be able to use heapq as max heap too conveniently. (I just wish python had minmaxheap too, but felt that is too much to ask for.) It is a very common usage and usual tricks of inverting the values( or negating) isn't great for all purposes. 

I am happy to redo the patch, if that is an acceptable solution. 

On a side note, did we also discuss about making some public functions for using heapq as max heap. (There are a few internal functions for the same).
msg213437 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-03-13 18:36
See also #13742.
msg213496 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-03-14 00:02
Thanks for submitting a patch.  

I'm sorry, but I don't think this is the right approach.  I will likely keep the current functions as they are now.  Under no circumstances do I want to add any overhead to the existing functions (they serve performance critical roles in high performance async tools such as Tornado).

Instead, I'm considering alternatives such as a second set of functions that have a key-function.

The existing cmp_lt function was a hack that needs to go away and never return.  It was put in to accommodate some bad code in Twisted that against recommendations relied on a specific rich comparison operator other that __lt__.  Because of that, PEP 8 amended to say that we recommend that all six rich comparison operators be implemented for comparison (and the functools.total_ordering class decorator was added in furtherance of that end).
msg213504 - (view) Author: Prashant Sharma (Prashant.Sharma) * Date: 2014-03-14 01:18
Did not knew about #13742. I hope it gets merged soon and may be, if possible backport too ?
msg213505 - (view) Author: Prashant Sharma (Prashant.Sharma) * Date: 2014-03-14 01:21
Thanks Raymond for looking at the patch, understood your considerations are reasonable. This discussion can be closed here.
msg358804 - (view) Author: Michael Rolle (mrolle) Date: 2019-12-23 07:41
I realize this request is closed, but I hope people will still be reading this comment, and can perhaps either reopen it or submit a new request.  I don't know how to submit a new request myself.
...

I'd like to see (key=) capability somehow, without degrading performance of time-critical applications.
I've got some ideas for your consideration, if you please.

1. Make the heapq into a heap class.  It would be more natural (for me, anyway) to push/pop using the heap's methods, rather than calling a function in the heap package and having to specify the heap list as well.

A heap class in C could outperform current heapq with lists by storing the member objects in an internal array.

This would also solve the issue of having the user switch comparison methods in the middle of things.  The comparison method would be specified when the heap is constructed.  It could be changed later, at the expense of resorting the heap.  I suggest the comparison method parameters be the same as for .sort () and sorted ().

By default, the heap class would directly compare elements using __lt__, and so the performance would be as good, or slightly better, than the current heapq package functions.

2. In my own use case, I have to define __lt__ for the objects in my heapq by comparing the _keys_ of the two items.  So push/pop wind up calling the key function many times anyway.  The heap push/pop themselves could do this same work probably a bit faster in C code than would calling my own __lt__ method.

3. Performance could be improved when there is a key function by having the heap store both the heap elements and their keys.  I believe a C implementation could benefit by storing key and value objects next to each other in the internal array.

4. I'd be happy to submit a reference implementation, only I don't have time to do that right now, sorry.  I'm sure someone else would be up to the challenge.
msg358817 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-12-23 14:47
Michael: if you want to take this further, your best bet would probably be to start a discussion on the python-ideas mailing list (https://mail.python.org/mailman3/lists/python-ideas.python.org/).
History
Date User Action Args
2022-04-11 14:57:59adminsetgithub: 65104
2019-12-23 14:47:09mark.dickinsonsetnosy: + mark.dickinson
messages: + msg358817
2019-12-23 07:41:07mrollesetnosy: + mrolle
messages: + msg358804
2014-03-14 08:06:43rhettingersetstatus: open -> closed
resolution: rejected
2014-03-14 01:21:08Prashant.Sharmasetmessages: + msg213505
2014-03-14 01:18:22Prashant.Sharmasetmessages: + msg213504
2014-03-14 00:02:56rhettingersetpriority: normal -> low

type: enhancement
assignee: rhettinger
versions: + Python 3.5, - Python 2.7, Python 3.4
nosy: + rhettinger

messages: + msg213496
2014-03-13 18:36:05BreamoreBoysetnosy: + BreamoreBoy
messages: + msg213437
2014-03-13 09:45:48Prashant.Sharmasetmessages: + msg213373
2014-03-13 08:59:55gdr@garethrees.orgsetnosy: + gdr@garethrees.org
messages: + msg213365
2014-03-13 08:25:31Prashant.Sharmasetmessages: + msg213363
2014-03-13 08:18:17Prashant.Sharmacreate