This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Add a key parameter (like sorted) to heapq.merge
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Tommy.Carstensen, berker.peksag, eric.araujo, eric.snow, giampaolo.rodola, mark.dickinson, python-dev, rhettinger, serhiy.storchaka, ssapin, stutzbach, terry.reedy
Priority: normal Keywords: needs review, patch

Created on 2012-01-09 10:35 by ssapin, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
heapq_merge_key.patch ssapin, 2012-01-09 10:35 'hg diff' output against rev ca2a35140e6a review
benchmark_heapq_merge.py ssapin, 2012-01-09 10:43 benchmark several implementations
heapq_merge_key_duplicate.patch ssapin, 2012-01-16 14:58 'hg diff' output against rev ca2a35140e6a review
heap.diff rhettinger, 2013-03-26 06:54 Rough draft (untested) for a Heap() class review
heap2.diff rhettinger, 2013-04-29 11:43 Update the draft Heap() class
keymerge.diff rhettinger, 2014-05-27 01:31 Draft patch without doc updates review
keymerge2.diff rhettinger, 2014-05-29 08:42 Add both key=func and reverse=True options review
Messages (21)
msg150927 - (view) Author: Simon Sapin (ssapin) Date: 2012-01-09 10:35
Hi,

The attached patch adds a 'key' optional parameter to the heapq.merge function that behaves as in sorted().

Related discussion: http://mail.python.org/pipermail/python-ideas/2012-January/013295.html

This is my first contribution to CPython.
msg150928 - (view) Author: Simon Sapin (ssapin) Date: 2012-01-09 10:43
The attached script benchmarks the basline (current implementation) against 3 new implementations, as suggested on http://mail.python.org/pipermail/python-ideas/2012-January/013296.html

On my machine, the output is:

    merge_baseline
    per run, min of 3 = 7.527 ms
    
    merge_1
    per run, min of 3 = 9.894 ms
    131.449 % of baseline
    
    merge_2
    per run, min of 3 = 7.948 ms
    105.594 % of baseline
    
    merge_3
    per run, min of 3 = 7.581 ms
    100.716 % of baseline

On this particular input, merge_2 adds 6% of overhead when the key parameter is not used. While merge_3 only adds 1% of overhead, it almost doubles the amount of code. (Which was admittedly not that long to begin with.)

The patch in the previous message is with the merge_2 implementation, which seemed like the best compromise to me.
msg150931 - (view) Author: Simon Sapin (ssapin) Date: 2012-01-09 11:10
Oops, the patch to the documentation would also need 'New in 3.3: the key parameter', with the right Sphinx directive. But that depends on whether this change ends up in 3.3 or 3.4.

Does 3.3 still get new features?
msg150954 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-01-09 16:53
Yes, 3.3 is still in the early development stage, and new features will be accepted until the first beta (in June, see PEP 398).  “.. versionadded:: 3.3 The *key* parameter” will do.
msg150969 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-01-09 19:33
Simon, please keep the original version fast by creating two code paths:

   if key is None:
      original_code
   else:
      new_code using the key_function
msg150983 - (view) Author: Simon Sapin (ssapin) Date: 2012-01-09 22:13
Raymond, please have a look at merge_3 in benchmark_heapq_merge.py. It is implemented as you say.

Do you think the speed is worth the code duplication?
msg151369 - (view) Author: Simon Sapin (ssapin) Date: 2012-01-16 14:58
heapq_merge_key_duplicate.patch is a new patch with two code path. It also updates the function’s docstring (which the previous patch did not). Raymond, do you think the speed is worth the DRY violation?
msg152802 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-02-07 04:11
I'll look at this in the next couple of weeks.  Hang tight :-)
msg152984 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-02-09 20:39
FWIW, Guido approves of the idea, msg152969 in #4356
msg168070 - (view) Author: Simon Sapin (ssapin) Date: 2012-08-13 07:51
I just remembered about this. I suppose it is too late for 3.3?
msg168116 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-08-13 15:20
Yes, 3.3 is already in beta.
msg185259 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-03-26 06:54
Attaching a rough draft implementation for a fully encapsulated Heap() class that is thread-safe, supports minheaps and maxheaps, and efficiently implements key-functions (called no more than once per key).
msg188066 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-04-29 12:14
heap2.diff contains only a single line's change.  Wrong file attached?
msg188067 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-04-29 12:14
Ah, I see the new file now (I'd failed to refresh my browser);  sorry for the noise.
msg188080 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-04-29 17:51
Looks pretty good to me.

- There's a bonus print call in the diff.

- Should the "len(self._data)" call be protected by the lock?  I can't immediately think of any reason why that would be necessary (e.g., pushpop nd poppush never change the size of self._data, so there's no risk of getting a bogus length there), but the lack of the lock makes me nervous.

- Support for iter() seems a bit out of place to me.  What are the use-cases for this?  Would it make sense to leave this out (for now)?
msg188085 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-29 18:55
There is already one heap class in the stdlib: queue.PriorityQueue. Why create a duplicate instead extend queue.PriorityQueue with desired features?

May be name the maxheap parameter as reverse?
msg219380 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-05-30 09:28
New changeset f5521f5dec4a by Raymond Hettinger in branch 'default':
Issue #13742:  Add key and reverse parameters to heapq.merge()
http://hg.python.org/cpython/rev/f5521f5dec4a
msg233792 - (view) Author: Tommy Carstensen (Tommy.Carstensen) Date: 2015-01-10 01:19
I noticed 3.5 alpha1 is not released until February 1st. Is there any way I can get my hands on this new functionality?
msg233793 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-01-10 01:24
Hi Tommy, the patch is already committed to Python 3.5. See https://docs.python.org/3.5/library/heapq.html#heapq.merge
msg233795 - (view) Author: Tommy Carstensen (Tommy.Carstensen) Date: 2015-01-10 01:55
Yes, but 3.5 has not been pre-released yet.
msg233804 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-01-10 08:47
You can set up mecurial on your machine, make a read-only clone of the cpython repository, and compile it just as do other people, whether core-developers or otherwise.  See docs.python.org/devguide for details.
History
Date User Action Args
2022-04-11 14:57:25adminsetgithub: 57951
2015-01-10 08:47:47terry.reedysetmessages: + msg233804
2015-01-10 01:55:18Tommy.Carstensensetmessages: + msg233795
2015-01-10 01:24:07berker.peksagsetnosy: + berker.peksag

messages: + msg233793
stage: patch review -> resolved
2015-01-10 01:19:39Tommy.Carstensensetnosy: + Tommy.Carstensen
messages: + msg233792
2014-05-30 09:29:28rhettingersetstatus: open -> closed
resolution: fixed
2014-05-30 09:28:45python-devsetnosy: + python-dev
messages: + msg219380
2014-05-29 08:42:18rhettingersetfiles: + keymerge2.diff
2014-05-29 08:42:00rhettingersetfiles: - keymerge2.diff
2014-05-29 08:32:46rhettingersetfiles: + keymerge2.diff
2014-05-27 01:31:45rhettingersetfiles: + keymerge.diff
2014-05-27 01:30:59rhettingersetversions: + Python 3.5, - Python 3.4
2014-05-12 17:57:43eric.snowsetnosy: + eric.snow
2013-04-29 18:55:40serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg188085
2013-04-29 17:51:35mark.dickinsonsetmessages: + msg188080
2013-04-29 12:14:41mark.dickinsonsetmessages: + msg188067
2013-04-29 12:14:03mark.dickinsonsetnosy: + mark.dickinson
messages: + msg188066
2013-04-29 11:43:52rhettingersetfiles: - heap2.diff
2013-04-29 11:43:35rhettingersetfiles: + heap2.diff
2013-04-29 10:54:52rhettingersetpriority: low -> normal
files: + heap2.diff
2013-03-26 06:54:31rhettingersetfiles: + heap.diff

messages: + msg185259
2012-08-13 15:20:10eric.araujosetkeywords: + needs review

stage: patch review
messages: + msg168116
versions: + Python 3.4, - Python 3.3
2012-08-13 07:51:59ssapinsetmessages: + msg168070
2012-02-09 20:39:38terry.reedysetnosy: + terry.reedy
messages: + msg152984
2012-02-07 08:44:03giampaolo.rodolasetnosy: + giampaolo.rodola
2012-02-07 04:11:24rhettingersetmessages: + msg152802
2012-01-16 14:58:54ssapinsetfiles: + heapq_merge_key_duplicate.patch

messages: + msg151369
2012-01-09 22:13:45ssapinsetmessages: + msg150983
2012-01-09 19:33:36rhettingersetpriority: normal -> low

messages: + msg150969
2012-01-09 18:44:52stutzbachsetnosy: + stutzbach
2012-01-09 16:53:30eric.araujosetnosy: + eric.araujo

messages: + msg150954
versions: - Python 3.4
2012-01-09 11:10:14ssapinsetmessages: + msg150931
2012-01-09 10:51:00eric.smithsetassignee: rhettinger
2012-01-09 10:43:36ssapinsetfiles: + benchmark_heapq_merge.py

messages: + msg150928
2012-01-09 10:35:41ssapincreate