
classification
Title: Add special case for single iterator in heapq.merge function
Type: performance
Stage:
Components: Library (Lib)
Versions: Python 3.4

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: python-dev, rhettinger, wbolster
Priority: normal
Keywords: patch

Created on 2013-09-07 18:26 by wbolster, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                                    Uploaded                      Description
heapq-merge-single-input-optimization.patch  wbolster, 2013-09-07 18:26
merge2.diff                                  rhettinger, 2013-09-08 18:55  Use yield-from when only one iterator remains
merge3.diff                                  wbolster, 2013-09-10 18:29
Messages (8)
msg197174 - (view) Author: wouter bolsterlee (wbolster) * Date: 2013-09-07 18:26
The heapq.merge() function merges multiple sorted iterables into a single sorted output.  It uses a heap queue that is repeatedly looped over until all output has been generated.

If only a single iterable is passed to heapq.merge(), the heap always has len(h) == 1, so the looping and heap manipulation degrade to a convoluted "yield from iterables[0]". This negatively impacts performance. The attached patch adds a short-circuit for this single-input case.
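
For illustration, here is a minimal, self-contained sketch of the idea (hypothetical code, not the attached patch; the name merge_sketch and the simplified heap loop are illustrative and differ from the actual stdlib implementation):

    import heapq

    def merge_sketch(*iterables):
        # Proposed short-circuit: with a single input there is nothing to
        # merge, so hand off to that iterable directly instead of driving
        # the heap machinery.
        if len(iterables) == 1:
            yield from iterables[0]
            return
        # Simplified heap-driven merge: keep one (value, order, iterator)
        # entry per non-exhausted input; 'order' breaks ties so iterators
        # are never compared.
        h = []
        for order, it in enumerate(map(iter, iterables)):
            for value in it:
                h.append((value, order, it))
                break
        heapq.heapify(h)
        while h:
            value, order, it = h[0]
            yield value
            for value in it:                    # advance the winning input
                heapq.heapreplace(h, (value, order, it))
                break
            else:                               # that input is exhausted
                heapq.heappop(h)

The short-circuit only changes how the single-input call is executed; the merged output is identical in all cases.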

The behaviour of the function remains unchanged:

>>> list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
>>> list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]

For multiple inputs, there is no measurable performance difference (measured
using IPython's %timeit). Without patch:

>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.4 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.5 µs per loop

With patch:

>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.8 µs per loop

The real performance gain is of course when only a single iterable is passed. Without patch:

>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.65 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.52 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.51 ms per loop

With patch:

>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 604 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 609 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 603 µs per loop

This is a ~4x speedup for an input consisting of 10000 items. Compared to plain iteration:

>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 263 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 268 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 273 µs per loop

Without the patch, heapq.merge() is ~10x as slow as plain iteration. With the patch, this is reduced to ~2.5x.

(Note: all testing was done using Python 3.3.2 on a 64-bit Linux machine.)
msg197224 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-09-08 04:58
At first glance, this looks reasonable.  I'll give it a more thorough look shortly.
msg197256 - (view) Author: wouter bolsterlee (wbolster) * Date: 2013-09-08 12:14
An additional speedup would be to add an "if len(h) == 1" check inside the while loop and just yield from the remaining iterator once only a single iterable remains. This would also speed up merges with multiple inputs, since it avoids the whole heapreplace() loop for the last remaining iterable. Example: merge([], [], [], range(100000)). This would involve some more refactoring inside the function, though, since the current implementation only stores the __next__() method of each iterable.
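
A rough, self-contained sketch of that variant (again hypothetical, not the actual merge2.diff): once the heap holds only one entry, no comparisons are needed, so the buffered value is emitted and the rest of that iterator is drained with yield from.

    import heapq

    def merge_sketch2(*iterables):
        h = []
        for order, it in enumerate(map(iter, iterables)):
            for value in it:
                h.append((value, order, it))
                break
        heapq.heapify(h)
        # Normal heap-driven merge while at least two inputs remain.
        while len(h) > 1:
            value, order, it = h[0]
            yield value
            for value in it:
                heapq.heapreplace(h, (value, order, it))
                break
            else:
                heapq.heappop(h)
        # Only one input left: emit its buffered value, then drain it
        # directly without any further heap operations.
        if h:
            value, order, it = h[0]
            yield value
            yield from it

This also covers the single-input call (the while loop never runs), so it subsumes the earlier short-circuit.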
msg197272 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-09-08 15:45
Try this patch.
msg197460 - (view) Author: wouter bolsterlee (wbolster) * Date: 2013-09-10 18:29
Thanks Raymond, that is exactly what I had in mind (see my previous comment).
Here's a slightly cleaned up version of the patch (stylistic/PEP8 cleanups),
with some benchmarks included below.

When the two longest iterators are about the same length, no performance
difference can be measured:

$ ./python -m timeit -s 'from heapq import merge' 'for x in merge([], [], [1], range(100), range(100)): pass'

Without patch:

10000 loops, best of 3: 71.2 usec per loop
10000 loops, best of 3: 71.9 usec per loop
10000 loops, best of 3: 71.7 usec per loop

With patch:

10000 loops, best of 3: 71.4 usec per loop
10000 loops, best of 3: 76.7 usec per loop
10000 loops, best of 3: 72.1 usec per loop

As expected, the performance gain is very significant when one of the
iterators is much longer than the others:

$ python -m timeit -n 100 -s 'from heapq import merge' 'for x in merge([], [], [1], range(100), range(100000)): pass'

Without patch:

100 loops, best of 3: 27.8 msec per loop
100 loops, best of 3: 26.9 msec per loop
100 loops, best of 3: 27.7 msec per loop

With patch:

100 loops, best of 3: 6.26 msec per loop
100 loops, best of 3: 6.28 msec per loop
100 loops, best of 3: 6.03 msec per loop
msg197470 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-09-11 06:16
New changeset 0e70bf1f32a3 by Raymond Hettinger in branch 'default':
Issue #18962:  Optimize the single iterator case for  heapq.merge()
http://hg.python.org/cpython/rev/0e70bf1f32a3
msg197472 - (view) Author: wouter bolsterlee (wbolster) * Date: 2013-09-11 06:50
Thanks for the quick response. 

Btw, do I understand correctly that code cleanups are not welcome, even when the code is being touched anyway?
msg197473 - (view) Author: wouter bolsterlee (wbolster) * Date: 2013-09-11 06:51
(In case you missed it: my latest comment included a cleaned up version of an earlier patch.)
History
Date                 User               Action  Args
2022-04-11 14:57:50  admin              set     github: 63162
2013-09-11 06:51:56  wbolster           set     messages: + msg197473
2013-09-11 06:50:53  wbolster           set     messages: + msg197472
2013-09-11 06:17:32  rhettinger         set     status: open -> closed; resolution: fixed
2013-09-11 06:16:00  python-dev         set     nosy: + python-dev; messages: + msg197470
2013-09-10 18:29:14  wbolster           set     files: + merge3.diff; messages: + msg197460
2013-09-08 18:55:47  rhettinger         set     files: - merge.diff
2013-09-08 18:55:32  rhettinger         set     files: + merge2.diff
2013-09-08 15:45:24  rhettinger         set     files: + merge.diff; messages: + msg197272
2013-09-08 12:14:44  wbolster           set     messages: + msg197256
2013-09-08 04:58:56  rhettinger         set     assignee: rhettinger; messages: + msg197224; versions: + Python 3.4, - Python 3.3
2013-09-08 04:56:15  benjamin.peterson  set     nosy: + rhettinger
2013-09-07 18:26:28  wbolster           create