
Author wbolster
Recipients wbolster
Date 2013-09-07.18:26:27
Content
The heapq.merge() function merges multiple sorted iterables into a single sorted output. Internally it keeps one entry per input on a heap and repeatedly pops the smallest item (pushing the next item from the same input back onto the heap) until all output has been generated.
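For illustration, the heap-based approach looks roughly like the sketch below (a simplified stand-in, not the actual stdlib code). Note that every yielded item costs a heappop() and, usually, a heappush():

import heapq

def merge_sketch(*iterables):
    # Simplified illustration of a heap-based merge: keep one
    # (value, order, iterator) entry per input on a heap, repeatedly pop
    # the smallest value and push the next value from the same iterator.
    h = []
    for order, it in enumerate(map(iter, iterables)):
        try:
            h.append((next(it), order, it))
        except StopIteration:
            pass
    heapq.heapify(h)
    while h:
        value, order, it = heapq.heappop(h)
        yield value
        try:
            heapq.heappush(h, (next(it), order, it))
        except StopIteration:
            pass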

If only a single iterable is passed to heapq.merge(), the heap will have len(h) == 1, which means the looping and heap manipulation just degrade to a convoluted "yield from iterables[0]". This negatively impacts performance. The attached patch adds a short-circuit for this single-input case.
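In code, the idea is something like the following (a minimal sketch of the described short-circuit, not the attached patch itself; here the general case simply delegates to the existing heapq.merge()):

import heapq

def merge_shortcircuit(*iterables):
    # Hypothetical wrapper demonstrating the optimisation: with exactly one
    # input there is nothing to merge, so pass it through without touching
    # a heap at all.
    if len(iterables) == 1:
        yield from iterables[0]
        return
    yield from heapq.merge(*iterables)

Used like list(merge_shortcircuit([1, 2, 3])), this produces the same output as heapq.merge() but skips the per-item heap operations when there is only one input.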

The behaviour of the function remains unchanged:

>>> list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
>>> list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]

For multiple inputs, there is no measurable performance difference (measured using IPython's %timeit). Without patch:

>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.4 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.5 µs per loop

With patch:

>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.8 µs per loop

The real performance gain, of course, comes when only a single iterable is passed. Without patch:

>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.65 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.52 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.51 ms per loop

With patch:

>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 604 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 609 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 603 µs per loop

This is a ~4x speedup for an input consisting of 10000 items. Compared to plain iteration:

>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 263 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 268 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 273 µs per loop

Without the patch, heapq.merge() is ~10x as slow as plain iteration. With the patch, this is reduced to ~2.5x.

(Note: all testing was done using Python 3.3.2 on a 64-bit Linux machine.)
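For anyone without IPython, roughly comparable numbers can be obtained with the standard library's timeit module. The snippet below is only an illustrative sketch: merge_before/merge_after are the local unpatched/patched copies used above and are not included here, so plain heapq.merge stands in for them.

import timeit

stmt = "for x in merge(range(10000)): pass"
setup = "from heapq import merge"
# Take the best of three runs of 100 iterations each, as %timeit reports.
per_loop = min(timeit.repeat(stmt, setup=setup, number=100, repeat=3)) / 100
print("%.0f us per loop" % (per_loop * 1e6))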