Title: Add a function for merging sorted iterables
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.9
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: remi.lapeyre, rhettinger, serhiy.storchaka, tim.peters
Priority: normal Keywords:

Created on 2020-04-09 14:42 by serhiy.storchaka, last changed 2020-04-09 16:59 by serhiy.storchaka. This issue is now closed.

Messages (4)
msg366056 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-04-09 14:42
It would be useful to have a function in itertools to merge sorted iterables.

    merge_sorted(*iterables, key=None, reverse=False):

It should emit the same items as sorted(tee(*iterables), key=key, reverse=reverse) if iterables are sorted with key and reverse. But it should use the amount of memory O(M) where M is the number of iterables, and support infinite iterables.
msg366058 - (view) Author: RĂ©mi Lapeyre (remi.lapeyre) * Date: 2020-04-09 14:56
Hi Serhiy, do you plan on writing a PR for this feature?

If not I would love to have a go at it.
msg366060 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-04-09 15:03
Serhiy, are you aware of heapq.merge()?  If not, look it up.  And then if you still think merge_sorted() would differ in some way, please spell out how it would differ.

Based on what you wrote, you threw an invalid invocation of tee() into the mix for some reason, which heapq.merge() certainly doesn't do.

And it's impossible to keep a small bound on memory use when reverse=True if you want it return the same sequence as sorted(... reverse=True).  Instead, for heapq.merge().

reverse is a boolean value. If set to True, then the input elements are merged as if each comparison were reversed. To achieve behavior similar to sorted(itertools.chain(*iterables), reverse=True), all iterables must be sorted from largest to smallest.
msg366081 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-04-09 16:59
Thank you Tim, it is exactly what I need!

I got this search result, but I rejected it because it looked obvious that the function from the heapq module cannot have any relation to this. :(

I meant chain() instead of tee(), sorry.
Date User Action Args
2020-04-09 16:59:00serhiy.storchakasetstatus: open -> closed
resolution: out of date
messages: + msg366081

stage: resolved
2020-04-09 15:03:17tim.peterssetmessages: + msg366060
2020-04-09 14:56:23remi.lapeyresetnosy: + remi.lapeyre
messages: + msg366058
2020-04-09 14:42:29serhiy.storchakacreate