
Author Dennis Sweeney
Recipients Dennis Sweeney, pablogsal, platon.work
Date 2020-09-01.03:22:16
Message-id <1598930536.92.0.655062185265.issue41678@roundup.psfhosted.org>
In-reply-to
Content
If we were to do this, I think a better API might be to accept an arbitrary iterable, then produce a sorted iterable:

from typing import Iterable

def sorted_on_disk(iterable, key=None, reverse=False) -> Iterable:
    ...

It would sort chunks of the input and store them in files as sequences of pickles, merging them as they got bigger, and then return an iterator over the resulting single sorted file.
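To make that description concrete, here is a minimal sketch of how such a function might look. It is only an illustration under stated assumptions, not a proposed implementation: the `chunk_size` and `fan_in` parameters and the `_write_run` / `_read_run` helpers are hypothetical names that are not part of the proposal, and instead of writing one final sorted file it returns a lazy `heapq.merge` over the few run files left at the end.

```python
import heapq
import pickle
import tempfile
from itertools import islice


def _write_run(sorted_items):
    """Hypothetical helper: store a sorted run as back-to-back pickles."""
    f = tempfile.TemporaryFile()
    for item in sorted_items:
        pickle.dump(item, f)
    f.seek(0)
    return f


def _read_run(f):
    """Hypothetical helper: yield the items of a run, then close its file."""
    with f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def sorted_on_disk(iterable, key=None, reverse=False, *,
                   chunk_size=100_000, fan_in=8):
    # chunk_size and fan_in are assumptions made for this sketch.
    it = iter(iterable)
    runs = []  # stack of (level, file); levels never increase toward the top
    while chunk := list(islice(it, chunk_size)):
        chunk.sort(key=key, reverse=reverse)
        runs.append((0, _write_run(chunk)))
        # Once the top fan_in runs share a level, merge them into one
        # bigger run at the next level.
        while len(runs) >= fan_in and runs[-fan_in][0] == runs[-1][0]:
            level = runs[-1][0]
            group = [f for _, f in runs[-fan_in:]]
            del runs[-fan_in:]
            merged = heapq.merge(*map(_read_run, group),
                                 key=key, reverse=reverse)
            runs.append((level + 1, _write_run(merged)))
    # Lazily merge whatever runs remain.
    return heapq.merge(*(_read_run(f) for _, f in runs),
                       key=key, reverse=reverse)
```

Because the result is an ordinary iterator, it can be passed straight to things like `itertools.islice` or a `for` loop. Note that this sketch recomputes the key at every merge level rather than storing decorated items.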

This would be more composable with other standard Python functions and would be a good way of separating concerns. sorted(...) and heapq.merge(...) already have the correct APIs to do it this way.

Potential implementation detail: For some small fixed n, always doing a 2^n-way heapq.merge instead of a bunch of 2-way merges would do fewer passes over the data and would allow the keys to be computed 1/n as many times, assuming we wouldn't decorate-sort-undecorate.
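
A rough way to see the effect: merging R runs two at a time takes about log2(R) passes over the data, while merging 2^n at a time takes about log2(R)/n passes, and `heapq.merge` calls the key function roughly once per item per pass. The small in-memory experiment below counts passes and key calls. The function name `merge_in_passes` is hypothetical and exists only for this illustration.

```python
import heapq


def merge_in_passes(runs, fan_in):
    """Merge sorted runs fan_in at a time until a single run remains.

    Returns (merged_list, number_of_passes, number_of_key_calls).
    """
    calls = 0

    def key(x):
        # Identity key that counts how often it is invoked.
        nonlocal calls
        calls += 1
        return x

    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in], key=key))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return runs[0], passes, calls


# 16 interleaved sorted runs of 4 items each (64 items in total).
runs = [list(range(i, 64, 16)) for i in range(16)]

two_way = merge_in_passes(runs, fan_in=2)   # 16 -> 8 -> 4 -> 2 -> 1
four_way = merge_in_passes(runs, fan_in=4)  # 16 -> 4 -> 1

print("2-way passes:", two_way[1], "key calls:", two_way[2])
print("4-way passes:", four_way[1], "key calls:", four_way[2])
```

With n = 2 (a 4-way merge) the number of passes halves, and the key-call count drops by about the same factor.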