Message 376178
If we were to do this, I think a better API might be to accept an arbitrary iterable, then produce a sorted iterable:
from typing import Iterable

def sorted_on_disk(iterable, key=None, reverse=False) -> Iterable:
    ...
It would sort chunks of the input and store them in files as sequences of pickles, merging them as they got bigger, and then return an iterator over the resulting single sorted file.
This would be more composable with other standard Python functions and would be a good way of separating concerns. sorted(...) and heapq.merge(...) already have the correct APIs to do it this way.
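As a rough illustration of the proposed API, here is a minimal sketch of such a function. The chunk size, the use of `tempfile.TemporaryFile`, and the single-level merge (no incremental re-merging of growing runs) are assumptions for brevity, not part of the proposal itself:

```python
import heapq
import itertools
import pickle
import tempfile

def sorted_on_disk(iterable, key=None, reverse=False, chunksize=100_000):
    """Hypothetical sketch: sort fixed-size chunks in memory, spill each
    chunk to a temporary file as a sequence of pickles, then lazily
    k-way merge the sorted runs with heapq.merge."""
    files = []
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, chunksize))
        if not chunk:
            break
        chunk.sort(key=key, reverse=reverse)
        f = tempfile.TemporaryFile()
        for item in chunk:
            pickle.dump(item, f)
        f.seek(0)
        files.append(f)

    def read_back(f):
        # Yield the pickled items of one sorted run in order.
        try:
            while True:
                yield pickle.load(f)
        except EOFError:
            f.close()

    # heapq.merge accepts key= and reverse= (since Python 3.5),
    # so the merge step composes directly with the sort step.
    return heapq.merge(*map(read_back, files), key=key, reverse=reverse)
```

A caller would then use it like the built-in `sorted`, except that the result is a one-shot iterator backed by disk rather than a list.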
Potential implementation detail: for some small fixed n, always doing a 2^n-way heapq.merge instead of a series of 2-way merges would make fewer passes over the data and would compute each key only 1/n as many times, assuming we wouldn't decorate-sort-undecorate.
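The merge-scheduling idea above can be sketched as follows. This toy version merges in-memory lists (in practice the runs would be file-backed iterators as described earlier), and the fan-in `k` plays the role of 2^n; a 2-way merge cascade is just the `k=2` case:

```python
import heapq

def merge_runs(runs, k=8, key=None):
    """Hypothetical sketch: repeatedly merge up to k sorted runs at a
    time until a single sorted run remains.  Each while-loop iteration
    is one full pass over the data, so a larger k means fewer passes
    (log_k instead of log_2 of the number of runs)."""
    runs = list(runs)
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + k], key=key))
                for i in range(0, len(runs), k)]
    return runs[0] if runs else []
```

With k=2 this does the same work as a binary merge cascade; raising k trades heap size during the merge for fewer passes and fewer key computations.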
Date                | User           | Action | Args
2020-09-01 03:22:16 | Dennis Sweeney | set    | recipients: + Dennis Sweeney, pablogsal, platon.work
2020-09-01 03:22:16 | Dennis Sweeney | set    | messageid: <1598930536.92.0.655062185265.issue41678@roundup.psfhosted.org>
2020-09-01 03:22:16 | Dennis Sweeney | link   | issue41678 messages
2020-09-01 03:22:16 | Dennis Sweeney | create |