Author Dennis Sweeney
Recipients Dennis Sweeney, bbayles, rhettinger, serhiy.storchaka, tim.peters
Date 2020-05-17.12:15:44
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1589717745.08.0.737191396269.issue38938@roundup.psfhosted.org>
In-reply-to
Content
It seems to me that the code sprawl mostly comes from the separate handling of the four keyed/unkeyed and forward/reverse cases, which as far as I can tell requires a branch in the innermost loop if not unrolled into separate cases. I think recursive_merge.py is about as concise as possible while still efficiently handling all four cases.

Is there any issue with recursion in this application? Even if there are 2**64-1 iterables, this would only mean 64 nested next() calls, which should be within the stack on most machines, no?

I did the following benchmark:

py -3.9 -m pyperf timeit -s "from [FILENAME] import merge; from collections import deque" "deque(merge(open('english.txt'), open('dutch.txt'), open('french.txt'), open('german.txt'), open('italian.txt')), maxlen=0)"

    new_merge.py:          Mean +- std dev: 773 ms +- 16 ms
    tournament_heap.py:    Mean +- std dev: 665 ms +- 42 ms
    losers.py:             Mean +- std dev: 470 ms +- 31 ms
    Existing heapq:        Mean +- std dev: 313 ms +- 2 ms
    recursive_merge.py:    Mean +- std dev: 266 ms +- 3 ms

I can look at some more diverse benchmarks (types, iterable length imbalance, lengths of an iterator's win-streak, etc.) soon.
History
Date User Action Args
2020-05-17 12:15:45Dennis Sweeneysetrecipients: + Dennis Sweeney, tim.peters, rhettinger, serhiy.storchaka, bbayles
2020-05-17 12:15:45Dennis Sweeneysetmessageid: <1589717745.08.0.737191396269.issue38938@roundup.psfhosted.org>
2020-05-17 12:15:45Dennis Sweeneylinkissue38938 messages
2020-05-17 12:15:44Dennis Sweeneycreate