classification
Title: Simplify documentation of itertools.zip_longest
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: docs@python, r.david.murray, rami, rhettinger
Priority: normal Keywords:

Created on 2017-08-24 16:03 by rami, last changed 2017-09-07 21:51 by rhettinger. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3200 closed rami, 2017-08-24 16:06
PR 3427 merged rhettinger, 2017-09-07 19:46
Messages (5)
msg300788 - (view) Author: Raphael Michel (rami) * Date: 2017-08-24 16:03
The documentation given for itertools.zip_longest contains a "roughly equivalent" pure-python implementation of the function that is intended to help the user understand what zip_longest does on a functional level.

However, the given implementation is very complicated to read for newcomers and experienced Python programmers alike, as it uses a custom-defined exception for control flow handling, a nested function, a condition that always is true if any arguments are passed ("while iterators"), as well as two other non-trivial functions from itertools (chain and repeat).

For future reference, this is the currently given implementation:

    def zip_longest(*args, **kwds):
        # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
        fillvalue = kwds.get('fillvalue')
        iterators = [iter(it) for it in args]

        while True:
            exhausted = 0
            values = []

            for it in iterators:
                try:
                    values.append(next(it))
                except StopIteration:
                    values.append(fillvalue)
                    exhausted += 1

            if exhausted < len(args):
                yield tuple(values)
            else:
                break

This is way more complex than necessary to teach the concept of zip_longest. With this issue, I will submit a pull request with a new example implementation that seems to be the same level of "roughly equivalent" but is much easier to read, since it only uses two loops and now complicated flow 

    def zip_longest(*args, **kwds):
        # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
        fillvalue = kwds.get('fillvalue')
        iterators = [iter(it) for it in args]

        while True:
            exhausted = 0
            values = []

            for it in iterators:
                try:
                    values.append(next(it))
                except StopIteration:
                    values.append(fillvalue)
                    exhausted += 1

            if exhausted < len(args):
                yield tuple(values)
            else:
                break


Looking at the C code of the actual implementation, I don't see that any one of the two implementations is obviously "more equivalent". I'm unsure about performance -- I haven't tried them on that but I don't think that's the point of this learning implementation.

I ran all tests from Lib/test/test_itertools.py against both the old and the new implementation. The new implementation fails at 3 tests, while the old implementation failed at four. Two of the remaining failures are related to TypeErrors not being thrown on invalid input, one of them is related to pickling the resulting object. I believe all three of them are fine to ignore in this sample, as it is not relevant to the documentation purpose.

Therefore, I believe the documentation should be changed like suggested. I'd be happy for any feedback or further ideas to improve its readability!
msg300790 - (view) Author: Raphael Michel (rami) * Date: 2017-08-24 16:30
I just noticed that in my post I accidentally pasted MY implementation twice instead of the old one, sorry for that. Here's the old one for quick comparison:

class ZipExhausted(Exception):
    pass

def zip_longest(*args, **kwds):
    # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    counter = len(args) - 1
    def sentinel():
        nonlocal counter
        if not counter:
            raise ZipExhausted
        counter -= 1
        yield fillvalue
    fillers = repeat(fillvalue)
    iterators = [chain(it, sentinel(), fillers) for it in args]
    try:
        while iterators:
            yield tuple(map(next, iterators))
    except ZipExhausted:
        pass
msg300799 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-08-24 19:32
Thanks for wanting to improve the documentation.

Raymond will address this definitively, but unless I'm mistaken part of the purpose of the examples is to show how the various itertools can be used.  If that is true, then in the context of the overall itertools documentation I think the current example has more teaching value than your suggested revision.
msg300830 - (view) Author: Raphael Michel (rami) * Date: 2017-08-25 09:33
Well, I could think of a way to still use repeat() here that also is pretty clean except for the fact that it fails if all inputs to zip_longest are repeat() iterators themselves (which would here lead to an empty iterator while it would otherwise lead to an infinite one):

    def zip_longest(*args, **kwds):
        # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
        fillvalue = kwds.get('fillvalue')
        iterators = [iter(it) for it in args]

        while True:
            values = []

            for i, it in enumerate(iterators):
                try:
                    values.append(next(it))
                except StopIteration:
                    values.append(fillvalue)
                    iterators[i] = repeat(fillvalue)

            if all(isinstance(it, repeat) for it in iterators):
                break
            else:
                yield tuple(values)

Keeping chain() in use here just for the sake of using it is not worth it, I believe.
msg301635 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-09-07 21:51
New changeset 3147b0422cbeb98065666ccf95ab6845ac800fd4 by Raymond Hettinger in branch 'master':
bpo-31270: Modification of Pr 3200 (#3427)
https://github.com/python/cpython/commit/3147b0422cbeb98065666ccf95ab6845ac800fd4
History
Date User Action Args
2017-09-07 21:51:11rhettingersetmessages: + msg301635
2017-09-07 21:03:37rhettingersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2017-09-07 19:46:43rhettingersetpull_requests: + pull_request3424
2017-08-25 09:33:42ramisetmessages: + msg300830
2017-08-24 19:32:34r.david.murraysetnosy: + r.david.murray
messages: + msg300799
2017-08-24 18:01:39rhettingersetassignee: docs@python -> rhettinger

nosy: + rhettinger
2017-08-24 16:30:23ramisetmessages: + msg300790
2017-08-24 16:19:31Mariattasetstage: patch review
versions: + Python 3.6
2017-08-24 16:06:32ramisetpull_requests: + pull_request3239
2017-08-24 16:03:33ramicreate