classification
Title: itertools: takedowhile()
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.11
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: miss-islington, pavel-lexyr, phr, rhettinger, serhiy.storchaka, tim.peters
Priority: normal
Keywords: patch

Created on 2021-07-05 22:33 by pavel-lexyr, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description
tmp_takewhile.py rhettinger, 2021-07-09 16:42 API experiments
Pull Requests
URL Status Linked
PR 28167 merged rhettinger, 2021-09-04 20:18
PR 28173 merged miss-islington, 2021-09-05 05:09
Messages (17)
msg397025 - (view) Author: pavel-lexyr (pavel-lexyr) * Date: 2021-07-05 22:33
As described in the documentation, itertools.takewhile() returns all the elements until the first one that does not match the provided criterion. In case of a destructive iterator, or one with side effects, not yielding an element downstream may render takewhile() unsuitable for use.

Proposed is itertools.takedowhile() - an alternate function that yields the first false element as well, and returns after. The behaviour is identical otherwise.
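
To make the difference concrete, here is a minimal sketch. The generator `takedowhile` below is only a hypothetical pure-Python rendering of the proposal (it is not an existing itertools function), and `is_small` and the sample data are made up for illustration.

```python
from itertools import takewhile

def takedowhile(predicate, iterable):
    # Hypothetical sketch of the proposal: like takewhile(), but also
    # yields the first element that fails the predicate, then stops.
    for x in iterable:
        yield x
        if not predicate(x):
            return

def is_small(x):
    return x < 3

# takewhile() consumes the first failing element (3) and discards it.
it = iter([1, 2, 3, 4, 5])
taken = list(takewhile(is_small, it))
rest = list(it)

# The sketched takedowhile() hands that element downstream instead.
it2 = iter([1, 2, 3, 4, 5])
taken2 = list(takedowhile(is_small, it2))
rest2 = list(it2)

print(taken, rest)    # the 3 appears in neither list
print(taken2, rest2)  # the 3 is the last item of taken2
```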
msg397028 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-07-05 23:25
Thanks for the suggestion.  I agree that the loss of the non-matching element is an irritant.  The suggestion to return the first false element would solve that problem but is itself hard to work with.  The result would be difficult to reason about because all the elements except one are true, the last is false, and you can't know that you have gotten a false element until one more call to next() determines that no more elements are forthcoming.

Also, I'm reluctant to create any variants for takewhile() or dropwhile().  Those have been the least successful itertools.  If I had it to do over again, they would not have been included.  For the most part, generator based solutions are superior in terms of readability, flexibility, and performance.
msg397030 - (view) Author: pavel-lexyr (pavel-lexyr) * Date: 2021-07-05 23:34
I see. If the syntax allows for better ways to do it now, perhaps a move towards deprecation would be a better idea then? This would agree with the Zen.

Also, please elaborate more on the generator-based solutions you have in mind. The suggestion stems from a very real use case - and the lambda function we ended up using looks like a poor hack.
msg397032 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-07-06 08:26
What if we set the last item as an attribute of the takewhile iterator?
msg397165 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-07-08 19:31
> What if we set the last item as an attribute of the takewhile iterator?

Perhaps raise an attribute error unless the falsifying element is set?
msg397202 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-07-09 16:42
I've done some API experiments using a data munging example.  See attached file.

The proposed API for takewhile() to save the last element as an attribute is somewhat awkward to use:

    it = iter(report)
    tw_it = takewhile(is_header, it)
    for line in tw_it:
        print('Header:', repr(line))
    if hasattr(tw_it, 'odd_element'):
        it = chain([tw_it.odd_element], it)
    print(mean(map(int, it)))   

What is needed is a new itertool recipe to cover this use case:

    headers, data = before_and_after(is_header, report)
    for line in headers:
        print('Header:', repr(line))
    print(mean(map(int, data)))
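
For readers without the attached file, the following is a sketch of `before_and_after()` along the lines of the recipe later added to the itertools documentation; the exact code in the attachment may differ. The `report` data and the `is_header` predicate are made-up stand-ins for the data munging example.

```python
from statistics import mean

def before_and_after(predicate, iterable):
    # Sketch of the recipe: returns (true_prefix, remainder).
    # The first iterator must be fully consumed before the second
    # one can produce valid results.
    it = iter(iterable)
    transition = []  # holds the first non-matching element, if any

    def true_iterator():
        for elem in it:
            if predicate(elem):
                yield elem
            else:
                transition.append(elem)
                return

    def remainder_iterator():
        yield from transition
        yield from it

    return true_iterator(), remainder_iterator()

# Made-up sample input: two header lines followed by numeric data.
report = ['# name: demo', '# units: ms', '10', '20', '30']

def is_header(line):
    return line.startswith('#')

headers, data = before_and_after(is_header, report)
header_lines = list(headers)       # consume the headers first
average = mean(map(int, data))     # '10' is not lost
print(header_lines, average)
```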
msg397205 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-07-09 17:44
For convenience, the takewhile iterator can also have additional attributes: a boolean attribute which indicates that the falsifying element is set, and a dynamic attribute which is equal to orig_iterator or chain([odd_element], orig_iterator).
msg397229 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-07-09 22:31
> For convenience, the takewhile iterator can also have
> additional attributes: a boolean attribute which indicates 
> that the falsifying element is set, and dynamic attribute 
> which is equal to orig_iterator 
> or chain([odd_element], orig_iterator).

Rather than graft a funky and atypical API onto takewhile(), it would be better to have a new tool that returns two iterators, the true stream, and a stream for the remaining values.  Either stream may be empty.  There is no need for a boolean flag attribute or a remaining stream attribute.  This design fits in better with the other itertools.

FWIW, we can already do this using groupby(), but it is only easy if we assume the first part of the stream is all true and the remainder of the stream is all false.  That isn't good enough for general application.
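
A small sketch of that groupby() limitation, using a made-up `is_odd` predicate and sample lists: when the remainder mixes true and false values, groupby() yields several runs rather than a single "after" stream.

```python
from itertools import groupby

def is_odd(x):
    return x % 2 == 1

# Easy case: a true prefix followed by an all-false remainder.
simple = [(key, list(group)) for key, group in groupby([1, 3, 5, 2, 4], is_odd)]

# General case: the remainder mixes true and false values, so groupby()
# splits it into several runs instead of one "after" stream.
mixed = [(key, list(group)) for key, group in groupby([1, 3, 2, 5, 4], is_odd)]

print(simple)
print(mixed)
```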
msg397253 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2021-07-11 01:17
I agree Raymond's `before_and_after()` looks like an elegant, efficient, and usable approach to this.

One minor nit: there's no need for the `iter()` call in:

        yield from iter(transition)

Indeed, it confused me at first, because `yield from x` does its own `iter(x)` call under the covers, and since everyone knows that ;-) , I wondered what deep trickery calling it again was intended to pull off.

But I persuaded myself there was no such subtle intent - it's just redundant.
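
A tiny illustration of the point about redundancy; the function names `relay` and `relay_explicit` are made up for this sketch.

```python
def relay(iterable):
    # yield from obtains an iterator from its operand itself,
    # so a plain list (or any iterable) can be passed directly.
    yield from iterable

def relay_explicit(iterable):
    # The explicit iter() call is harmless but adds nothing.
    yield from iter(iterable)

same = list(relay([1, 2, 3])) == list(relay_explicit([1, 2, 3]))
print(same)
```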
msg397312 - (view) Author: pavel-lexyr (pavel-lexyr) * Date: 2021-07-12 14:16
There is a core part of the `takedowhile` proposal's use case that I am having trouble envisioning with the alternative `before_and_after` proposal. If the user does not engage with the `after` part of the iterator, the transitional elements will be stuck indefinitely. What would a correct usage be if one wants the following two conditions to hold:

1. the number of elements after the first falsifying one is minimal, i.e. 0
2. all the yielded elements are processed no matter what?
msg397336 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2021-07-12 16:30
If you don't use the `after` iterator, then of course you'll never see the values (if any) it would have yielded.

How could it possibly be otherwise? By design and construction, the `before` iterator ends before yielding the first (if any) transitional element.

As Raymond said at the start, the `takedowhile()` proposal appears much harder to use correctly, since there's no reasonably sane way to know that the last value it yields _is_ the transitional element (or, perhaps, that there was no transitional element, and the underlying iterable was just exhausted without finding one).

If the proposal were instead for `takewhile_plus_one_more_if_any()`, then at least the ugly name would warn about the surprising intended behavior ;-)
msg397337 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2021-07-12 16:40
That said, if you really do want those semantics, it's easy to build on top of Raymond's API:

def takewhile_plus_one_more_if_any(pred, iterable):
    from itertools import islice, chain
    before, after = before_and_after(pred, iterable)
    return chain(before, islice(after, 1))
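
A self-contained way to try the snippet above is sketched here. The `before_and_after()` body is a compact stand-in modelled on the itertools documentation recipe (an assumption, since the thread does not show it inline), and `is_odd` and the inputs are made up. The two results also show the ambiguity discussed earlier: the output alone does not say whether the last value is a transitional element.

```python
from itertools import chain, islice

def before_and_after(predicate, iterable):
    # Compact stand-in for the recipe (assumed, not quoted from the thread).
    it = iter(iterable)
    transition = []

    def true_iterator():
        for elem in it:
            if predicate(elem):
                yield elem
            else:
                transition.append(elem)
                return

    return true_iterator(), chain(transition, it)

def takewhile_plus_one_more_if_any(pred, iterable):
    before, after = before_and_after(pred, iterable)
    # chain() exhausts `before` first, so `after` is only touched
    # once the transition element (if any) has been recorded.
    return chain(before, islice(after, 1))

def is_odd(x):
    return x % 2 == 1

with_transition = list(takewhile_plus_one_more_if_any(is_odd, [1, 3, 5, 2, 4]))
without_transition = list(takewhile_plus_one_more_if_any(is_odd, [1, 3, 5]))
print(with_transition)     # last value fails the predicate
print(without_transition)  # last value passes; input was simply exhausted
```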
msg397338 - (view) Author: pavel-lexyr (pavel-lexyr) * Date: 2021-07-12 16:53
Thank you - that answers the questions. The use case where we would want to know if the last element is transitional or not completely slipped my mind for some reason.
msg401068 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-09-05 05:09
New changeset 91be41ad933e24bff26353a19f56447e17fb6367 by Raymond Hettinger in branch 'main':
bpo-44571:  Add itertool recipe for a variant of takewhile() (GH-28167)
https://github.com/python/cpython/commit/91be41ad933e24bff26353a19f56447e17fb6367
msg401069 - (view) Author: miss-islington (miss-islington) Date: 2021-09-05 05:30
New changeset 656b0bdfaae3a36d386afe3f7b991744528c3ff7 by Miss Islington (bot) in branch '3.10':
bpo-44571:  Add itertool recipe for a variant of takewhile() (GH-28167)
https://github.com/python/cpython/commit/656b0bdfaae3a36d386afe3f7b991744528c3ff7
msg403623 - (view) Author: paul rubin (phr) Date: 2021-10-11 07:09
Oh wow, before_and_after will go into the itertools module per that patch?  I found this issue while looking for a way to do this, but had written the following implementation:

def span(pred, xs):
    # split xs into two iterators a,b where a() is the prefix of xs             
    # that satisfies the predicate, and b() is the rest of xs.                  
    # Similar to Data.List.Span in Haskell.                                     

    ixs = iter(xs)
    t = None
    def a():
        nonlocal t
        for x in ixs:
            if pred(x): yield x
            else: break
        t = x
    def b():
        return itertools.chain([t], ixs)
    return a, b

def tspan():  # test
    xs = [1,3,5,2,4,6,8]
    def odd(x): return x%2==1
    # This should print [1,3,5] then [2,4,6,8]                                  
    for p in span(odd, xs):
        print(list(p()))
msg403638 - (view) Author: paul rubin (phr) Date: 2021-10-11 09:15
Bah, the above doesn't work in the cases where the iterator is empty or (different symptom) where the tail part is empty.  Rather than posting a corrected version (unless someone wants it) I'll just say not to use that code fragment, but that the intended API still looks reasonable.  I do support having some kind of solution but don't want to keep stretching out an already closed discussion unless people think there is more to say.
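
For reference, the two failures can be traced in the earlier fragment: with an empty input the loop body never runs, so `t = x` raises UnboundLocalError; with an empty tail the loop finishes without a break, so `t` is set to the last matching element and `b()` yields it a second time. One possible correction that keeps the same callable-returning API is sketched below. It is not the author's code; it borrows the list-based transition buffer from the `before_and_after` recipe.

```python
import itertools

def span(pred, xs):
    # Split xs into two callables a, b where a() iterates over the prefix
    # of xs that satisfies pred, and b() iterates over the rest.
    # a() must be fully consumed before b() is consumed.
    ixs = iter(xs)
    transition = []  # holds the first non-matching element, if any

    def a():
        for x in ixs:
            if pred(x):
                yield x
            else:
                transition.append(x)
                return

    def b():
        # An empty transition list contributes nothing, which covers both
        # the empty-input and the empty-tail cases.
        return itertools.chain(transition, ixs)

    return a, b

def odd(x):
    return x % 2 == 1

def run(xs):
    # Consume a() fully, then b(), as the API requires.
    return [list(p()) for p in span(odd, xs)]

print(run([1, 3, 5, 2, 4, 6, 8]))
print(run([]))
print(run([1, 3, 5]))
```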
History
Date User Action Args
2022-04-11 14:59:47 admin set github: 88737
2021-10-11 09:15:19 phr set messages: + msg403638
2021-10-11 07:09:53 phr set nosy: + phr; messages: + msg403623
2021-09-05 05:30:44 miss-islington set messages: + msg401069
2021-09-05 05:10:54 rhettinger set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2021-09-05 05:09:45 miss-islington set nosy: + miss-islington; pull_requests: + pull_request26600
2021-09-05 05:09:33 rhettinger set messages: + msg401068
2021-09-04 20:18:30 rhettinger set keywords: + patch; stage: patch review; pull_requests: + pull_request26596
2021-07-12 16:53:56 pavel-lexyr set messages: + msg397338
2021-07-12 16:40:43 tim.peters set messages: + msg397337
2021-07-12 16:30:50 tim.peters set messages: + msg397336
2021-07-12 14:16:42 pavel-lexyr set messages: + msg397312
2021-07-11 01:17:33 tim.peters set nosy: + tim.peters; messages: + msg397253
2021-07-09 22:31:53 rhettinger set messages: + msg397229
2021-07-09 17:44:59 serhiy.storchaka set messages: + msg397205
2021-07-09 16:42:44 rhettinger set files: + tmp_takewhile.py; messages: + msg397202
2021-07-08 19:31:27 rhettinger set messages: + msg397165
2021-07-06 08:26:02 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg397032
2021-07-05 23:34:44 pavel-lexyr set messages: + msg397030
2021-07-05 23:26:08 rhettinger set components: + Library (Lib); versions: + Python 3.11
2021-07-05 23:25:59 rhettinger set assignee: rhettinger; messages: + msg397028
2021-07-05 22:33:28 pavel-lexyr create