classification
Title: Add itertools.first_true (return first true item in iterable)
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: ethan.furman, ezio.melotti, hynek, jab, loewis, lukasz.langa, mark.dickinson, ncoghlan, pitrou, python-dev, r.david.murray, raduv, rhettinger, serhiy.storchaka, tim.peters
Priority: normal Keywords:

Created on 2013-08-04 10:08 by hynek, last changed 2014-04-02 10:24 by rhettinger. This issue is now closed.

Messages (40)
msg194338 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2013-08-04 10:08
Let met try to get you sold on adding the “first” function I released on PyPI roughly a year ago:

https://github.com/hynek/first

It’s a very nice complement to functions like `any()` or itertools. I consists effectively of 9 lines of code but it proved extremely handy in production.

***

It returns the first true value from an iterable or a default:

>>> first([0, None, False, [], (), 42])
42

>>> first([0, None, False, [], ()], default=42)
42

Additionally it also allows for a key function:

>>> first([1, 1, 3, 4, 5], key=lambda x: x % 2 == 0)
4

***

First happens to be especially useful together with the re module:

import re

from first import first


re1 = re.compile('b(.*)')
re2 = re.compile('a(.*)')

m = first(regexp.match('abc') for regexp in [re1, re2])
if not m:
   print('no match!')
elif m.re is re1:
   print('re1', m.group(1))
elif m.re is re2:
   print('re2', m.group(1))


All the knee-jerk alternatives to it have some shortcomings:

next(itertools.ifilter(None, (regexp.match('abc') for regexp in [re1, re2])), None)
next((regexp.match('abc') for regexp in [re1, re2] if regexp.match('abc')), None)

None of them is Pythonic and the second one even has to call match() twice, which is *not* a cheap method to call.

Here the first version for comparison again:

first(regexp.match('abc') for regexp in [re1, re2])

It doesn’t even exhaust the iterator if not necessary.

***

I don’t cling to neither the name or the exact function signature (although it got polished with the help of several people, two of them core developers).  I also don’t really care whether it gets added along of any() or put into itertools.  I just know that I and several other people would appreciate to have such a handy function in the stdlib – I even got an e-mail from OpenStack folks asking when it will be added because they would like to use it and there’s even a debian package by now: http://packages.debian.org/unstable/python-first

There’s also this question on StackOverflow: http://stackoverflow.com/questions/1077307/why-is-there-no-firstiterable-built-in-function-in-python which is nice but doesn’t fix the subtleties like when there is no true value etc which makes it useless for production code and one has to write boilerplate code every single time.

It was even one of five Python packages Lukasz Langa deemed worthy to be highlighted in his PyCon 2013 lightning talk: http://youtu.be/1vui-LupKJI?t=20m40s

FWIW, SQL has a similar function called COALESCE ( https://en.wikipedia.org/wiki/Null_(SQL)#COALESCE ) which only handles NULL though.

***

I’ll happily respond to any questions or concerns that may arise and supply a patch as soon as we agree on a place to add it.
msg194341 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-04 10:28
The function name looks a little misleading. I expected first([0, None, False, [], (), 42]) returns 0.
msg194343 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-08-04 10:58
It's a direct counterpart to any() and all() - first([0, [], ()]) is None for the same reason that any([0, [], ()]) and all([0, [], ()]) are both False.

If first returned the actual first item in the iterable (regardless of truth value), then it would just be "next" under a different name, which would be rather pointless.

That said, if "first" is deemed too ambiguous, then I believe "filterednext" would be a reasonable more explicit name:

>>> filterednext([0, None, False, [], (), 42])
42

>>> filterednext([0, None, False, [], ()], default=42)
42

>>> filterednext([1, 1, 3, 4, 5], key=lambda x: x % 2 == 0)
4

>>> m = filterednext(regexp.match('abc') for regexp in [re1, re2])

I also believe itertools would be a more appropriate initial home than the builtins.
msg194346 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-04 11:27
In this case it would be pointless too. It is just

    next(filter(key, iterable), default)

Are you need a special function for combination of two builtins?
msg194348 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2013-08-04 11:36
`filter()` exhausts the full iterator which is potentially very expensive – like in conduction with regular expressions.
msg194349 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-08-04 11:38
I also think it should go to itertools.

I also found the name "first" confusing, in particular since it means what Serhiy assumes in LISP, which might be familiar to people interested in functional list processing. Also, Ruby and Smalltalk use "first" in that sense.

To add to the bike shedding, I propose "find_if" (from LISP), "coalesce" (from SQL), or "detect" (from Smalltalk).

-1 on calling the filter function "key=". How about "pred=" (like all other itertools predicates) or "filter="?
msg194350 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-08-04 11:38
Serhiy, Hynek covered the issue with the status quo in the original
proposal.The existing alternative are painful to try and decipher by
comparison with the named function:

    filterednext([0, None, False, [], (), 42])
vs
    next(filter(None, [0, None, False, [], (), 42]))

    filterednext([0, None, False, [], ()], default=42)
vs
    next(filter(None, [0, None, False, [], (), 42]), 42)

    filterednext([1, 1, 3, 4, 5], key=lambda x: x % 2 == 0)
vs
    next(filter(lambda x: x % 2 == 0, [1, 1, 3, 4, 5]))

    m = filterednext(regexp.match('abc') for regexp in [re1, re2])
vs
    m = next(filter(None, (regexp.match('abc') for regexp in [re1, re2])))

Hynek - the Python 3 filter is an iterator, so it works like the
itertools.ifilter version in Python 2.
msg194351 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2013-08-04 11:41
Ah ok sorry. Anyhow, it’s just a very common idiom that should be easy and readable.

As said, I’m not married to any names at all and would happily add a compatibility package to PyPI with the new names/parameters.
msg194352 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-08-04 11:49
Nick: that the code is difficult to decipher is really the fault of functional programming, which is inherently difficult to decipher (since last function applied is written first).

Explicit iteration is easier to read. I would write Hynek's example as

for r in (re1, re2):
    m = r.match('abc')
    if not m:
        print('No match)
    elif r is re1:
        print('re1', m.group(1))
    elif r is re2:
        print('re1', m.group(1))
    break # always

This is only two additional lines, very Pythonic (IMO), and doesn't invoke match unnecessarily.
msg194353 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-08-04 11:50
Regarding the key parameter name, I believe this is closer to itertools.groupby (which uses "key=" as an optional argument, akin to min, max and sorted) than it is to filterfalse, dropwhile or takewhile (which use "pred" as the first positional argument)

The only use of "pred" in the optional key argument sense appears to be the "quantify" recipe.

+1 for itertools.coalesce, taking the name from SQL. It's designed to serve exactly the same purpose as COALESCE does there, doesn't risk confusion with next-like behaviour the way "first" does and hints strongly at the fact it is a reduction operation from an iterable to a single value.
msg194354 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2013-08-04 11:58
Martin, I don’t find the loop easier to read because you have to *remember* the `break` otherwise “weird stuff happens”.

Coalesce seems common enough, I would +1 on that too.
msg194355 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-08-04 12:09
(Updated the issue title to reflect the currently proposed name and location for the functionality)

While I'm a fan of explicit iteration as well ("inside every reduce is a loop trying to get out"), I think the fact Martin's explicit loop is buggy (it will never match re2 as it always bails on the first iteration) helps make the case for coalesce. The correct explicit loop looks something like this:

    for r in (re1, re2):
        m = r.match('abc')
        if m is None:
            continue
        if r is re1:
            print('re1', m.group(1))
        elif r is re2:
            print('re1', m.group(1))
        break # Matched something
    else:
        print('No match')

(Or the equivalent that indents the loop body further and avoids the continue statement)

The coalesce version has a definite advantage in not needing a loop else clause to handle the "nothing matched" case:

    m = coalesce(regexp.match('abc') for regexp in [re1, re2])
    if m is None:
        print('no match!')
    elif m.re is re1:
        print('re1', m.group(1))
    elif m.re is re2:
        print('re2', m.group(1))
msg194356 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-08-04 12:14
Simplicity is in the eye of the beholder, yet... you need to remember the break when *writing* the code, so the loop might be more difficult to write (but then, I need to remember/lookup the function name and parameters for coalesce)... when reading the code, the break is already there, and easy to notice. With Nick's remark, it's even more obvious that it is difficult to write :-)

If an unknown (to the reader) function is used, reading the code becomes actually difficult, since the reader either needs to guess what the function does, or look it up.

Note that I'm not objecting the addition of the function (I'm neutral), just the claim that there are no simple alternatives. I'm neutral because it falls under the "not every two-line function needs to go into the standard library" rule; but then, if it is really popular, it helps readability if it is in the library (rather than duplicated by users).
msg194359 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-08-04 12:39
[Nick]
> Regarding the key parameter name, I believe this is closer to
> itertools.groupby ...

Isn't this mostly about the return type?  `pred` is an indication that a boolean is being returned (or that the return value will be interpreted in a boolean context), while `key` is used for more general transformations.  In that sense, `pred` makes more sense here.
msg194361 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-08-04 12:43
Mark's rationale makes sense to me. I believe that would make the
latest version of the proposed API (in the itertools module):

    def coalesce(iterable, default=None, pred=None):
        ...
msg194369 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-04 13:29
def coalesce(iterable, default=None, pred=None):
        return next(filter(pred, iterable), default)

Are you sure you want add this one-line function to the itertools module rather then to recipes?
msg194380 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2013-08-04 14:23
> def coalesce(iterable, default=None, pred=None):
>        return next(filter(pred, iterable), default)
> 
> Are you sure you want add this one-line function to the itertools module rather then to recipes?

Well, for many – including me – it would mean to have this one-line function in every other project or a PyPI dependency.  I’m certain there are other short but useful functions in the stdlib.
msg194392 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-08-04 16:31
I'm not going to object to the name, since I see that it is used elsewhere in programming for the proposed meaning.  But allow me to rant about the corruption of the English language here.  To me, "coalesce" should involve a computation based on all of the elements of a list, not just picking out the first non-false value.
msg194425 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-04 21:08
> Well, for many – including me – it would mean to have this one-line function in every other project or a PyPI dependency.

But why you want to have a separate function instead of just use two builtins?
msg194428 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-08-04 21:25
FWIW, I like this.  It would be a nice addition to the itertools module. +1

The `key` argument should be renamed to `pred`, as others have said.

As to the name, I like "first_true".  Does what it says.  Plain "first" is misleading, and "coalesce" is both inscrutable and nearly impossible to spell ;-)
msg194429 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2013-08-04 21:26
> But why you want to have a separate function instead of just use two builtins?

This question has been answered twice now, once from Nick – please refer above.

It's a clunky and error-prone solution to a common problem. Maybe you can't emphasize because it's not a common problem to you but that doesn't make it less useful.
msg194436 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-08-04 23:14
A wild timbot appears! :)

Tim Peters added the comment:
> As to the name, I like "first_true".  Does what it says.  Plain "first"
is misleading, and "coalesce" is both inscrutable and nearly impossible to
spell ;-)

A fair point, although skipping the underscore ("firsttrue" or "nexttrue")
would arguably be more consistent with other itertools names.

I guess it's over to Raymond now as module maintainer for a yes/docs
recipe/no response, and if one of the first two, the exact name used.
msg194439 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-08-04 23:46
> skipping the underscore ("firsttrue" or "nexttrue")
> would arguably be more consistent with other itertools names.

Like izip_longest and combinations_with_replacement? ;-)

I care about readability here more than consistency with the bulk of itertools names.  "first" ends with "t" and "true" begins with "t", which makes "firsttrue" an eyesore.  "first_true" strikes me as very easy to read, spell, remember and understand.

Offhand I don't know what "nexttrue" is supposed to mean (like "if that's supposed to mean the first true iterate, why didn't they say 'first' instead of 'next'?").
msg194441 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-08-05 00:04
I concede Tim's point about itertools already using underscores where it improves readability of a name :)

So, the latest state of the proposal is to add the following, preferably directly to the module, but at least as a recipe in the itertools docs:

    def first_true(iterable, default=None, pred=None):
        """Returns the first true item in the iterable

        If no such item is found, returns *default*
        If *pred* is not None, returns the first item
        for which pred(item) is true.
        """
        return next(filter(pred, iterable), default)
msg194472 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-08-05 12:34
+1 on the name 'first_true'.  Does exactly what it says on the tin.
msg194474 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2013-08-05 12:48
> +1 on the name 'first_true'.  Does exactly what it says on the tin.

I fully agree.

***

I assume what's missing now is a permission from Raymond to mess with his turf?
msg194515 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-05 21:42
I would like to chime in and say that the fact that the functions exists in SQL (and possibly Lisp, Snobol and PHP) sells me on the idea!

Okay, only joking, but I do think this is a useful addition.
"first_true" should be its name, but itertools already has a tradition of nonunderscorednames, so it should be "firsttrue", which may look weird because of the doubled t.
msg194516 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-05 21:47
(but you don't have to trust me: itertools also has izip_longest() and combinations_with_replacement())
msg194873 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2013-08-11 08:35
So I wanted to provide a first patch to move the discussion on and realized that itertools appears currently to be completely inside of `Modules/itertoolsmodule.c`. :-/

Any volunteers? :)
msg194877 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-11 09:32
How large will be a C implementation of this one-line function?

I'm still -1 for polluting the itertools module with trivial combinations of existing functions.
msg194880 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-11 09:53
> How large will be a C implementation of this one-line function?
> 
> I'm still -1 for polluting the itertools module with trivial
> combinations of existing functions.

The solution is to move the current itertools to _itertools and have a
companion itertools.py.
msg194887 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-11 10:26
Looks as too much work for too small gain (and I'm suppose the total gain is negative if we count all costs).
msg195379 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-08-16 17:44
For what it is worth, I am currently writing some email tests and it would certainly be convenient to have this.  Of course I *can* define it locally in the the test file.
msg195456 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2013-08-17 09:58
Well that's the point: it's extremely handy but simple.  I wish Raymond would pronounce on this.  I can keep using the PyPI version for all I care, so I'm not going fight for it.  But with one exception there seems to be an agreement that it would be a very fine thing to have.
msg195596 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-08-19 03:40
Hynek, thanks for your suggestion.  I would like to mull it over for a while before chiming in.
msg195624 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-08-19 12:28
OK, now I have a place in the non-test email code where using this would lead to easier-to-read code.
msg199979 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-15 07:54
> OK, now I have a place in the non-test email code where using this would lead to easier-to-read code.

Now you have not this place. ;)
msg212113 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-02-24 16:09
As soon as the trunk opens again, I'll add Nick's version to the itertools  recipes section.  

I'm disinclined to add it directly to the itertools module as a native C function for several reasons:


* The problem solved isn't that common (very little of code I've ever written or seen would be better-off if this function had been available).   The OP's regex use case was shown by Martin to be better written as an explicit for-loop.  I think this is probably also true in the general case.

* The module unity is about creating iterators.  Consumers of iterators such as any(), all(), next(), sum(), min(), max() and others all lie elsewhere. 

* For-loops are likely to be the one obvious way to do it for other variants such first_false(), first_true_raise_exception_if_not_found(),
 or first_greater_than_limit(), etc.

* For the most part, the functional style of itertool composition has been a win for performance, but not for readabilty.  Also, users of the module seem to suffer from having-too-many-choices.  Accordingly, I only want to grow the module for cases where there is a clear win.

I would have pronounced on this one sooner but was stayed by the enthusiasm of the participants in this thread.

David said, "now I have a place in the non-test email code where using this would lead to easier-to-read code".   IMO, this is telling.  The bar is higher than "I would have used this one time".  That doesn't make it worth having to learn and remember.
msg212119 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-24 17:28
> David said, "now I have a place in the non-test email code where using this
> would lead to easier-to-read code".   IMO, this is telling.  The bar is
> higher than "I would have used this one time".  That doesn't make it worth
> having to learn and remember.

I will explain my last comment. Four months ago this place was rewritten, so 
now there is no place for this function.
msg215372 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-04-02 10:17
New changeset 3e2354dde892 by Raymond Hettinger in branch '3.4':
Issue #18652:  Add an itertools recipe for first_true()
http://hg.python.org/cpython/rev/3e2354dde892
History
Date User Action Args
2014-04-02 10:24:00rhettingersetstatus: open -> closed
resolution: fixed
2014-04-02 10:17:41python-devsetnosy: + python-dev
messages: + msg215372
2014-04-02 05:35:47jabsetnosy: + jab
2014-02-24 17:28:23serhiy.storchakasetmessages: + msg212119
2014-02-24 16:09:19rhettingersetmessages: + msg212113
2014-02-15 18:57:10ethan.furmansetnosy: + ethan.furman
2014-02-15 01:29:36ncoghlansetversions: + Python 3.5, - Python 3.4
2013-10-15 07:54:17serhiy.storchakasetmessages: + msg199979
2013-08-23 11:00:34raduvsetnosy: + raduv
2013-08-19 12:28:00r.david.murraysetmessages: + msg195624
2013-08-19 03:40:39rhettingersetmessages: + msg195596
2013-08-19 03:26:52rhettingersetassignee: rhettinger
2013-08-17 09:58:24hyneksetmessages: + msg195456
2013-08-16 17:44:56r.david.murraysetmessages: + msg195379
2013-08-11 10:26:46serhiy.storchakasetmessages: + msg194887
2013-08-11 09:53:17pitrousetmessages: + msg194880
2013-08-11 09:32:39serhiy.storchakasetmessages: + msg194877
2013-08-11 08:35:22hyneksetassignee: hynek -> (no value)
messages: + msg194873
stage: needs patch
2013-08-10 17:18:46ezio.melottisetnosy: + ezio.melotti
2013-08-05 21:47:05pitrousetmessages: + msg194516
2013-08-05 21:42:54pitrousetnosy: + pitrou
messages: + msg194515
2013-08-05 12:48:58hyneksetmessages: + msg194474
2013-08-05 12:34:48mark.dickinsonsetmessages: + msg194472
2013-08-05 00:04:45ncoghlansetmessages: + msg194441
title: Add itertools.coalesce -> Add itertools.first_true (return first true item in iterable)
2013-08-04 23:46:34tim.peterssetmessages: + msg194439
2013-08-04 23:14:49ncoghlansetmessages: + msg194436
2013-08-04 21:26:54hyneksetmessages: + msg194429
2013-08-04 21:25:26tim.peterssetnosy: + tim.peters
messages: + msg194428
2013-08-04 21:08:03serhiy.storchakasetmessages: + msg194425
2013-08-04 16:31:00r.david.murraysetnosy: + r.david.murray
messages: + msg194392
2013-08-04 14:23:42hyneksetmessages: + msg194380
2013-08-04 13:29:41serhiy.storchakasetmessages: + msg194369
2013-08-04 12:43:01ncoghlansetmessages: + msg194361
2013-08-04 12:39:05mark.dickinsonsetnosy: + mark.dickinson
messages: + msg194359
2013-08-04 12:14:26loewissetmessages: + msg194356
2013-08-04 12:09:45ncoghlansetmessages: + msg194355
title: Add a “first”-like function to the stdlib -> Add itertools.coalesce
2013-08-04 11:58:16hyneksetmessages: + msg194354
title: Add a “first” function to the stdlib -> Add a “first”-like function to the stdlib
2013-08-04 11:50:16ncoghlansetmessages: + msg194353
2013-08-04 11:49:38loewissetmessages: + msg194352
2013-08-04 11:41:34hyneksetmessages: + msg194351
2013-08-04 11:38:45ncoghlansetmessages: + msg194350
2013-08-04 11:38:39loewissetnosy: + loewis
messages: + msg194349
2013-08-04 11:36:23hyneksetmessages: + msg194348
2013-08-04 11:27:24serhiy.storchakasetmessages: + msg194346
2013-08-04 10:58:38ncoghlansetmessages: + msg194343
2013-08-04 10:28:59serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg194341
2013-08-04 10:08:54hynekcreate