Issue18652
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013-08-04 10:08 by hynek, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (40) | |||
---|---|---|---|
msg194338 - (view) | Author: Hynek Schlawack (hynek) * | Date: 2013-08-04 10:08 | |
Let met try to get you sold on adding the “first” function I released on PyPI roughly a year ago: https://github.com/hynek/first It’s a very nice complement to functions like `any()` or itertools. I consists effectively of 9 lines of code but it proved extremely handy in production. *** It returns the first true value from an iterable or a default: >>> first([0, None, False, [], (), 42]) 42 >>> first([0, None, False, [], ()], default=42) 42 Additionally it also allows for a key function: >>> first([1, 1, 3, 4, 5], key=lambda x: x % 2 == 0) 4 *** First happens to be especially useful together with the re module: import re from first import first re1 = re.compile('b(.*)') re2 = re.compile('a(.*)') m = first(regexp.match('abc') for regexp in [re1, re2]) if not m: print('no match!') elif m.re is re1: print('re1', m.group(1)) elif m.re is re2: print('re2', m.group(1)) All the knee-jerk alternatives to it have some shortcomings: next(itertools.ifilter(None, (regexp.match('abc') for regexp in [re1, re2])), None) next((regexp.match('abc') for regexp in [re1, re2] if regexp.match('abc')), None) None of them is Pythonic and the second one even has to call match() twice, which is *not* a cheap method to call. Here the first version for comparison again: first(regexp.match('abc') for regexp in [re1, re2]) It doesn’t even exhaust the iterator if not necessary. *** I don’t cling to neither the name or the exact function signature (although it got polished with the help of several people, two of them core developers). I also don’t really care whether it gets added along of any() or put into itertools. I just know that I and several other people would appreciate to have such a handy function in the stdlib – I even got an e-mail from OpenStack folks asking when it will be added because they would like to use it and there’s even a debian package by now: http://packages.debian.org/unstable/python-first There’s also this question on StackOverflow: http://stackoverflow.com/questions/1077307/why-is-there-no-firstiterable-built-in-function-in-python which is nice but doesn’t fix the subtleties like when there is no true value etc which makes it useless for production code and one has to write boilerplate code every single time. It was even one of five Python packages Lukasz Langa deemed worthy to be highlighted in his PyCon 2013 lightning talk: http://youtu.be/1vui-LupKJI?t=20m40s FWIW, SQL has a similar function called COALESCE ( https://en.wikipedia.org/wiki/Null_(SQL)#COALESCE ) which only handles NULL though. *** I’ll happily respond to any questions or concerns that may arise and supply a patch as soon as we agree on a place to add it. |
|||
msg194341 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-08-04 10:28 | |
The function name looks a little misleading. I expected first([0, None, False, [], (), 42]) returns 0. |
|||
msg194343 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2013-08-04 10:58 | |
It's a direct counterpart to any() and all() - first([0, [], ()]) is None for the same reason that any([0, [], ()]) and all([0, [], ()]) are both False. If first returned the actual first item in the iterable (regardless of truth value), then it would just be "next" under a different name, which would be rather pointless. That said, if "first" is deemed too ambiguous, then I believe "filterednext" would be a reasonable more explicit name: >>> filterednext([0, None, False, [], (), 42]) 42 >>> filterednext([0, None, False, [], ()], default=42) 42 >>> filterednext([1, 1, 3, 4, 5], key=lambda x: x % 2 == 0) 4 >>> m = filterednext(regexp.match('abc') for regexp in [re1, re2]) I also believe itertools would be a more appropriate initial home than the builtins. |
|||
msg194346 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-08-04 11:27 | |
In this case it would be pointless too. It is just next(filter(key, iterable), default) Are you need a special function for combination of two builtins? |
|||
msg194348 - (view) | Author: Hynek Schlawack (hynek) * | Date: 2013-08-04 11:36 | |
`filter()` exhausts the full iterator which is potentially very expensive – like in conduction with regular expressions. |
|||
msg194349 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2013-08-04 11:38 | |
I also think it should go to itertools. I also found the name "first" confusing, in particular since it means what Serhiy assumes in LISP, which might be familiar to people interested in functional list processing. Also, Ruby and Smalltalk use "first" in that sense. To add to the bike shedding, I propose "find_if" (from LISP), "coalesce" (from SQL), or "detect" (from Smalltalk). -1 on calling the filter function "key=". How about "pred=" (like all other itertools predicates) or "filter="? |
|||
msg194350 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2013-08-04 11:38 | |
Serhiy, Hynek covered the issue with the status quo in the original proposal.The existing alternative are painful to try and decipher by comparison with the named function: filterednext([0, None, False, [], (), 42]) vs next(filter(None, [0, None, False, [], (), 42])) filterednext([0, None, False, [], ()], default=42) vs next(filter(None, [0, None, False, [], (), 42]), 42) filterednext([1, 1, 3, 4, 5], key=lambda x: x % 2 == 0) vs next(filter(lambda x: x % 2 == 0, [1, 1, 3, 4, 5])) m = filterednext(regexp.match('abc') for regexp in [re1, re2]) vs m = next(filter(None, (regexp.match('abc') for regexp in [re1, re2]))) Hynek - the Python 3 filter is an iterator, so it works like the itertools.ifilter version in Python 2. |
|||
msg194351 - (view) | Author: Hynek Schlawack (hynek) * | Date: 2013-08-04 11:41 | |
Ah ok sorry. Anyhow, it’s just a very common idiom that should be easy and readable. As said, I’m not married to any names at all and would happily add a compatibility package to PyPI with the new names/parameters. |
|||
msg194352 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2013-08-04 11:49 | |
Nick: that the code is difficult to decipher is really the fault of functional programming, which is inherently difficult to decipher (since last function applied is written first). Explicit iteration is easier to read. I would write Hynek's example as for r in (re1, re2): m = r.match('abc') if not m: print('No match) elif r is re1: print('re1', m.group(1)) elif r is re2: print('re1', m.group(1)) break # always This is only two additional lines, very Pythonic (IMO), and doesn't invoke match unnecessarily. |
|||
msg194353 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2013-08-04 11:50 | |
Regarding the key parameter name, I believe this is closer to itertools.groupby (which uses "key=" as an optional argument, akin to min, max and sorted) than it is to filterfalse, dropwhile or takewhile (which use "pred" as the first positional argument) The only use of "pred" in the optional key argument sense appears to be the "quantify" recipe. +1 for itertools.coalesce, taking the name from SQL. It's designed to serve exactly the same purpose as COALESCE does there, doesn't risk confusion with next-like behaviour the way "first" does and hints strongly at the fact it is a reduction operation from an iterable to a single value. |
|||
msg194354 - (view) | Author: Hynek Schlawack (hynek) * | Date: 2013-08-04 11:58 | |
Martin, I don’t find the loop easier to read because you have to *remember* the `break` otherwise “weird stuff happens”. Coalesce seems common enough, I would +1 on that too. |
|||
msg194355 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2013-08-04 12:09 | |
(Updated the issue title to reflect the currently proposed name and location for the functionality) While I'm a fan of explicit iteration as well ("inside every reduce is a loop trying to get out"), I think the fact Martin's explicit loop is buggy (it will never match re2 as it always bails on the first iteration) helps make the case for coalesce. The correct explicit loop looks something like this: for r in (re1, re2): m = r.match('abc') if m is None: continue if r is re1: print('re1', m.group(1)) elif r is re2: print('re1', m.group(1)) break # Matched something else: print('No match') (Or the equivalent that indents the loop body further and avoids the continue statement) The coalesce version has a definite advantage in not needing a loop else clause to handle the "nothing matched" case: m = coalesce(regexp.match('abc') for regexp in [re1, re2]) if m is None: print('no match!') elif m.re is re1: print('re1', m.group(1)) elif m.re is re2: print('re2', m.group(1)) |
|||
msg194356 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2013-08-04 12:14 | |
Simplicity is in the eye of the beholder, yet... you need to remember the break when *writing* the code, so the loop might be more difficult to write (but then, I need to remember/lookup the function name and parameters for coalesce)... when reading the code, the break is already there, and easy to notice. With Nick's remark, it's even more obvious that it is difficult to write :-) If an unknown (to the reader) function is used, reading the code becomes actually difficult, since the reader either needs to guess what the function does, or look it up. Note that I'm not objecting the addition of the function (I'm neutral), just the claim that there are no simple alternatives. I'm neutral because it falls under the "not every two-line function needs to go into the standard library" rule; but then, if it is really popular, it helps readability if it is in the library (rather than duplicated by users). |
|||
msg194359 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2013-08-04 12:39 | |
[Nick] > Regarding the key parameter name, I believe this is closer to > itertools.groupby ... Isn't this mostly about the return type? `pred` is an indication that a boolean is being returned (or that the return value will be interpreted in a boolean context), while `key` is used for more general transformations. In that sense, `pred` makes more sense here. |
|||
msg194361 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2013-08-04 12:43 | |
Mark's rationale makes sense to me. I believe that would make the latest version of the proposed API (in the itertools module): def coalesce(iterable, default=None, pred=None): ... |
|||
msg194369 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-08-04 13:29 | |
def coalesce(iterable, default=None, pred=None): return next(filter(pred, iterable), default) Are you sure you want add this one-line function to the itertools module rather then to recipes? |
|||
msg194380 - (view) | Author: Hynek Schlawack (hynek) * | Date: 2013-08-04 14:23 | |
> def coalesce(iterable, default=None, pred=None): > return next(filter(pred, iterable), default) > > Are you sure you want add this one-line function to the itertools module rather then to recipes? Well, for many – including me – it would mean to have this one-line function in every other project or a PyPI dependency. I’m certain there are other short but useful functions in the stdlib. |
|||
msg194392 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2013-08-04 16:31 | |
I'm not going to object to the name, since I see that it is used elsewhere in programming for the proposed meaning. But allow me to rant about the corruption of the English language here. To me, "coalesce" should involve a computation based on all of the elements of a list, not just picking out the first non-false value. |
|||
msg194425 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-08-04 21:08 | |
> Well, for many – including me – it would mean to have this one-line function in every other project or a PyPI dependency. But why you want to have a separate function instead of just use two builtins? |
|||
msg194428 - (view) | Author: Tim Peters (tim.peters) * | Date: 2013-08-04 21:25 | |
FWIW, I like this. It would be a nice addition to the itertools module. +1 The `key` argument should be renamed to `pred`, as others have said. As to the name, I like "first_true". Does what it says. Plain "first" is misleading, and "coalesce" is both inscrutable and nearly impossible to spell ;-) |
|||
msg194429 - (view) | Author: Hynek Schlawack (hynek) * | Date: 2013-08-04 21:26 | |
> But why you want to have a separate function instead of just use two builtins? This question has been answered twice now, once from Nick – please refer above. It's a clunky and error-prone solution to a common problem. Maybe you can't emphasize because it's not a common problem to you but that doesn't make it less useful. |
|||
msg194436 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2013-08-04 23:14 | |
A wild timbot appears! :) Tim Peters added the comment: > As to the name, I like "first_true". Does what it says. Plain "first" is misleading, and "coalesce" is both inscrutable and nearly impossible to spell ;-) A fair point, although skipping the underscore ("firsttrue" or "nexttrue") would arguably be more consistent with other itertools names. I guess it's over to Raymond now as module maintainer for a yes/docs recipe/no response, and if one of the first two, the exact name used. |
|||
msg194439 - (view) | Author: Tim Peters (tim.peters) * | Date: 2013-08-04 23:46 | |
> skipping the underscore ("firsttrue" or "nexttrue") > would arguably be more consistent with other itertools names. Like izip_longest and combinations_with_replacement? ;-) I care about readability here more than consistency with the bulk of itertools names. "first" ends with "t" and "true" begins with "t", which makes "firsttrue" an eyesore. "first_true" strikes me as very easy to read, spell, remember and understand. Offhand I don't know what "nexttrue" is supposed to mean (like "if that's supposed to mean the first true iterate, why didn't they say 'first' instead of 'next'?"). |
|||
msg194441 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2013-08-05 00:04 | |
I concede Tim's point about itertools already using underscores where it improves readability of a name :) So, the latest state of the proposal is to add the following, preferably directly to the module, but at least as a recipe in the itertools docs: def first_true(iterable, default=None, pred=None): """Returns the first true item in the iterable If no such item is found, returns *default* If *pred* is not None, returns the first item for which pred(item) is true. """ return next(filter(pred, iterable), default) |
|||
msg194472 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2013-08-05 12:34 | |
+1 on the name 'first_true'. Does exactly what it says on the tin. |
|||
msg194474 - (view) | Author: Hynek Schlawack (hynek) * | Date: 2013-08-05 12:48 | |
> +1 on the name 'first_true'. Does exactly what it says on the tin. I fully agree. *** I assume what's missing now is a permission from Raymond to mess with his turf? |
|||
msg194515 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-05 21:42 | |
I would like to chime in and say that the fact that the functions exists in SQL (and possibly Lisp, Snobol and PHP) sells me on the idea! Okay, only joking, but I do think this is a useful addition. "first_true" should be its name, but itertools already has a tradition of nonunderscorednames, so it should be "firsttrue", which may look weird because of the doubled t. |
|||
msg194516 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-05 21:47 | |
(but you don't have to trust me: itertools also has izip_longest() and combinations_with_replacement()) |
|||
msg194873 - (view) | Author: Hynek Schlawack (hynek) * | Date: 2013-08-11 08:35 | |
So I wanted to provide a first patch to move the discussion on and realized that itertools appears currently to be completely inside of `Modules/itertoolsmodule.c`. :-/ Any volunteers? :) |
|||
msg194877 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-08-11 09:32 | |
How large will be a C implementation of this one-line function? I'm still -1 for polluting the itertools module with trivial combinations of existing functions. |
|||
msg194880 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-11 09:53 | |
> How large will be a C implementation of this one-line function? > > I'm still -1 for polluting the itertools module with trivial > combinations of existing functions. The solution is to move the current itertools to _itertools and have a companion itertools.py. |
|||
msg194887 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-08-11 10:26 | |
Looks as too much work for too small gain (and I'm suppose the total gain is negative if we count all costs). |
|||
msg195379 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2013-08-16 17:44 | |
For what it is worth, I am currently writing some email tests and it would certainly be convenient to have this. Of course I *can* define it locally in the the test file. |
|||
msg195456 - (view) | Author: Hynek Schlawack (hynek) * | Date: 2013-08-17 09:58 | |
Well that's the point: it's extremely handy but simple. I wish Raymond would pronounce on this. I can keep using the PyPI version for all I care, so I'm not going fight for it. But with one exception there seems to be an agreement that it would be a very fine thing to have. |
|||
msg195596 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2013-08-19 03:40 | |
Hynek, thanks for your suggestion. I would like to mull it over for a while before chiming in. |
|||
msg195624 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2013-08-19 12:28 | |
OK, now I have a place in the non-test email code where using this would lead to easier-to-read code. |
|||
msg199979 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-10-15 07:54 | |
> OK, now I have a place in the non-test email code where using this would lead to easier-to-read code. Now you have not this place. ;) |
|||
msg212113 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2014-02-24 16:09 | |
As soon as the trunk opens again, I'll add Nick's version to the itertools recipes section. I'm disinclined to add it directly to the itertools module as a native C function for several reasons: * The problem solved isn't that common (very little of code I've ever written or seen would be better-off if this function had been available). The OP's regex use case was shown by Martin to be better written as an explicit for-loop. I think this is probably also true in the general case. * The module unity is about creating iterators. Consumers of iterators such as any(), all(), next(), sum(), min(), max() and others all lie elsewhere. * For-loops are likely to be the one obvious way to do it for other variants such first_false(), first_true_raise_exception_if_not_found(), or first_greater_than_limit(), etc. * For the most part, the functional style of itertool composition has been a win for performance, but not for readabilty. Also, users of the module seem to suffer from having-too-many-choices. Accordingly, I only want to grow the module for cases where there is a clear win. I would have pronounced on this one sooner but was stayed by the enthusiasm of the participants in this thread. David said, "now I have a place in the non-test email code where using this would lead to easier-to-read code". IMO, this is telling. The bar is higher than "I would have used this one time". That doesn't make it worth having to learn and remember. |
|||
msg212119 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2014-02-24 17:28 | |
> David said, "now I have a place in the non-test email code where using this > would lead to easier-to-read code". IMO, this is telling. The bar is > higher than "I would have used this one time". That doesn't make it worth > having to learn and remember. I will explain my last comment. Four months ago this place was rewritten, so now there is no place for this function. |
|||
msg215372 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-04-02 10:17 | |
New changeset 3e2354dde892 by Raymond Hettinger in branch '3.4': Issue #18652: Add an itertools recipe for first_true() http://hg.python.org/cpython/rev/3e2354dde892 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:49 | admin | set | github: 62852 |
2014-04-02 10:24:00 | rhettinger | set | status: open -> closed resolution: fixed |
2014-04-02 10:17:41 | python-dev | set | nosy:
+ python-dev messages: + msg215372 |
2014-04-02 05:35:47 | jab | set | nosy:
+ jab |
2014-02-24 17:28:23 | serhiy.storchaka | set | messages: + msg212119 |
2014-02-24 16:09:19 | rhettinger | set | messages: + msg212113 |
2014-02-15 18:57:10 | ethan.furman | set | nosy:
+ ethan.furman |
2014-02-15 01:29:36 | ncoghlan | set | versions: + Python 3.5, - Python 3.4 |
2013-10-15 07:54:17 | serhiy.storchaka | set | messages: + msg199979 |
2013-08-23 11:00:34 | raduv | set | nosy:
+ raduv |
2013-08-19 12:28:00 | r.david.murray | set | messages: + msg195624 |
2013-08-19 03:40:39 | rhettinger | set | messages: + msg195596 |
2013-08-19 03:26:52 | rhettinger | set | assignee: rhettinger |
2013-08-17 09:58:24 | hynek | set | messages: + msg195456 |
2013-08-16 17:44:56 | r.david.murray | set | messages: + msg195379 |
2013-08-11 10:26:46 | serhiy.storchaka | set | messages: + msg194887 |
2013-08-11 09:53:17 | pitrou | set | messages: + msg194880 |
2013-08-11 09:32:39 | serhiy.storchaka | set | messages: + msg194877 |
2013-08-11 08:35:22 | hynek | set | assignee: hynek -> (no value) messages: + msg194873 stage: needs patch |
2013-08-10 17:18:46 | ezio.melotti | set | nosy:
+ ezio.melotti |
2013-08-05 21:47:05 | pitrou | set | messages: + msg194516 |
2013-08-05 21:42:54 | pitrou | set | nosy:
+ pitrou messages: + msg194515 |
2013-08-05 12:48:58 | hynek | set | messages: + msg194474 |
2013-08-05 12:34:48 | mark.dickinson | set | messages: + msg194472 |
2013-08-05 00:04:45 | ncoghlan | set | messages:
+ msg194441 title: Add itertools.coalesce -> Add itertools.first_true (return first true item in iterable) |
2013-08-04 23:46:34 | tim.peters | set | messages: + msg194439 |
2013-08-04 23:14:49 | ncoghlan | set | messages: + msg194436 |
2013-08-04 21:26:54 | hynek | set | messages: + msg194429 |
2013-08-04 21:25:26 | tim.peters | set | nosy:
+ tim.peters messages: + msg194428 |
2013-08-04 21:08:03 | serhiy.storchaka | set | messages: + msg194425 |
2013-08-04 16:31:00 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg194392 |
2013-08-04 14:23:42 | hynek | set | messages: + msg194380 |
2013-08-04 13:29:41 | serhiy.storchaka | set | messages: + msg194369 |
2013-08-04 12:43:01 | ncoghlan | set | messages: + msg194361 |
2013-08-04 12:39:05 | mark.dickinson | set | nosy:
+ mark.dickinson messages: + msg194359 |
2013-08-04 12:14:26 | loewis | set | messages: + msg194356 |
2013-08-04 12:09:45 | ncoghlan | set | messages:
+ msg194355 title: Add a “first”-like function to the stdlib -> Add itertools.coalesce |
2013-08-04 11:58:16 | hynek | set | messages:
+ msg194354 title: Add a “first” function to the stdlib -> Add a “first”-like function to the stdlib |
2013-08-04 11:50:16 | ncoghlan | set | messages: + msg194353 |
2013-08-04 11:49:38 | loewis | set | messages: + msg194352 |
2013-08-04 11:41:34 | hynek | set | messages: + msg194351 |
2013-08-04 11:38:45 | ncoghlan | set | messages: + msg194350 |
2013-08-04 11:38:39 | loewis | set | nosy:
+ loewis messages: + msg194349 |
2013-08-04 11:36:23 | hynek | set | messages: + msg194348 |
2013-08-04 11:27:24 | serhiy.storchaka | set | messages: + msg194346 |
2013-08-04 10:58:38 | ncoghlan | set | messages: + msg194343 |
2013-08-04 10:28:59 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg194341 |
2013-08-04 10:08:54 | hynek | create |