This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Add `Executor.filter` to concurrent.futures
Type: enhancement Stage:
Components: Documentation Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: bquinlan, cool-RR, docs@python, jnoller, ncoghlan, paul.moore, pitrou, rhettinger, sbt
Priority: normal Keywords: patch

Created on 2015-05-14 21:26 by cool-RR, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
1.patch cool-RR, 2015-05-14 21:26 review
issue24195.stoneleaf.01.patch ethan.furman, 2015-05-15 02:24 review
2.patch cool-RR, 2015-05-15 07:36
filter_example.py cool-RR, 2015-05-18 07:30
Messages (32)
msg243217 - (view) Author: Ram Rachum (cool-RR) * Date: 2015-05-14 21:26
`Executor.filter` is to `filter` what `Executor.map` is to `map`.

See Python-ideas thread:

https://groups.google.com/forum/#!topic/python-ideas/EBOC5YCWPyo

Patch attached. I don't know how to run the Python test suite (I'm guessing it involves building Python and I don't know how to do that.) Please tell me whether the tests I wrote pass.

I'm guessing that other than that, the main thing missing in my patch is documentation. If there's agreement that this feature is desirable and the implementation is good, I'll be happy to write the docs. Let me know.
msg243218 - (view) Author: Ram Rachum (cool-RR) * Date: 2015-05-14 21:28
Also, I notice Python 3.5 feature freeze is a bit over a week away, and I hope we can get that in so it could go in Python 3.5. (Assuming it goes in at all.)
msg243221 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2015-05-14 22:03
You should add docs for the new method, as well.
msg243240 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-05-15 02:24
Updated the tests (had to use real defs, not lambdas, and the expected results for filter_exception weren't right).

Tests pass.

Get some docs written!  :)

(More reviews would also be good. ;)
msg243242 - (view) Author: Brian Quinlan (bquinlan) * (Python committer) Date: 2015-05-15 03:29
This feature seems unnecessary to me but couldn't the implementation be simplified to work in terms of map? i.e. (pseudocode):

def filter(self, fn, iterable, timeout=None):
  l = list(iterable)
  return (item for (item, keep) in zip(l, self.map(fn, l, timeout)) if keep)
msg243254 - (view) Author: Ram Rachum (cool-RR) * Date: 2015-05-15 07:36
Patch with documentation attached. (I don't know how to concatenate patches, so 2.patch contains only the documentation, while 1.patch has the implementation and the tests (but Ethan's patch is better.))

Brian, regarding your simpler implementation based on `Executor.map`: It looks good to me, but I'm not sure if it passes the tests. If it does then I don't mind if that's the implementation that would be used.
msg243256 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2015-05-15 08:21
Just as a note - to test a pure Pthon patch like this, you can apply the patch to your installed Python 3.4 installation, and just run the test using that. There should be no need to build your own Python.

>python C:\Apps\Python34\Lib\test\test_concurrent_futures.py
test_cancel (__main__.FutureTests) ... ok
test_cancelled (__main__.FutureTests) ... ok
test_done (__main__.FutureTests) ... ok
test_done_callback_already_cancelled (__main__.FutureTests) ... ok
test_done_callback_already_failed (__main__.FutureTests) ... ok
test_done_callback_already_successful (__main__.FutureTests) ... ok
[... etc]
msg243260 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-15 09:24
This feature looks unnecessary to me as well. Adding features has a non-zero cost in maintenance.
msg243272 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-05-15 15:59
Short History:
=============

(Ram Rachum)
What do you think about adding a method: `Executor.filter`?
I was using something like this: 

    my_things = [thing for thing in things if some_condition(thing)]

But the problem was that `some_condition` took a long time to run waiting on I/O, which is a great candidate for parallelizing with ThreadPoolExecutor. I made it work using `Executor.map` and some improvizing, but it would be nicer if I could do:

    with concurrent.futures.ThreadPoolExecutor(100) as executor:
        my_things = executor.filter(some_condition, things)

And have the condition run in parallel on all the threads.

(Nick Coughlan)
I think this is sufficiently tricky to get right that it's worth adding filter() as a parallel to the existing map() API.
msg243274 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-15 16:30
> (Nick Coughlan)

Nick Coghlan isn't currently on this issue. Unless you were talking about another, separate Nick Coughlan that I don't know of? :)
msg243276 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-15 16:33
But to answer your argument:

> I think this is sufficiently tricky to get right that it's worth
> adding filter() as a parallel to the existing map() API.

"Tricky to get right" is not a good criterion. The question is whether it's useful or not. Only the OP has had that need AFAIK.
msg243281 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-05-15 16:55
That comment was from an email by Nick, not to Nick.  But now I've added him.  ;)
msg243296 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-05-16 06:13
filter() usage has always been less common than map() usage, and we see a similar pattern in comprehension usage as well (i.e. [f(x) for x in y] is a far more common construct than [x for x in p(y)]). That "less common" status doesn't keep us from providing filter() as builtin, or syntactic support for filtering in the comprehension syntax.

As a result, the main question I'd like to see a clear and authoritative answer to is "Given 'seq2 = filter(p, seq)' or 'seq2 = [x for seq if p(x)]', what's the concurrent.futures based parallel execution syntax in cases where the filtering key is expensive to calculate?"

I'd be quite OK with Brian's 2-line implementation going into the concurrent.futures documentation as a filtering recipe, similar to the way Raymond uses the recipes in the itertools documentation to help minimise complexity growth in the core API.

I *don't* mind if Brian's judgement is that it doesn't rise to the level of being worth including as a core feature in its own right, as I agree that the typical case of filtering functions is that they're fast, and when they're not, it's often a sign that data model denormalisation may be desirable in order to cache the relevant derived property.
msg243298 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-05-16 07:58
Folks could probably guess what I meant, but both filter comprehensions in my comment should have read: "[x for x in seq if p(x)]"
msg243388 - (view) Author: Ram Rachum (cool-RR) * Date: 2015-05-17 07:52
Okay, let me understand what the opinions of people are about this feature.

Ram Rachum: +1
Antoine Pitrou: -1
Nick Coghlan: Is that a +1 or a +0?
Ethan and Paul, you've participated but I didn't get your opinion on whether you want to see this feature go in. 

If there are more -1s than +1s we can close this ticket.
msg243390 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2015-05-17 08:02
I'm not a heavy user of concurrent.futures, so I don't have a strong opinion. I've never felt a need for this function, though, so I guess I'm -0. A recipe in the docs would be good, though.
msg243436 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-05-17 23:48
I'm now +1 on a recipe in the docs, -0 on a concrete method on Executor
that delegates to the map() method.
msg243463 - (view) Author: Ram Rachum (cool-RR) * Date: 2015-05-18 07:30
Looks like this a recipe in the docs is where this ticket is headed.

I took my original example for `Executor.filter` and changed it using Brian's suggestion so now it uses `Executor.map`. Please check it out. If someone other than me feels comfortable in putting it into the documentation page, I'll be happy if you could do that.

Possible issues:

This example uses `requests`, which isn't in the standard library.

I would say that this `for` line:

        for person in (person for person, filter_result in
                       zip(people, executor.map(has_wikipedia_page, people))
                       if filter_result):

Is a bit ugly, but since the consensus here is that this is better than implementing `Executor.filter`, so be it.
msg243493 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-05-18 16:18
I'd rather see an `Executor.filter()` myself, but it's Brian Quinlan's call.
msg243871 - (view) Author: Brian Quinlan (bquinlan) * (Python committer) Date: 2015-05-23 00:03
Ethan: I'm -0 so I'd happily go with the consensus. Does anyone have a strong opinion?

Ram: What did you mean by "Possible issues"? Did you mean that to be an example using the executor.map() technique?
msg243886 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2015-05-23 05:03
I vary between +0 and -0 for the addition of the concrete method.

When I'm at +0, the main rationale is that we *don't* have the "Where do we stop?" risk here that itertools faces, as we're just replicating the synchronous builtins.

When I'm at -0, the main rationale is that a recipe works with *any* version of Python that provides concurrent.futures (including any version of the PyPI backport), and is hence useful immediately, while a method would only work the version where we added it.
msg243888 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-05-23 06:06
It looks like we are pretty much neutral between the +1's and -1's.

Antoine seems to be opposed on general principles against bloat, while I am for it on general principles of completeness.

The recipe could still go in the docs for people to use on previous versions of Python.

If I was a module maintainer, and the other maintainers were not opposed, I would add it.

(So, Brian, care to have a co-maintainer? ;)
msg243890 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-05-23 06:53
I would like to tip the balance in favor of the -1.  Antoine is right about this one.
msg243891 - (view) Author: Ram Rachum (cool-RR) * Date: 2015-05-23 07:25
Raymond: Thank you. So the discussion is back on adding a recipe to the docs.

Brian: When I said "possible issues", I followed that with a couple of issues with the example I uploaded (filter_example.py).
msg244914 - (view) Author: Ram Rachum (cool-RR) * Date: 2015-06-06 15:30
A problem I just realized with Brian's 2-line implementation of `filter`: It doesn't work for iterables which aren't sequences, since it attempts to exhaust the iterable twice. So if you have a non-sequence you'll have to make it into a sequence first.
msg245004 - (view) Author: Brian Quinlan (bquinlan) * (Python committer) Date: 2015-06-08 14:31
Actually it immediately converts the iterable into a list. Recall:

def filter(self, fn, iterable, timeout=None):
  l = list(iterable)  # iterable => list
  return (item for (item, keep) in zip(l, self.map(fn, l, timeout)) if keep)
msg245016 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-06-08 16:39
The recipe will work in simple situations.

In more complex situations:

  - infinite iterator
  - really large iterator (millions of items)
  - high work load to produce next iteration

the recipe will either have a severe impact on performance or fail entirely.
msg245017 - (view) Author: Ram Rachum (cool-RR) * Date: 2015-06-08 16:44
Ethan: That's what I also thought, but then I figured that you're creating N futures anyway with an item attached to each, so it's on the same scale as creating a list, no?
msg245025 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-06-08 17:10
The recipe creates a list before it ever starts processing, while Executor.filter() starts processing with the first item.
msg245026 - (view) Author: Ram Rachum (cool-RR) * Date: 2015-06-08 17:11
Right, so it might be garbage-collecting the items before the futures are done being created, so the whole list wouldn't need to be saved in memory. I understand now.
msg246029 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-07-01 07:25
Brian, given my comments in msg245016 are you willing to add the Executor.filter() function?
msg341635 - (view) Author: Brian Quinlan (bquinlan) * (Python committer) Date: 2019-05-06 20:06
Hey Ethan, I'm really sorry about dropping the ball on this. I've been burnt out on Python stuff for the last couple of years.

When we left this, it looked like the -1s were in the majority and no one new has jumped on to support `filter`.

If you wanted to add this, I wouldn't object. But I've been inactive so long that I don't think that I should make the decision.
History
Date User Action Args
2022-04-11 14:58:16adminsetgithub: 68383
2019-05-06 20:06:01bquinlansetmessages: + msg341635
2015-07-21 07:07:46ethan.furmansetnosy: - ethan.furman
2015-07-01 07:25:40ethan.furmansetmessages: + msg246029
versions: + Python 3.6, - Python 3.5
2015-06-08 17:11:33cool-RRsetmessages: + msg245026
2015-06-08 17:10:15ethan.furmansetmessages: + msg245025
2015-06-08 16:44:05cool-RRsetmessages: + msg245017
2015-06-08 16:39:08ethan.furmansetmessages: + msg245016
2015-06-08 14:31:53bquinlansetmessages: + msg245004
2015-06-06 15:30:54cool-RRsetmessages: + msg244914
2015-05-23 07:37:47rhettingersetassignee: docs@python

components: + Documentation, - Library (Lib)
nosy: + docs@python
2015-05-23 07:25:14cool-RRsetmessages: + msg243891
2015-05-23 06:53:18rhettingersetnosy: + rhettinger
messages: + msg243890
2015-05-23 06:06:26ethan.furmansetmessages: + msg243888
2015-05-23 05:03:28ncoghlansetmessages: + msg243886
2015-05-23 00:03:00bquinlansetmessages: + msg243871
2015-05-18 16:18:28ethan.furmansetmessages: + msg243493
2015-05-18 07:30:58cool-RRsetfiles: + filter_example.py

messages: + msg243463
2015-05-17 23:48:53ncoghlansetmessages: + msg243436
2015-05-17 08:02:56paul.mooresetmessages: + msg243390
2015-05-17 07:52:27cool-RRsetmessages: + msg243388
2015-05-16 07:58:55ncoghlansetmessages: + msg243298
2015-05-16 06:13:36ncoghlansetmessages: + msg243296
2015-05-15 16:55:10ethan.furmansetnosy: + ncoghlan
messages: + msg243281
2015-05-15 16:33:51pitrousetmessages: + msg243276
2015-05-15 16:30:51pitrousetmessages: + msg243274
2015-05-15 15:59:36ethan.furmansetmessages: + msg243272
2015-05-15 09:24:16pitrousetmessages: + msg243260
2015-05-15 08:21:59paul.mooresetmessages: + msg243256
2015-05-15 07:36:59cool-RRsetfiles: + 2.patch

messages: + msg243254
2015-05-15 03:29:19bquinlansetmessages: + msg243242
2015-05-15 03:08:16ethan.furmansetnosy: + bquinlan, jnoller, sbt

title: Add `Executor.filter` -> Add `Executor.filter` to concurrent.futures
2015-05-15 02:24:54ethan.furmansetfiles: + issue24195.stoneleaf.01.patch

messages: + msg243240
2015-05-15 00:46:52ethan.furmansetnosy: + ethan.furman
2015-05-14 22:42:52rhettingersetnosy: + pitrou
2015-05-14 22:03:22paul.mooresetnosy: + paul.moore
messages: + msg243221
2015-05-14 21:28:15cool-RRsetmessages: + msg243218
2015-05-14 21:26:41cool-RRcreate