classification
Title: Add filter to multiprocessing.Pool
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.6, Python 2.7
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: Mike.Drob, christian.leichsenring, davin, sbt, travis.thieman
Priority: normal
Keywords:

Created on 2014-11-13 17:35 by Mike.Drob, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg231124 - Author: Mike Drob (Mike.Drob) Date: 2014-11-13 17:35
Being able to use a pool to easily run 'map' over an iterable is very powerful, but it would also be nice to run 'filter' (or potentially 'ifilter' or 'filter_async', in keeping with the patterns already present).
msg231216 - Author: Travis Thieman (travis.thieman) Date: 2014-11-15 20:28
Why is it insufficient to run a synchronous 'filter' over the list returned by 'Pool.map'? These functional constructs are inherently composable, and we should favor composing simple implementations of each rather than implementing special cases of them throughout the stdlib.

I think there's a clear reason for 'map' to be parallelizable because the function you're applying over the iterable could be quite expensive. 'filter' would only benefit from this if the comparison you're running is expensive, which seems like an unlikely and ill-advised use case. You can also rewrite your expensive 'filter' as a 'map' if you really need to.
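
To illustrate the composition described above, here is a minimal sketch of rewriting an expensive 'filter' as a 'map'. The names `parallel_filter` and `is_prime` are hypothetical and are not part of multiprocessing; only `Pool.map` is the real API. The predicate runs in the worker processes, and the cheap selection step runs in the parent.

```python
import multiprocessing


def is_prime(n):
    """Hypothetical stand-in for an expensive predicate."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def parallel_filter(pool, predicate, items):
    """Hypothetical helper: evaluate the predicate in parallel, filter locally."""
    items = list(items)  # materialize so items and flags can be paired up
    flags = pool.map(predicate, items)  # Pool.map preserves input order
    return [item for item, keep in zip(items, flags) if keep]


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        print(parallel_filter(pool, is_prime, range(20)))
```

Note that this approach still holds the full list of flags in memory, so it only helps when the predicate itself is the bottleneck.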
msg235687 - Author: Davin Potts (davin) (Python committer) Date: 2015-02-10 14:48
The points made by Travis are clear and solid.

Closing as this functionality is already handled well and no exceptional situations are being argued for that would require a special case.
msg377746 - Author: Christian Leichsenring (christian.leichsenring) Date: 2020-10-01 12:06
The main point, which the OP didn't make, is that Pool.map returns a list, and that list can be very large given that multiprocessing is typically used to process large amounts of data.

So IMHO either there should be a way to keep excluded elements from being held in memory (i.e. Pool.filter), or Pool.map should return a lazy iterable rather than a list.
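
For the memory concern, one possible workaround is sketched below, assuming the existing `Pool.imap`, which returns an iterator rather than a list. The names `lazy_parallel_filter`, `_evaluate`, and `is_even` are hypothetical and exist only for illustration. Each worker returns the item together with the predicate's verdict, and a generator in the parent yields only the items that passed, so rejected elements are never collected into a result list.

```python
import functools
import multiprocessing


def is_even(n):
    """Hypothetical stand-in for an expensive predicate."""
    return n % 2 == 0


def _evaluate(predicate, item):
    """Run in a worker: pair the item with the predicate's verdict."""
    return item, predicate(item)


def lazy_parallel_filter(pool, predicate, items, chunksize=1):
    """Hypothetical helper: yield items that pass, in input order."""
    # functools.partial of a module-level function is picklable,
    # so it can be sent to the worker processes.
    tagged = pool.imap(functools.partial(_evaluate, predicate), items, chunksize)
    return (item for item, keep in tagged if keep)


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # Consume the generator while the pool is still open.
        for value in lazy_parallel_filter(pool, is_even, range(10)):
            print(value)
```

This is only a sketch: the pool may still run ahead of a slow consumer and buffer pending results, so memory use is reduced rather than strictly bounded.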
History
Date | User | Action | Args
2022-04-11 14:58:10 | admin | set | github: 67053
2020-10-01 12:06:16 | christian.leichsenring | set | nosy: + christian.leichsenring; messages: + msg377746
2015-02-10 14:48:47 | davin | set | status: open -> closed; nosy: + davin; messages: + msg235687; resolution: rejected; stage: resolved
2014-11-15 20:28:36 | travis.thieman | set | nosy: + travis.thieman; messages: + msg231216
2014-11-13 23:03:46 | ned.deily | set | nosy: + sbt
2014-11-13 17:35:27 | Mike.Drob | create |