classification
Title: Pool.imap doesn't work as advertised
Type: behavior    Stage:
Components: Library (Lib)    Versions:
Status: open    Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: alex-garel, davin, jneb, josh.r, pitrou, tim.peters
Priority: normal Keywords:

Created on 2013-12-16 08:31 by jneb, last changed 2018-04-12 22:19 by josh.r.

Files
File name    Uploaded    Description
mypool.py jneb, 2013-12-16 08:31 Concept implementation of pool.imap_unordered
imu.py tim.peters, 2013-12-16 19:41
Messages (5)
msg206279 - Author: Jurjen N.E. Bos (jneb) Date: 2013-12-16 08:31
The pool.imap and pool.imap_unordered functions are documented as "a lazy version of Pool.map".
In fact, they aren't: they consume the entire iterable argument up front. This is almost certainly not what the user wants: it uses unnecessary memory, and it is slower than expected if the output iterator isn't consumed in full. As things stand, imap offers little advantage over map.
I tried to fix the code myself, but the two-level queueing of the input arguments makes this non-trivial.
Stack Overflow user Blckknght wrote a simplified solution that shows how it ought to work.
Since that wasn't posted here, I thought it would be useful to add it, even if only for documentation purposes.
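The eager consumption the report describes can be demonstrated with a small script. This is an editorial sketch, not part of the original report; it uses multiprocessing.dummy's thread-backed Pool (which shares the same imap machinery as the process pool) so it runs without pickling or platform concerns:

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool; same imap code path

def square(x):
    return x * x

pulled = []  # records how far imap has consumed the input generator

def inputs():
    for i in range(1000):
        pulled.append(i)
        yield i

with Pool(2) as pool:
    results = pool.imap(square, inputs())
    first = next(results)   # consume only a single result...
    time.sleep(1.0)         # ...and give the task-handler thread a moment
    print(first, len(pulled))  # the generator has been drained almost entirely
```

Despite only one result being consumed, the pool's task-handler thread pulls the whole generator onto the task queue, which is exactly the surprising behavior reported above.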
msg206334 - Author: Tim Peters (tim.peters) (Python committer) Date: 2013-12-16 17:25
Nice to see you, Jurjen!  Been a long time :-)

I'd like to see changes here too. It's unclear what "a lazy version" is intended to mean, exactly, but I agree the actual behavior is surprising, and that mypool.py is a lot less surprising in several ways.

I got bitten by this just last week, when running a parallelized search over a massive space _expected_ to succeed after exploring a tiny fraction of the search space. It ran out of system resources because imap_unordered() tried to queue up countless millions of work descriptions. I had hoped/expected that it would interleave generating and queueing "a few" inputs with retrieving outputs, much as mypool.py behaves.

In that case I switched to using apply_async() instead, interposing my own bounded queue (a collections.deque used only in the main program) to throttle submission. I'm still surprised it was necessary ;-)
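The throttling pattern described here (apply_async plus a bounded deque in the main program) might look roughly like the following. This is an editorial sketch, not Tim's actual code; `throttled_imap` and `window` are names chosen for illustration:

```python
from collections import deque
from multiprocessing.dummy import Pool  # thread-backed Pool keeps the demo self-contained

def throttled_imap(pool, func, iterable, window=8):
    """Yield results in order, keeping at most `window` tasks in flight."""
    pending = deque()
    for item in iterable:
        pending.append(pool.apply_async(func, (item,)))
        if len(pending) >= window:
            # Block on the oldest result before pulling more input.
            yield pending.popleft().get()
    while pending:  # drain the tail once the input is exhausted
        yield pending.popleft().get()

def square(x):
    return x * x

with Pool(4) as pool:
    results = list(throttled_imap(pool, square, range(20), window=5))
print(results)  # squares of 0..19, with at most 5 tasks ever in flight
```

Because the generator only submits a new task after yielding a result, the input iterable is consumed no faster than the caller consumes outputs, which is the behavior the messages above expected from imap in the first place.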
msg206356 - Author: Tim Peters (tim.peters) (Python committer) Date: 2013-12-16 19:41
Just for interest, I'll attach the work-around I mentioned (imu.py). At this level it's a very simple implementation, but now that I look at it, it's actually a lazy implementation of imap() (or of an unimaginative ;-) imap_unordered()).
msg315231 - Author: Alex Garel (alex-garel) Date: 2018-04-12 16:53
Hello, I think this is a really important feature; it has hit me hard these past few days.

It would also solve https://bugs.python.org/issue19173 in a nice way.
msg315235 - Author: Josh Rosenberg (josh.r) (Python triager) Date: 2018-04-12 22:19
Related: issue29842 "Make Executor.map work with infinite/large inputs correctly" for a similar problem in concurrent.futures (but worse, since it doesn't even allow you to begin consuming results until all inputs are dispatched).

A similar approach to my Executor.map patch could probably be used with imap/imap_unordered.
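One plausible shape for the bounded-prefetch approach mentioned here, sketched against concurrent.futures rather than multiprocessing.Pool. `lazy_map` and `prefetch` are illustrative names, not the API of the actual issue29842 patch:

```python
import itertools
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def lazy_map(executor, fn, iterable, prefetch=4):
    """Map fn over iterable, submitting only `prefetch` tasks ahead of consumption."""
    it = iter(iterable)
    # Prime the pipeline with a bounded number of tasks.
    futures = deque(executor.submit(fn, x) for x in itertools.islice(it, prefetch))
    for x in it:
        result = futures.popleft().result()   # wait for the oldest task...
        futures.append(executor.submit(fn, x))  # ...before submitting the next input
        yield result
    while futures:  # drain the remaining in-flight tasks
        yield futures.popleft().result()

with ThreadPoolExecutor(max_workers=4) as ex:
    out = list(lazy_map(ex, lambda x: x + 1, range(10), prefetch=3))
print(out)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Unlike stock Executor.map, this never has more than `prefetch` pending futures, so an infinite or very large input iterable stays cheap as long as results are consumed.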
History
Date User Action Args
2018-04-12 22:19:37  josh.r      set     nosy: + josh.r; messages: + msg315235
2018-04-12 19:34:32  ned.deily   set     nosy: + pitrou, davin
2018-04-12 16:53:15  alex-garel  set     nosy: + alex-garel; messages: + msg315231
2013-12-16 19:41:53  tim.peters  set     files: + imu.py; messages: + msg206356
2013-12-16 17:25:37  tim.peters  set     nosy: + tim.peters; messages: + msg206334
2013-12-16 08:31:16  jneb        create