Message 365295 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	kousu
Recipients	kousu
Date	2020-03-30.04:06:26
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1585541187.46.0.606629490552.issue40110@roundup.psfhosted.org>
In-reply-to

Content
multiprocessing.Pool.imap() is supposed to be a lazy version of map. But it's not: it submits work to its workers eagerly. As a consequence, in a pipeline, all the work from earlier steps is queued, performed, and finished first, before starting later steps. If you use python3's built-in map() -- aka the old itertools.imap() -- the operations are on-demand, so it surprised me that Pool.imap() doesn't. It's basically no better than using Pool.map(). Maybe it saves memory by not materializing large iterables in every worker process? But it still materializes the CPU time from the iterables even if unneeded. This can be partially worked around by giving each step of the pipeline its own Pool -- then, at least the earlier steps of the pipeline don't block the later steps -- but the jobs are still done eagerly instead of waiting for their results to actually be requested.

multiprocessing.Pool.imap() is supposed to be a lazy version of map. But it's not: it submits work to its workers eagerly. As a consequence, in a pipeline, all the work from earlier steps is queued, performed, and finished first, before starting later steps.

If you use python3's built-in map() -- aka the old itertools.imap() -- the operations are on-demand, so it surprised me that Pool.imap() doesn't. It's basically no better than using Pool.map(). Maybe it saves memory by not materializing large iterables in every worker process? But it still materializes the CPU time from the iterables even if unneeded.

This can be partially worked around by giving each step of the pipeline its own Pool -- then, at least the earlier steps of the pipeline don't block the later steps -- but the jobs are still done eagerly instead of waiting for their results to actually be requested.

History
Date	User	Action	Args
2020-03-30 04:06:27	kousu	set	recipients: + kousu
2020-03-30 04:06:27	kousu	set	messageid: <1585541187.46.0.606629490552.issue40110@roundup.psfhosted.org>
2020-03-30 04:06:27	kousu	link	issue40110 messages
2020-03-30 04:06:27	kousu	create