
Author josh.r
Recipients bquinlan, josh.r, ncoghlan
Date 2019-05-07.01:23:19
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1557192199.55.0.405491406075.issue23697@roundup.psfhosted.org>
In-reply-to
Content
For the process-based versions, it makes it too easy to accidentally fork bomb yourself, since each process that calls psubmit would implicitly create another #CPUs workers of its own. A process-based version of Brian's case, with a mere four top-level psubmits each of which performs a single psubmit of its own, would logically involve 1 + #CPUs + #CPUs**2 total processes, without the user ever explicitly asking for them.
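To make the growth concrete, here is a small sketch of the arithmetic above, assuming the hypothetical implicit-executor behavior described (every worker process lazily creating its own #CPUs-worker pool when it calls psubmit); `implicit_process_count` is an illustrative name, not a real API:

```python
def implicit_process_count(cpus, depth=2):
    """Total processes if the parent plus every worker at each nesting
    level lazily spawns its own pool of `cpus` workers: the geometric
    sum cpus**0 + cpus**1 + ... + cpus**depth."""
    return sum(cpus ** k for k in range(depth + 1))

# Brian's case on a four-core box: top-level psubmits each performing
# one nested psubmit gives two levels of implicit pools.
print(implicit_process_count(4))  # 1 + 4 + 16 = 21
```

One more level of nesting on the same machine would already mean 85 logical processes, which is why the implicit version is so easy to turn into a fork bomb by accident.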

Especially in a fork-based context, this could easily trigger the Linux OOM-killer if the parent process is of even moderate size, since a four-core system with a 1 GB parent process would suddenly be asking for up to 21 GB of memory. Most of that is only potentially used, given COW behavior, but the OOM-killer assumes COW memory will eventually be copied (and it's largely right about that for CPython, given the cyclic garbage collector's twiddling of reference counts), so whether the 21 GB is ever actually used is hardly relevant; the OOM-killer doesn't care, and will murder the process anyway.

The alternative would be to share the default process executor with the child processes, but that just means process usage would be subject to the same deadlocks as in Brian's threaded case.
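The deadlock referred to is the classic one where a task submitted to a fixed-size shared pool blocks waiting on a nested task submitted to the same pool. A minimal sketch using the real `concurrent.futures` API (with a one-worker pool and a timeout added purely so the example terminates instead of hanging forever):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# One shared fixed-size pool, as in the shared-default-executor scenario.
pool = ThreadPoolExecutor(max_workers=1)

def outer():
    inner = pool.submit(lambda: "done")
    # Deadlock: the pool's only worker is busy running outer() itself,
    # so inner can never start. The timeout exists only for the demo;
    # without it, this blocks forever.
    return inner.result(timeout=1)

f = pool.submit(outer)
try:
    print(f.result(timeout=5))
except TimeoutError:
    print("deadlocked")
pool.shutdown()
```

This prints "deadlocked". With nested submissions, every worker can end up blocked waiting for queued work that no free worker exists to run.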

This also defeats the purpose of the Executor model; Java, which pioneered it to my knowledge, intentionally required you to create the executor up front (typically in a single global location) because the goal is to allow you to change your program-wide parallelism model by changing a single line (the definition of the executor), with all uses of the executor remaining unchanged. Making a bunch of global functions implicitly tied to different executors/executor models means the parallelism is no longer centrally defined, so switching models means changes all over the code base (in Python, that's often unavoidable due to constraints involving pickling and data sharing, but there is no need to make it worse).
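The centralization argument above can be sketched with the existing `concurrent.futures` classes; the module-level `EXECUTOR` and `parallel_map` names here are illustrative, not part of any real API:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# The program-wide parallelism model is defined in exactly one place.
# Swapping this single line to ProcessPoolExecutor(max_workers=4) changes
# the model for every call site (subject to Python's pickling constraints,
# e.g. lambdas don't survive the switch to processes).
EXECUTOR = ThreadPoolExecutor(max_workers=4)

def parallel_map(fn, items):
    # Call sites use the shared executor and never name a concrete model.
    return list(EXECUTOR.map(fn, items))

print(parallel_map(lambda x: x * x, range(5)))  # [0, 1, 4, 9, 16]
```

A bag of global functions each implicitly tied to its own executor forfeits exactly this single-point-of-change property.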