Title: multiprocessing's default start method of fork()-without-exec() is broken
Type: behavior Stage:
Components: Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Julian, aduncan, davin, itamarst, itamarst2, pitrou
Priority: normal Keywords:

Created on 2020-04-24 18:22 by itamarst, last changed 2021-04-30 19:31 by pitrou.

Messages (10)
msg367210 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2020-04-24 18:22
By default, multiprocessing uses fork() without exec() on POSIX. For a variety of reasons this can lead to inconsistent state in subprocesses: module-level globals are copied, which can mess up logging; threads don't survive fork(); and so on.

The end results vary, but quite often are silent lockups.
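
One way such a lockup can arise, as a minimal POSIX-only sketch: a lock held by another thread at the moment of fork() is copied into the child in its locked state, but the thread that would release it does not exist there. The function name and the timeout parameter are illustrative assumptions, and os.fork() is called directly here to keep the example small; multiprocessing's "fork" method does the equivalent internally.

```python
import os
import threading


def child_can_acquire_lock_after_fork(timeout=1.0):
    """Fork while another thread holds a lock; report whether the child can take it."""
    lock = threading.Lock()
    locked = threading.Event()
    release = threading.Event()

    def holder():
        # Hold the lock until the parent says it may be released.
        with lock:
            locked.set()
            release.wait()

    thread = threading.Thread(target=holder)
    thread.start()
    locked.wait()  # the lock is definitely held at this point

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied, so nothing here will
        # ever release the lock. A timeout is used to avoid hanging forever.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    # Parent: collect the child's verdict, then let the holder thread finish.
    _, status = os.waitpid(pid, 0)
    release.set()
    thread.join()
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print(child_can_acquire_lock_after_fork())
```

On CPython on Linux this is expected to report False. Without the timeout, the child would simply block, which matches the silent hangs described here.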

In real-world usage, this results in users getting mysterious hangs they do not have the knowledge to debug.

The fix for these people is to use "spawn" by default, which is the default on Windows.
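
Until the default changes, the workaround can be applied per call site. The following is a minimal sketch, assuming a hypothetical worker function square() and helper squares_with_spawn(); only get_context() and Pool are real multiprocessing APIs.

```python
import multiprocessing as mp


def square(x):
    # Defined at module top level so a spawned child can import it by name.
    return x * x


def squares_with_spawn(values, processes=2):
    # A context object scopes the start method to this call instead of
    # changing the process-wide default.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard matters under "spawn": each child re-imports this module.
    print(squares_with_spawn([1, 2, 3]))
```

Using a context rather than a global setting is a design choice that suits library code, since it does not alter the start method for the rest of the application.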

Just a small sample:

1. Today I talked to a scientist who spent two weeks stuck, until she found my article on the subject. Basically, multiprocessing locked up, doing nothing forever. Switching to "spawn" fixed it.
2. is someone who had issues fixed by "spawn".
3. is a NumPy issue which apparently impacted scikit-learn.

I suggest changing the default on POSIX to match Windows.
msg367211 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2020-04-24 18:31
Looks like as of 3.8 this only impacts Linux/non-macOS-POSIX, so I'll amend the above to say this will also make it consistent with macOS.
msg368173 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2020-05-05 15:35
Just got an email from someone for whom switching to "spawn" fixed a problem. Earlier this week someone tweeted about this fixing things. This keeps hitting people in the real world.
msg380478 - (view) Author: Itamar Turner-Trauring (itamarst2) Date: 2020-11-06 22:02
Another person with the same issue:
msg392358 - (view) Author: Andrew Duncan (aduncan) Date: 2021-04-29 23:10
I just ran into and fixed (thanks to itamarst's blog post) a problem likely related to this.

I have multiprocessing workers performing work and sending a logging message back with success/fail info. I had a few intermittent deadlocks that became a recurring problem when I sped up the process that skipped tasks which had previously completed (I think this shortened the time between forking and attempting to send messages, causing the third process to deadlock). After that change it deadlocked *every time*.

Switching to "spawn" at the top of the main function has fixed it.
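
The original code is not shown in this message, so the following is only a sketch of what such a fix might look like. The names run_task() and main() and the choice to log in the parent rather than in the workers are assumptions; set_start_method() is the real API.

```python
import logging
import multiprocessing as mp

log = logging.getLogger("tasks")


def run_task(task_id):
    # Hypothetical unit of work: return the outcome as plain data so the
    # parent process does the logging.
    succeeded = task_id >= 0
    return task_id, succeeded


def main():
    # Must be called once, before any pool, process, or queue is created.
    mp.set_start_method("spawn")
    logging.basicConfig(level=logging.INFO)
    with mp.Pool(processes=2) as pool:
        for task_id, succeeded in pool.imap_unordered(run_task, range(4)):
            log.info("task %d %s", task_id, "succeeded" if succeeded else "failed")


if __name__ == "__main__":
    main()
```

Calling set_start_method() a second time raises RuntimeError, which is why it belongs at the very top of the main function.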
msg392501 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-04-30 18:54
The problem with changing the default is that this will break any application that depends on passing non-picklable data to the child process (in addition to the potentially unexpected performance impact).
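
To illustrate the picklability concern: under "spawn", arguments for the child must be serialized, whereas "fork" just inherits them in memory. The helper below is a rough check under that assumption; its name is hypothetical, and it uses plain pickle, while multiprocessing uses its own pickler subclass that supports a few extra types.

```python
import pickle
import threading


def survives_pickling(obj):
    """Approximate test for whether obj could be sent to a spawned child."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    print(survives_pickling([1, 2, 3]))         # plain data
    print(survives_pickling(lambda x: x + 1))   # anonymous function
    print(survives_pickling(threading.Lock()))  # OS-level resource
```

Applications that pass lambdas, locks, or open handles to child processes work under "fork" today and would start failing if the default became "spawn".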

The docs already contain a significant elaboration on the matter, but feel free to submit a PR that would make the various caveats more explicit:
msg392503 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2021-04-30 18:59
This change was made on macOS at some point, so why not Linux? "spawn" is already the default on macOS and Windows.
msg392506 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-04-30 19:15
The macOS change was required because "fork" simply ceased to work.
Windows has always used "spawn", because no other method can be implemented on Windows.
msg392507 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2021-04-30 19:27
Given people's general experience, I would not say that "fork" works on Linux either. More like "99% of the time it works, 1% of the time it randomly breaks in mysterious ways".
msg392508 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-04-30 19:31
Agreed, but again, changing will break some applications.

We could switch to forkserver, but we should have a transition period where a FutureWarning will be displayed if people didn't explicitly set a start method.
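
A transition period of that kind might behave roughly as follows. This is a sketch only: get_context_with_transition_warning() is a hypothetical wrapper, not part of multiprocessing, and the warning text is invented for illustration.

```python
import multiprocessing as mp
import warnings


def get_context_with_transition_warning(method=None):
    """Hypothetical wrapper: warn when the caller relies on the implicit default."""
    if method is None:
        warnings.warn(
            "No start method was set explicitly; the default may change in "
            "a future version. Pass 'fork', 'spawn', or 'forkserver'.",
            FutureWarning,
            stacklevel=2,
        )
    # get_context(None) returns the current default context.
    return mp.get_context(method)


if __name__ == "__main__":
    # An explicit choice produces no warning.
    ctx = get_context_with_transition_warning("spawn")
    print(ctx.get_start_method())
```

Code that already names its start method would see no change, while code relying on the default would get a visible notice before anything breaks.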
Date                 User       Action  Args
2021-04-30 19:31:24  pitrou     set     messages: + msg392508
2021-04-30 19:27:07  itamarst   set     messages: + msg392507
2021-04-30 19:15:18  pitrou     set     messages: + msg392506
2021-04-30 18:59:59  itamarst   set     messages: + msg392503
2021-04-30 18:54:13  pitrou     set     messages: + msg392501
2021-04-29 23:10:59  aduncan    set     nosy: + aduncan; messages: + msg392358
2020-11-06 22:02:52  itamarst2  set     nosy: + itamarst2; messages: + msg380478
2020-06-24 22:12:37  Julian     set     nosy: + Julian
2020-05-06 01:33:18  ned.deily  set     nosy: + pitrou, davin
2020-05-05 15:35:22  itamarst   set     messages: + msg368173
2020-04-24 18:31:05  itamarst   set     messages: + msg367211
2020-04-24 18:22:23  itamarst   create