This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: multiprocessing's default start method of fork()-without-exec() is broken
Type: behavior
Stage:
Components:
Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Julian, aduncan, davin, itamarst, itamarst2, mgorny, pitrou, wim.glenn
Priority: normal
Keywords:

Created on 2020-04-24 18:22 by itamarst, last changed 2022-04-11 14:59 by admin.

Messages (11)
msg367210 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2020-04-24 18:22
By default, multiprocessing uses fork() without exec() on POSIX. For a variety of reasons this can leave subprocesses in an inconsistent state: module-level globals are copied, which can break logging; threads don't survive fork(); and so on.

The end results vary, but quite often are silent lockups.

In real-world usage, this results in users getting mysterious hangs that they do not have the knowledge to debug.

The fix for these people is to use "spawn" by default, which is the default on Windows.

Just a small sample:

1. Today I talked to a scientist who spent two weeks stuck, until she found my article on the subject (https://codewithoutrules.com/2018/09/04/python-multiprocessing/). Basically multiprocessing locked up, doing nothing forever. Switching to "spawn" fixed it.
2. https://github.com/dask/dask/issues/3759#issuecomment-476743555 is someone who had issues fixed by "spawn".
3. https://github.com/numpy/numpy/issues/15973 is a NumPy issue which apparently impacted scikit-learn.


I suggest changing the default on POSIX to match Windows.
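
A minimal sketch of one failure mode described above, assuming a POSIX system where os.fork() is available. It is an illustration, not code from any of the linked reports, and the function name is hypothetical. A background thread holds a lock at the moment of the fork; the child gets a copy of the locked lock but not the thread that would release it, so a plain acquire() in the child would block forever. The sketch uses a timeout so that it terminates instead of hanging.

```python
import os
import threading


def child_can_acquire_lock_after_fork(timeout=0.5):
    """Fork while another thread holds a lock; report whether the child
    managed to acquire that lock within `timeout` seconds."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def holder():
        # Hold the lock until the parent tells us to let go.
        with lock:
            holding.set()
            release.wait()

    thread = threading.Thread(target=holder)
    thread.start()
    holding.wait()  # make sure the lock is held before forking

    pid = os.fork()
    if pid == 0:
        # Child: the holder thread does not exist here, but the lock
        # was copied in its locked state, so nobody will release it.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    # Parent: collect the child's verdict, then clean up the thread.
    _, status = os.waitpid(pid, 0)
    release.set()
    thread.join()
    return os.WEXITSTATUS(status) == 0
```

On Linux this function is expected to return False. Real-world cases usually involve locks hidden inside libraries (logging handlers, thread pools, and so on) rather than a lock the application created itself.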
msg367211 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2020-04-24 18:31
Looks like as of 3.8 this only impacts Linux/non-macOS-POSIX, so I'll amend the above to say this will also make it consistent with macOS.
msg368173 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2020-05-05 15:35
Just got an email from someone for whom switching to "spawn" fixed a problem. Earlier this week someone tweeted about this fixing things. This keeps hitting people in the real world.
msg380478 - (view) Author: Itamar Turner-Trauring (itamarst2) Date: 2020-11-06 22:02
Another person with the same issue: https://twitter.com/volcan01010/status/1324764531139248128
msg392358 - (view) Author: Andrew Duncan (aduncan) Date: 2021-04-29 23:10
I just ran into and fixed (thanks to itamarst's blog post) a problem likely related to this.

My multiprocessing workers perform work and send a logging message back with success/fail info. I had a few intermittent deadlocks, which became a recurring problem when I sped up the step that skips previously completed tasks (I think this shortened the time between forking and attempting to send messages, causing the third process to deadlock). After that change it deadlocked *every time*.

Switching to "spawn" at the top of the main function has fixed it.
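
For readers looking for the shape of that fix, here is a minimal sketch. It is not the code from this report; the `work` function and the task range are placeholders. multiprocessing.set_start_method() may be called only once per program, and it should run before any pools, processes, or queues are created.

```python
import multiprocessing as mp


def work(task_id):
    # Placeholder worker: report success for the given task.
    return task_id, "ok"


def main():
    # Select "spawn" before creating any multiprocessing objects.
    mp.set_start_method("spawn")
    with mp.Pool(processes=2) as pool:
        for task_id, status in pool.map(work, range(4)):
            print(task_id, status)


if __name__ == "__main__":
    # The guard is required with "spawn": each child re-imports this
    # module and must not start the pool again.
    main()
```

A library that should not change the global default can use mp.get_context("spawn") instead and create its pools from the returned context.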
msg392501 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-04-30 18:54
The problem with changing the default is that this will break any application that depends on passing non-picklable data to the child process (in addition to the potentially unexpected performance impact).
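
A small sketch of the picklability concern, for illustration only. Under "spawn" (and "forkserver") the target and arguments of a child process are sent through pickle, whereas under "fork" the child simply inherits them in memory. The helper name `is_picklable` is hypothetical; it checks whether an object would survive that trip.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if `obj` can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    candidates = {
        "built-in function": len,
        "lambda": lambda x: x + 1,
        "threading.Lock": threading.Lock(),
    }
    for name, obj in candidates.items():
        print(name, is_picklable(obj))
```

Code that passes lambdas, locks, open files, or similar objects to a child works under "fork" but fails under "spawn", which is the breakage described above.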

The docs already contain a significant elaboration on the matter, but feel free to submit a PR that would make the various caveats more explicit:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
msg392503 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2021-04-30 18:59
This change was made on macOS at some point, so why not Linux? "spawn" is already the default on macOS and Windows.
msg392506 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-04-30 19:15
The macOS change was required because "fork" simply ceased to work.
Windows has always used "spawn", because no other method can be implemented on Windows.
msg392507 - (view) Author: Itamar Turner-Trauring (itamarst) Date: 2021-04-30 19:27
Given people's general experience, I would not say that "fork" works on Linux either. More like "99% of the time it works, 1% of the time it randomly breaks in mysterious ways".
msg392508 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-04-30 19:31
Agreed, but again, changing will break some applications.

We could switch to forkserver, but we should have a transition period where a FutureWarning will be displayed if people didn't explicitly set a start method.
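
A rough sketch of what such a transition could look like from the caller's side. This is not CPython's implementation; the wrapper name `get_context_or_warn` and the warning text are assumptions made for illustration. Only multiprocessing.get_context() and warnings.warn() are real APIs here.

```python
import multiprocessing as mp
import warnings


def get_context_or_warn(method=None):
    """Hypothetical wrapper: return a multiprocessing context, warning
    when the caller relies on the platform's default start method."""
    if method is None:
        warnings.warn(
            "no start method was set explicitly; the default may change "
            "in a future version",
            FutureWarning,
            stacklevel=2,
        )
    # get_context(None) returns the context for the current default.
    return mp.get_context(method)


if __name__ == "__main__":
    # Opting in explicitly avoids the warning. "forkserver" is only
    # available on POSIX platforms.
    ctx = get_context_or_warn("forkserver")
    print(ctx.get_start_method())
```

Applications that name a start method explicitly would see no change in behavior during such a transition period.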
msg413081 - (view) Author: Michał Górny (mgorny) * Date: 2022-02-11 16:13
After updating PyPy3 to use Python 3.9's stdlib, we hit very bad hangs because of this: literally compiling a single file with "parallel" compileall could hang. In the end, we had to revert the change in how Python 3.9 starts workers because otherwise multiprocessing would be impossible to use:

https://foss.heptapod.net/pypy/pypy/-/commit/c594b6c48a48386e8ac1f3f52d4b82f9c3e34784

This is a very bad default, and what's even worse is that it often causes deadlocks that are hard to reproduce or debug. Furthermore, since "fork" is the default, people are unintentionally relying on its support for passing non-picklable objects and are creating non-portable code. The code often becomes complex and hard to change before they discover the problem.

Before we managed to figure out how to work around the deadlocks in PyPy3, we were experimenting with switching the default to "spawn". Unfortunately, we hit multiple projects that didn't work with this method, precisely because of pickling problems. Furthermore, their authors were surprised to learn that their code wouldn't work on macOS (in the end, many people perceive Python as a language for writing portable software).

Finally, back in 2018 I made one of my projects do parallel work using multiprocessing. It gave its users a great speedup, but for some it caused deadlocks that I could neither reproduce nor debug. In the end, I had to revert it. Now that I've learned about this problem, I'm wondering if this wasn't precisely because of the "fork" method.
History
Date                 User       Action  Args
2022-04-11 14:59:29  admin      set     github: 84559
2022-02-11 16:13:53  mgorny     set     nosy: + mgorny; messages: + msg413081
2021-10-18 18:25:04  wim.glenn  set     nosy: + wim.glenn
2021-04-30 19:31:24  pitrou     set     messages: + msg392508
2021-04-30 19:27:07  itamarst   set     messages: + msg392507
2021-04-30 19:15:18  pitrou     set     messages: + msg392506
2021-04-30 18:59:59  itamarst   set     messages: + msg392503
2021-04-30 18:54:13  pitrou     set     messages: + msg392501
2021-04-29 23:10:59  aduncan    set     nosy: + aduncan; messages: + msg392358
2020-11-06 22:02:52  itamarst2  set     nosy: + itamarst2; messages: + msg380478
2020-06-24 22:12:37  Julian     set     nosy: + Julian
2020-05-06 01:33:18  ned.deily  set     nosy: + pitrou, davin
2020-05-05 15:35:22  itamarst   set     messages: + msg368173
2020-04-24 18:31:05  itamarst   set     messages: + msg367211
2020-04-24 18:22:23  itamarst   create