classification
Title: Python crashes on macOS after fork with no exec
Type:
Stage: patch review
Components: macOS
Versions: Python 3.8, Python 3.7, Python 3.6, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, davin, kapilt, miss-islington, ned.deily, pitrou, ronaldoussoren, vstinner
Priority: normal
Keywords: patch

Created on 2018-06-01 00:53 by kapilt, last changed 2018-12-13 02:24 by barry.

Pull Requests
URL Status Linked Edit
PR 11043 merged ned.deily, 2018-12-09 06:30
PR 11044 merged miss-islington, 2018-12-09 06:50
PR 11045 merged miss-islington, 2018-12-09 06:50
Messages (28)
msg318352 - (view) Author: Kapil Thangavelu (kapilt) Date: 2018-06-01 00:53
This issue seems to have been reported a few times on various GitHub projects. I've also reproduced it using a brew install of Python 2.7.15. I haven't been able to reproduce it with Python 3.6. Note that this requires a framework build of Python.

Background on the underlying cause, a change in High Sierra:
http://sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html
A Ruby perspective on the same issue, as exhibited by some apps:
https://blog.phusion.nl/2017/10/13/why-ruby-app-servers-break-on-macos-high-sierra-and-what-can-be-done-about-it/


The workaround seems to be setting the environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY prior to executing python.
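
As a rough illustration of that workaround, the sketch below starts a fresh interpreter with the variable already present in its environment. It assumes Python 3 (subprocess.run) and the value YES, which is the value used later in this thread; the helper names env_with_fork_safety_disabled and run_python are made up for this example.

```python
import os
import subprocess
import sys

FORK_SAFETY_VAR = "OBJC_DISABLE_INITIALIZE_FORK_SAFETY"


def env_with_fork_safety_disabled(base=None):
    """Return a copy of the environment with the workaround variable set."""
    env = dict(os.environ if base is None else base)
    env[FORK_SAFETY_VAR] = "YES"  # assumed value, taken from later comments
    return env


def run_python(args):
    """Start a new interpreter that has the variable set from the very start."""
    return subprocess.run(
        [sys.executable, *args],
        env=env_with_fork_safety_disabled(),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
```

From a shell, the equivalent would be exporting the variable before launching python.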

Other reports

https://bugs.python.org/issue30837
https://github.com/ansible/ansible/issues/32499
https://github.com/imWildCat/scylla/issues/22
https://github.com/elastic/beats-tester/pull/73
https://github.com/jhaals/ansible-vault/issues/60
msg318361 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-06-01 05:51
A better solution is to avoid using fork mode for multiprocessing. The spawn and forkserver modes should work fine.

The underlying problem is that macOS system frameworks (basically anything higher level than libc) are not safe with respect to fork(2), and fixing that appears to have no priority at all at Apple.
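
A minimal sketch of avoiding fork mode in Python 3, assuming a pool is what the application needs. It uses multiprocessing.get_context so the choice stays local to one call site; sqrt_all is a made-up helper, and math.sqrt stands in for a real worker because spawned children must be able to import the function they run.

```python
import math
import multiprocessing as mp


def sqrt_all(values, method="spawn"):
    """Map math.sqrt over values in a pool that uses the given start method."""
    ctx = mp.get_context(method)  # local context; the global default is untouched
    with ctx.Pool(processes=2) as pool:
        return pool.map(math.sqrt, values)


if __name__ == "__main__":
    # The guard matters with spawn: children re-import the main module.
    print(sqrt_all([1.0, 4.0, 9.0]))
```

Passing method="forkserver" selects the other mode mentioned above on platforms that support it.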
msg318396 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-06-01 10:57
(As a side note, the macOS Pythons provided by python.org installers should not behave differently on macOS 10.13 High Sierra since none of them are built with a 10.13 SDK.)
msg318397 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-06-01 11:10
I understand that Apple, with their limited resources, cannot spend expensive engineer manpower on improving POSIX support in macOS </snark>.

In any case, I'm unsure this bug can be fixed at the Python level.  If macOS APIs don't like fork(), they don't like fork(), full stop.  As Ronald says, on 3.x you should use "forkserver" (for multiple reasons, not only this issue).  On 2.7 you're stuck dealing with the issue by yourself.
msg318528 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-06-03 08:47
Antoine, the issue is not necessarily related to POSIX compliance; AFAIK, strictly POSIX-compliant code should work just fine. The problem is in higher-level APIs (CoreFoundation, Foundation, AppKit, ...), and appears to be related to using multi-threading in those libraries without spending effort on pre/post fork handlers to ensure that new processes are in a sane state after fork().  In older macOS versions this could result in hard-to-debug issues; in newer versions the APIs seem to guard against this by aborting when they detect that the pid has changed.

Anyways... I agree that we shouldn't try to work around this in CPython; there are bound to be more problems that are hidden by the proposed workaround.

---

<http://www.sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html> describes what the environment variable does: it "just" changes the behavior of the ObjC runtime, and doesn't make using macOS system frameworks after a fork any safer.
msg318529 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-06-03 08:51
@Ned: In the long run the macOS installers should be built using the latest SDK, primarily to get full API coverage and access to all system APIs.

AFAIK building using the macOS 10.9 SDK still excludes a number of libSystem APIs that would be made available through the posix module when building with a newer SDK.

That's something that would require some effort though to ensure that the resulting binary still works on older versions of macOS (basically similar to the work I've done in the past to weak link some other symbols in the posix module).
msg318708 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-06-04 21:39
(Note: this is not particularly relevant to the issue here.)

Ronald:
> In the long run the macOS installers should be built using the latest SDK [...] That's something that would require some effort though to ensure that the resulting binary still works on older versions of macOS

I agree that being able to build with the latest SDK would be nice but it's also true it would require effort on our part, both one-time and ongoing, at least for every new macOS SDK release and update to test with each older system.  It would also require that the third-party libraries we build for an installer also behave correctly.  And to make full use of it, third-party Python packages with extension modules would also need to behave correctly.  I see one of the primary use cases for the python.org macOS installers as being for Python app developers who want to provide apps that run on a range of macOS releases.  It seems to me that the safest and simplest way to guarantee that python.org macOS Pythons fulfill that need is to continue to always build them on the oldest supported system.  Yes, that means that users may miss out on a few features only supported on the more recent macOS releases but I think that's the right trade-off until we have the resources to truly investigate and decide to support weak linking from current systems.
msg329871 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-11-13 21:50
issue35219 is where I've run into this problem.  I'm still trying to figure out all the details in my own case, but I can confirm that setting the environment variable does not always help.
msg329880 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-11-14 01:14
Hoo boy.  I'm not sure I have the full picture, but things are starting to come into focus.  After much debugging, I've narrowed down at least one crash to urllib.request.getproxies().  On macOS (darwin), this ends up calling _scproxy.get_proxies() which calls into the SystemConfiguration framework.  I'll bet dollars to donuts that that calls into the ObjC runtime.  Thus it is unsafe to call between fork and exec.  This certainly seems to be the case even if the environment variable is set.

The problem is that I think requests.post() probably also ends up in here somehow (still untraced), because by removing our call to urllib.request.getproxies(), we just crash later on when requests.post() is called.

I don't know what, if anything, can be done in Python, except perhaps to document that anything that calls into the ObjC runtime between fork and exec can potentially crash the subprocess.
msg329885 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-11-14 01:36
A few other things I don't understand:

* Why does setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES only seem to work when it's set in the shell before the parent process executes?  AFAICT, it does *not* work if you set that in os.environ in the parent process before the os.fork().

* Why does it only crash on the first invocation of our app?  Does getproxies() cache the results somehow?  There's too much internal application code in the way to know if we're doing something that prevents getproxies() from getting called in subsequent calls.

* I can't seem to produce a smaller test case.
msg329919 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-11-14 17:52
FWIW, I suspect that setting the environment variable only helps if it's done before the process starts.  You cannot set it before the fork and have it affect the child.
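
If the variable really must be in place before the process starts, one option is for a script to re-execute itself once with the variable set. The following is only a sketch under that assumption: needs_reexec and reexec_with_fork_safety_disabled are hypothetical names, and rebuilding the command line from sys.argv does not faithfully reproduce invocations such as python -m or python -c.

```python
import os
import sys

FORK_SAFETY_VAR = "OBJC_DISABLE_INITIALIZE_FORK_SAFETY"


def needs_reexec(environ, platform):
    """True when running on macOS without the variable already in place."""
    return platform == "darwin" and environ.get(FORK_SAFETY_VAR) != "YES"


def reexec_with_fork_safety_disabled():
    """Replace this process with a copy whose startup environment has the variable."""
    if needs_reexec(os.environ, sys.platform):
        env = dict(os.environ)
        env[FORK_SAFETY_VAR] = "YES"
        # execve does not return on success; the new interpreter starts from scratch.
        os.execve(sys.executable, [sys.executable] + sys.argv, env)
```

A script would call reexec_with_fork_safety_disabled() as its first action, before importing anything that might touch system frameworks. As discussed elsewhere in this issue, this only suppresses the abort and does not make the fork safe.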
msg329922 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2018-11-14 18:11
Barry's effort as well as comments in other links seem to all suggest that OBJC_DISABLE_INITIALIZE_FORK_SAFETY is not comprehensive in its ability to make other threads "safe" before forking.

"Objective-C classes defined by the OS frameworks remain fork-unsafe" (from @kapilt's first link) suggests we furthermore remain at risk using certain MacOS system libraries prior to any call to fork.

"To guarantee that forking is safe, the application must not be running any threads at the point of fork" (from @kapilt's second link) is an old truth that we continue to fight with even when we know very well that it's the truth.

For newly developed code, we have the alternative to employ spawn instead of fork to avoid these problems in Python, C, Ruby, etc.  For existing legacy code that employed fork and now surprises us by failing-fast on MacOS 10.13 and 10.14, it seems we are forced to face a technical debt incurred back when the choice was first made to spin up threads and afterwards to use fork.

If we didn't already have an "obvious" (zen of Python) way to avoid such problems with spawn versus fork, I would feel this was something to solve in Python.  As to helping the poor unfortunate souls who must fight the good fight with legacy code, I am not sure what to do to help though I would like to be able to help.
msg329923 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-11-14 18:16
Legacy code is easy to migrate as long as it uses Python 3.  Just call

  mp.set_start_method('forkserver')

at the top of your code and you're done.  Some use cases may fail (if sharing non-picklable types), but they're probably not very common.
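
To gauge whether a code base falls into the non-picklable case, a small check along these lines may help. It is a sketch, not part of multiprocessing: is_picklable is a hypothetical helper, and it relies on the fact that objects handed to a Process under "spawn" or "forkserver" are pickled before being sent to the child.

```python
import math
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives pickling, as spawn/forkserver arguments must."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    for candidate in (math.sqrt, [1, 2, 3], threading.Lock(), lambda x: x):
        print(type(candidate).__name__, is_picklable(candidate))
```

Locks and lambdas are typical examples of things that work when inherited through fork but cannot be passed this way.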
msg329926 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-11-14 18:32
_scproxy has been known to be problematic for some time, see for instance Issue31818.  That issue also gives a simple workaround: setting urllib's "no_proxy" environment variable to "*" will prevent the calls to the System Configuration framework.
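
A sketch of that workaround applied from inside a program rather than from the shell. The assumption here is that the variable is set before anything asks urllib for proxy settings; example.com is just a placeholder host.

```python
import os
import urllib.request

# Set this before any code queries proxy settings.
os.environ["no_proxy"] = "*"

# With a proxy variable present in the environment, urllib uses the
# environment and treats every host as bypassing the proxy.
proxies = urllib.request.getproxies()
bypass = urllib.request.proxy_bypass("example.com")
print(proxies.get("no"), bool(bypass))
```

This disables proxy use altogether, so it is only suitable where no proxy is needed.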
msg329927 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2018-11-14 19:42
Given the original post mentioned 2.7.15, I wonder if it is feasible to fork near the beginning of execution, then maintain and pass around a multiprocessing.Pool to be used when needed instead of dynamically forking?  Working with legacy code is almost always more interesting than you want it to be.
msg329933 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-11-14 20:43
On Nov 14, 2018, at 10:11, Davin Potts <report@bugs.python.org> wrote:
> 
> 
> Davin Potts <python@discontinuity.net> added the comment:
> 
> Barry's effort as well as comments in other links seem to all suggest that OBJC_DISABLE_INITIALIZE_FORK_SAFETY is not comprehensive in its ability to make other threads "safe" before forking.

Right.  Setting the env var will definitely not make it thread safe.  My understanding (please correct me if I’m wrong!) isn’t that this env var makes it safe, just that it prevents the ObjC runtime from core dumping.  So it’s still up to the developer to know whether threads are involved or not.  In our cases, these are single threaded applications.  I’ve read elsewhere that ObjC doesn’t care if threads have actually been spun up or not.

> "Objective-C classes defined by the OS frameworks remain fork-unsafe" (from @kapilt's first link) suggests we furthermore remain at risk using certain MacOS system libraries prior to any call to fork.

Actually, it’s unsafe to call anything between fork and exec.  Note that this doesn’t just affect Python; this is a pretty common idiom in other scripting languages too, from what I can tell.  It’s certainly very common in Python.

Note too that urllib.request.getproxies() will end up calling into the ObjC runtime via _scproxy, so you can’t even use requests after a fork but before exec.

What I am still experimenting with is to see if I can define a pthread_atfork handler that will initialize the ObjC runtime before fork is actually called.  I saw a Ruby approach like this, but it’s made more difficult in Python because pthread_atfork isn’t exposed to Python.  I’m trying to see if I can implement it in ctypes, before I write an extension.

> "To guarantee that forking is safe, the application must not be running any threads at the point of fork" (from @kapilt's second link) is an old truth that we continue to fight with even when we know very well that it's the truth.

True, but do realize this problem affects you even in single threaded applications.

> For newly developed code, we have the alternative to employ spawn instead of fork to avoid these problems in Python, C, Ruby, etc.  For existing legacy code that employed fork and now surprises us by failing-fast on MacOS 10.13 and 10.14, it seems we are forced to face a technical debt incurred back when the choice was first made to spin up threads and afterwards to use fork.

It’s tech debt you incur even if you don’t spin up threads.  Just fork and do some work in the child before calling exec.  If that work enters the ObjC runtime (as in the getproxies example), your child will coredump.

> If we didn't already have an "obvious" (zen of Python) way to avoid such problems with spawn versus fork, I would feel this was something to solve in Python.  As to helping the poor unfortunate souls who must fight the good fight with legacy code, I am not sure what to do to help though I would like to be able to help.

*If* we can provide a hook to initialize the ObjC runtime in pthread_atfork, I think that’s something we could expose in Python.  Then we can say legacy code can just invoke that, and at least you will avoid the worst outcome.
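
For experimenting with such a hook, Python 3.7 added os.register_at_fork, which is related to pthread_atfork but not the same thing: it runs Python callables only around forks started through the interpreter. The sketch below merely shows the hook mechanism; what a hook would have to call to avoid the crash is not established anywhere in this thread, and fork_and_wait is a made-up helper.

```python
import os

events = []

# Registration is permanent for the life of the process.
os.register_at_fork(
    before=lambda: events.append("before"),
    after_in_parent=lambda: events.append("parent"),
)


def fork_and_wait():
    """Fork a child that exits immediately and return its exit code."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: leave without running any further Python code
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

After a call to fork_and_wait(), the events list in the parent records that both hooks ran.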
msg329941 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-11-15 03:22
I have a reliable way to call *something* in the pthread_atfork prepare handler, but I honestly don't know what to call to prevent the crash.

In the Ruby thread, it seemed to say that you could just dlopen /System/Library/Frameworks/Foundation.framework/Foundation but that does not work for me.  Neither does also loading the CoreFoundation and SystemConfiguration frameworks.

If anybody has something that will reliably initialize the runtime, I can post my approach (there are a few subtleties).  Short of that, I think there's nothing that can be done except ensure that exec is called right after fork.
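
For reference, a sketch of the "exec right after fork" pattern on a POSIX system. fork_then_exec is a hypothetical helper written for this illustration; the subprocess module already implements the same idea and is usually the better choice.

```python
import os
import sys


def fork_then_exec(argv):
    """Fork, exec argv immediately in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: do nothing except replace the process image.
        try:
            os.execv(argv[0], argv)
        finally:
            # Only reached if exec failed; exit without running cleanup handlers.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print(fork_then_exec([sys.executable, "-c", "print('hello from a fresh process')"]))
```

The point is that no library code, and in particular nothing that could reach the ObjC runtime, runs in the child between the two calls.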
msg331101 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-12-05 11:52
AFAIK there is nothing you can do after calling fork(2) to "reinitialise" the ObjC runtime. And I don't think that's the issue anyway: I suspect that the actual problem is that Apple's system frameworks use multithreading (in particular libdispatch) and don't have code to ensure a sane state after calling fork.

In Python 3 there is another workaround to avoid problems using multiprocessing: use multiprocessing.set_start_method() to switch away from the "fork" startup handler to "spawn" or "forkserver" (the latter only when calling set_start_method before calling any code that might call into Apple system frameworks).
msg331406 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-12-09 06:50
New changeset ac218bc5dbfabbd61c76ce8a17de088611e21981 by Ned Deily in branch 'master':
bpo-33725: skip test_multiprocessing_fork on macOS (GH-11043)
https://github.com/python/cpython/commit/ac218bc5dbfabbd61c76ce8a17de088611e21981
msg331407 - (view) Author: miss-islington (miss-islington) Date: 2018-12-09 07:06
New changeset d4bcf13e06d33b8ec66a68db20df34a029e66882 by Miss Islington (bot) in branch '3.7':
bpo-33725: skip test_multiprocessing_fork on macOS (GH-11043)
https://github.com/python/cpython/commit/d4bcf13e06d33b8ec66a68db20df34a029e66882
msg331409 - (view) Author: miss-islington (miss-islington) Date: 2018-12-09 07:11
New changeset df5d884defc8f1a94013ff9beb493f1428bd55b5 by Miss Islington (bot) in branch '3.6':
bpo-33725: skip test_multiprocessing_fork on macOS (GH-11043)
https://github.com/python/cpython/commit/df5d884defc8f1a94013ff9beb493f1428bd55b5
msg331411 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-12-09 07:33
Since it looks like multiprocessing_fork is not going to be fixable for macOS, the main issue remaining is how to help users avoid this trap (literally).  Should we add a check and issue a warning or error at run time?  Or is a doc change sufficient?
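
Purely as a sketch of what a run-time check could look like, and not something CPython does: warn_if_fork_on_macos is a hypothetical function, and the warning text is invented for this example.

```python
import multiprocessing as mp
import sys
import warnings


def warn_if_fork_on_macos(method=None, platform=None):
    """Warn when the 'fork' start method is in effect on macOS; return True if warned."""
    # allow_none=True avoids fixing the default as a side effect. None means no
    # method has been chosen yet, a case this sketch does not inspect further.
    method = mp.get_start_method(allow_none=True) if method is None else method
    platform = sys.platform if platform is None else platform
    if platform == "darwin" and method == "fork":
        warnings.warn(
            "the 'fork' start method may crash on macOS; "
            "consider 'spawn' or 'forkserver'",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```

An error instead of a warning would be the same check with an exception raised in place of warnings.warn.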

In the meantime, I've merged changes to disable running test_multiprocessing_fork which will sometimes (but not always) segfault on 10.14 Mojave.  I should apologize to Barry and others who have run into this.  I did notice the occasional segfault when testing with Mojave just prior to its release but it wasn't always reproducible and I didn't follow up on it.  Now that the change in 10.14 behavior makes this existing problem with fork no exec more obvious, it's clear that the test segfaults are another manifestation of this.
msg331435 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2018-12-09 15:13
Do we really need to disable the running of test_multiprocessing_fork entirely on MacOS?

My understanding so far is that not *all* of the system libraries on the mac are spinning up threads and so we should expect that there are situations where fork alone may be permissible, but of course we don't yet know what those are.  Pragmatically speaking, I have not yet seen a report of test_multiprocessing_fork tests triggering this problem but I would like to see/hear that when it is observed (that's my pitch for leaving the tests enabled).
msg331438 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2018-12-09 15:45
@ned.deily: Apologies, I misread what you wrote -- I would like to see the random segfaults that you were seeing on Mojave if you can still point me to a few.
msg331459 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-12-10 01:42
I think it makes sense to disable this test; the only possible modification would be to only disable it for macOS >= 10.13.  AFAIK, that's the first version where core dumps were possible.  (Aside: I also saw these core dumps for a long time on 10.13 and never associated them with fork-without-exec in Python code.  Clearly, Apple has not done a good enough job of advertising this change.)
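
A version-restricted skip, covering macOS 10.13 and later, could be sketched as follows. This is a hypothetical illustration and not the change that was merged for this issue; macos_version_at_least and skip_fork_tests_on_new_macos are made-up names.

```python
import platform
import sys
import unittest


def macos_version_at_least(version, minimum):
    """Compare a dotted version string such as '10.14.1' with a tuple such as (10, 13)."""
    if not version:
        return False  # platform.mac_ver() reports '' on other systems
    parts = tuple(int(p) for p in version.split(".") if p.isdigit())
    return parts >= minimum


# Hypothetical decorator for test classes or functions.
skip_fork_tests_on_new_macos = unittest.skipIf(
    sys.platform == "darwin"
    and macos_version_at_least(platform.mac_ver()[0], (10, 13)),
    "fork without exec may crash on macOS 10.13 and later",
)
```

Applied to a test case, the decorator leaves the test enabled on older macOS releases and on other platforms.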

I think it is useful to help users on macOS avoid these problematic idioms, via documentation and defaults.  I think there's no way to predict when the core dumps will happen.  With internal cases, I've seen repeated invocations of the same code only core dump on the first run of the process, and not subsequent ones, for reasons I do not understand.  There seems to be a lot of mystery here, and without some explicit help from Apple, we're just doing our best to guess.
msg331610 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-11 11:12
Would it be safe to run the multiprocessing tests on recent macOS with the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable set?
msg331733 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-12-13 01:59
> Would it be safe to run the multiprocessing tests on recent macOS with the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable set?

See Ronald's reply above in msg331101. I believe his point is that there is nothing you can do to make this safe. And it's not a new problem with 10.14 or 10.13. What is new is that Apple is trying to more forcefully make you aware of the danger by causing the runtime to try to catch and crash these cases earlier rather than permit them to perhaps silently cause failures later.
msg331735 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-12-13 02:24
On Dec 12, 2018, at 17:59, Ned Deily <report@bugs.python.org> wrote:
> 
> Ned Deily <nad@python.org> added the comment:
> 
>> Would it be safe to run the multiprocessing tests on recent macOS with the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable set?
> 
> See Ronald's reply above in msg331101. I believe his point is that there is nothing you can do to make this safe. And it's not a new problem with 10.14 or 10.13. What is new is that Apple is trying to more forcefully make you aware of the danger by causing the runtime to try to catch and crash these cases earlier rather than permit them to perhaps silently cause failures later.

In my experiments at least, setting the env var *does* prevent the crash, but it doesn’t avoid the undefined semantics (i.e. what happens when the ObjC runtime is called at that point?) and I fully expect that Apple will remove that bandaid at some point.

The other key thing is that I don’t believe you can set the env var *in process* and have it take effect after the fork.  It must be set before the parent process starts.  So that probably makes it less useful for the multiprocessing tests by itself.
History
Date User Action Args
2018-12-13 02:24:44 barry set messages: + msg331735
2018-12-13 01:59:53 ned.deily set messages: + msg331733
2018-12-11 11:12:05 vstinner set nosy: + vstinner; messages: + msg331610
2018-12-10 01:42:49 barry set messages: + msg331459
2018-12-09 15:45:38 davin set messages: + msg331438
2018-12-09 15:13:53 davin set messages: + msg331435
2018-12-09 07:33:31 ned.deily set messages: + msg331411
2018-12-09 07:11:33 miss-islington set messages: + msg331409
2018-12-09 07:06:56 miss-islington set nosy: + miss-islington; messages: + msg331407
2018-12-09 06:50:36 miss-islington set pull_requests: + pull_request10281
2018-12-09 06:50:28 miss-islington set pull_requests: + pull_request10280
2018-12-09 06:50:19 ned.deily set messages: + msg331406
2018-12-09 06:30:19 ned.deily set keywords: + patch; stage: patch review; pull_requests: + pull_request10279
2018-12-05 11:52:25 ronaldoussoren set messages: + msg331101
2018-11-15 03:22:35 barry set messages: + msg329941
2018-11-14 20:43:14 barry set messages: + msg329933
2018-11-14 19:42:27 davin set messages: + msg329927
2018-11-14 18:33:00 ned.deily set messages: + msg329926
2018-11-14 18:16:05 pitrou set messages: + msg329923
2018-11-14 18:11:26 davin set messages: + msg329922
2018-11-14 17:52:37 barry set messages: + msg329919
2018-11-14 01:36:15 barry set messages: + msg329885
2018-11-14 01:14:28 barry set messages: + msg329880
2018-11-14 01:07:44 barry set title: Pytho crashes on macOS after fork with no exec -> Python crashes on macOS after fork with no exec
2018-11-13 21:51:14 barry set title: macOS crashes after fork with no exec -> Pytho crashes on macOS after fork with no exec
2018-11-13 21:51:01 barry set title: High Sierra hang when using multi-processing -> macOS crashes after fork with no exec
2018-11-13 21:50:33 barry set versions: + Python 3.6, Python 3.7, Python 3.8
2018-11-13 21:50:10 barry set messages: + msg329871
2018-11-13 21:49:01 barry link issue35219 superseder
2018-11-12 21:48:05 barry set nosy: + barry
2018-06-04 21:39:10 ned.deily set messages: + msg318708
2018-06-03 08:51:31 ronaldoussoren set messages: + msg318529
2018-06-03 08:47:57 ronaldoussoren set messages: + msg318528
2018-06-01 11:10:22 pitrou set messages: + msg318397
2018-06-01 10:57:29 ned.deily set nosy: + pitrou, davin; messages: + msg318396
2018-06-01 05:51:16 ronaldoussoren set messages: + msg318361
2018-06-01 00:53:06 kapilt create