classification
Title: Apple-supplied libsqlite3 on OS X is not fork safe; can cause crashes
Type: crash Stage:
Components: macOS Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: chris.jerdonek, eitan.adler, evan.jones@bluecore.com, ned.deily, ronaldoussoren
Priority: normal Keywords:

Created on 2016-05-25 23:05 by evan.jones@bluecore.com, last changed 2017-12-12 23:15 by eitan.adler.

Files
File name Uploaded Description
osx_python_crash.py evan.jones@bluecore.com, 2016-05-25 23:05
osx_python3_crash.py evan.jones@bluecore.com, 2016-05-25 23:05
Messages (7)
msg266396 - Author: Evan Jones (evan.jones@bluecore.com) Date: 2016-05-25 23:05
The system version of libsqlite3 that is included in Mac OS X is not fork safe. If a process forks and the child then calls into the library, the child crashes with the stack trace below.

I've reproduced this with both Python 2.7.10 and Python 3.5.1 on Mac OS X 10.11.5. There are a number of reports about this issue on the Internet. The only way I can think of to solve this problem is to bundle SQLite with the Python source code and build that bundled copy instead of linking against the system library. This will avoid the problem, since only Apple's fork of SQLite uses the problematic libdispatch library.
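A minimal sketch of the failure pattern described above. This is not the attached osx_python_crash.py, whose contents are not shown in this thread; the function names are hypothetical, and the proxy lookup in the parent is an assumption about what makes the parent use libdispatch before the fork. On an unaffected build the child simply exits with code 0.

```python
import os
import sqlite3
import urllib.request


def connect_in_forked_child():
    """Fork without exec, open SQLite in the child, return the raw wait status."""
    # Assumption: on OS X this lookup reaches system APIs that use libdispatch
    # in the parent. On other platforms it only reads environment variables.
    urllib.request.getproxies()

    pid = os.fork()
    if pid == 0:
        # Child side of fork, pre-exec: per the report, this call is where the
        # crash occurs when _sqlite3 is linked against Apple's libsqlite3.
        try:
            sqlite3.connect(":memory:").close()
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return status


def describe_status(status):
    """Turn a wait status into a short human-readable description."""
    if os.WIFSIGNALED(status):
        return "child killed by signal %d" % os.WTERMSIG(status)
    return "child exited with code %d" % os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(describe_status(connect_in_forked_child()))
```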


Details:

* Apple ships a version of sqlite3 that uses their "Grand Central Dispatch" libdispatch library.

* Grand Central Dispatch is explicitly *not* fork safe. From their docs:

https://developer.apple.com/library/mac/documentation/Performance/Reference/GCD_libdispatch_Ref/index.html

"Be careful when mixing GCD with the fork system call. If a process makes GCD calls prior to calling fork, it is not safe to make additional GCD calls in the resulting child process until after a successful call to exec or related functions."

* Some System APIs also seem to call into this library. In my case: urllib/urllib2 access the system's proxy settings.


Related bugs:
* I believe this is the root cause of https://bugs.python.org/issue20353
* Celery also has a detailed bug report: https://github.com/celery/celery/issues/869
* Pure native code can cause this crash: http://ludovicrousseau.blogspot.com/2015/01/os-x-yosemite-bug-pcsc-functions-crash.html



Crash details from Mac OS X's system Python:

Application Specific Information:
crashed on child side of fork pre-exec

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libdispatch.dylib               0x00007fff92e8b162 _dispatch_barrier_async_f_slow + 356
1   libsqlite3.dylib                0x00007fff8a607656 sqlite3_initialize + 2950
2   libsqlite3.dylib                0x00007fff8a605b21 openDatabase + 65
3   _sqlite3.so                     0x0000000105bb9a55 pysqlite_connection_init + 509
4   org.python.python               0x000000010567bc24 0x10562f000 + 314404
5   org.python.python               0x0000000105639202 PyObject_Call + 99
6   _sqlite3.so                     0x0000000105bbdbf0 0x105bb8000 + 23536
7   org.python.python               0x00000001056b5a0b PyEval_EvalFrameEx + 13400
8   org.python.python               0x00000001056b8541 0x10562f000 + 562497
9   org.python.python               0x00000001056b530c PyEval_EvalFrameEx + 11609
10  org.python.python               0x00000001056b23c1 PyEval_EvalCodeEx + 1583
11  org.python.python               0x00000001056b1d8c PyEval_EvalCode + 54
12  org.python.python               0x00000001056d1a42 0x10562f000 + 666178
13  org.python.python               0x00000001056d1ae5 PyRun_FileExFlags + 133
14  org.python.python               0x00000001056d1634 PyRun_SimpleFileExFlags + 698
15  org.python.python               0x00000001056e3011 Py_Main + 3137
16  libdyld.dylib                   0x00007fff8eb225ad start + 1
msg266405 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2016-05-26 03:41
Thanks for your analysis!  I'm sure that you are correct about Issue20353.  It could also be the root cause of other crashes reported when internet proxies are used with urllib* invoking the _scproxy helper extension module, as reported in Issue13829.

The question is what can be done.  There seem to be two separate issues here.  The first is a crash when using Python's sqlite3 module, the problem your test case files demonstrate.  Note this *isn't* a problem when using the sqlite3 module from any of the current python.org OS X binary installer Pythons (2.7 or 3.x) because those Pythons do not use the Apple-supplied system libsqlite3; they already link with their own newer private copy.  Your py2 test, of course, does fail when using the Apple-supplied system Pythons.  We can't solve that problem; only Apple can; you could open a bug report with them.  If other distributors of Python on OS X rely on the system libsqlite3, they could avoid such crashes by also supplying their own copy.  For example, MacPorts already does, so your test cases don't fail with their Pythons.  I don't know what other distributors, like Homebrew or Anaconda, do.  We could add a note to Mac/README.

Second, the stickier (and totally separate) problem is what to do about _scproxy.  If, under the covers, the calls that _scproxy makes to the Apple-supplied System Configuration framework use the un-forksafe Apple libsqlite3, there is nothing we can do about that; supplying a private copy of libsqlite3 isn't going to change what the framework uses and, in any case, it would be a really bad idea to even try to hack that.  So, if we don't change _scproxy or urllib*'s use of it, only Apple can fix the problem.  Since we can't expect that that is going to happen, the question becomes what alternatives are there.  One would be to find a way to eliminate _scproxy or its use of the unsafe SC framework calls.  Another approach would be to simply document this restriction that urllib calls invoking ProxyHandler must be made in a main process (or whatever the precise restriction is) and leave it at that (https://docs.python.org/dev/library/urllib.request.html#urllib.request.ProxyHandler).

Ronald, what do you think?
msg266439 - Author: Evan Jones (evan.jones@bluecore.com) Date: 2016-05-26 14:27
To be clear: My reproduction scripts crash both Python 2.7.10 and Python 3.5.1 when you:

1. Download the source bundle from python.org.
2. Run ./configure; make
3. Use the built binary (because ./configure picks up the system version of libsqlite3.dylib)

I did some more digging: The underlying root cause is Mac OS X's libdispatch.dylib. A ton of system APIs (like this proxy one, GUI libraries, etc.) use it. It seems the proxy settings API uses it to manage inter-process communication. libdispatch has code that explicitly "poisons" the process if it forks. I think this is because it internally spawns threads, so the forked child state is unreliable, and they figure it is better for it to crash than to fail randomly. This is the classic "don't mix threads and fork" issue; it's just that the threads are hidden inside a bunch of system APIs.

One fix for this particular bug would be for _scproxy to fork and use IPC to read the settings, which I think was mentioned in Issue13829. I also think it would not be crazy to ship the amalgamated sqlite3 with Python, to avoid an accidental dependency on the system sqlite3.

Finally: it might make sense to have 'forkserver' be the default mode for multiprocessing on Mac OS X, since there are other things that cause this same problem (Tkinter is reported on the internet).
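Until the default changes, a program can opt in to 'forkserver' itself. A minimal sketch, assuming Python 3.4 or later (where multiprocessing.get_context exists); the helper name forkserver_context is hypothetical. Workers are forked from a separate server process that is started with fork plus exec, not from the parent, so they do not inherit whatever libdispatch state the parent has accumulated.

```python
import multiprocessing


def forkserver_context():
    """Return a multiprocessing context that starts workers via a fork server."""
    return multiprocessing.get_context("forkserver")


if __name__ == "__main__":
    ctx = forkserver_context()
    # The worker function must be importable in the worker processes;
    # a builtin such as abs always is.
    with ctx.Pool(processes=2) as pool:
        print(pool.map(abs, [-1, -2, 3]))
```

Using a context object rather than multiprocessing.set_start_method() keeps the choice local to this code and does not affect other libraries in the same process.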
msg266483 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2016-05-27 07:40
There is a clear gap between two use cases for Python on OSX:

1) Integrate nicely with Apple stuff

2) Be useful as a development platform for software that will be deployed on Linux machines.

Both use cases are valid, but have slightly different trade-offs. I have both use cases, but tend not to do things that cause problems here.  In any case I tend to favour the first use case; it is easy enough to develop locally and test on Linux (VMs, docker, ...) for the second use case.

A particular problem is using os.fork: as Evan notes, there are problems with using Apple's frameworks in programs that use fork without exec-ing a new program. Libdispatch is one problem, but higher-level frameworks are also problematic (the _scproxy problem is not only caused by the use of libdispatch).

Spawning off a small executable for the _scproxy case could be worthwhile, although I haven't fully thought about the implications of that. I wouldn't like removing the use of _scproxy, it uses the documented programmatic way to access system-wide proxy configuration. Using something like scutil(1) might work as well, but is hackish.
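As a rough illustration of the "spawn a small executable" idea, the lookup can be pushed into a freshly exec'd interpreter and the result read back over a pipe. This is only a sketch under assumptions: the helper name getproxies_in_subprocess is hypothetical, it is not how _scproxy or urllib work today, and starting an interpreter per lookup is slow, so a real implementation would need caching or a much smaller helper.

```python
import json
import subprocess
import sys

# Code run in the helper process. Because the helper is exec'd, any use of
# Apple frameworks happens in a fresh process, not in a forked child.
_CHILD_CODE = (
    "import json, urllib.request; "
    "print(json.dumps(urllib.request.getproxies()))"
)


def getproxies_in_subprocess(timeout=10):
    """Return the proxy mapping as computed by a separate Python process."""
    out = subprocess.check_output(
        [sys.executable, "-c", _CHILD_CODE],
        universal_newlines=True,
        timeout=timeout,
    )
    return json.loads(out)


if __name__ == "__main__":
    print(getproxies_in_subprocess())
```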

Changing the default mode for multiprocessing could also be useful, but could result in calls about code that works on Linux but doesn't on OSX due to differences in fork mode.

BTW. Using fork without exec is unsafe in any program using threads, people tend to run into this more on OSX because Apple's libraries use background threads while other Unix-y platforms don't tend to do that.

As to shipping sqlite with the CPython source: that's not something we do in general. It would be better to document the issue in the build instructions for CPython, and/or tweak the CPython build process to issue a warning when it detects that it is using the system version of sqlite on OSX.
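A check along these lines can also be run against an existing build. The sketch below is not part of the CPython build process. It assumes that Apple's copy lives at /usr/lib/libsqlite3.dylib, that the otool command is installed, and that _sqlite3 was built as a separate extension module (so that it has a __file__); the function names are hypothetical.

```python
import subprocess
import sys

# Assumed install path of the Apple-supplied library.
SYSTEM_SQLITE = "/usr/lib/libsqlite3.dylib"


def uses_system_sqlite(otool_output):
    """Return True if `otool -L` output lists the system libsqlite3."""
    return any(
        line.strip().startswith(SYSTEM_SQLITE)
        for line in otool_output.splitlines()
    )


def interpreter_uses_system_sqlite():
    """Inspect the shared libraries that this interpreter's _sqlite3 links to."""
    import _sqlite3

    output = subprocess.check_output(
        ["otool", "-L", _sqlite3.__file__], universal_newlines=True
    )
    return uses_system_sqlite(output)


if __name__ == "__main__" and sys.platform == "darwin":
    if interpreter_uses_system_sqlite():
        print("warning: _sqlite3 is linked against the system libsqlite3")
    else:
        print("_sqlite3 uses a private copy of libsqlite3")
```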

BTW2. Another problematic issue on OSX is accessing the system trust store for SSL certificates.  AFAIK the current binaries still use Apple's build of OpenSSL, which uses a private API to access the system trust store. Using a standalone OpenSSL would require some way to make OpenSSL use the trust store: either by shipping a script to dump the trust store in a format that OpenSSL can use, or by using Apple's APIs to access the trust store. IIRC there is an issue about that, but I cannot find it at the moment. Using the public APIs might result in similar problems as mentioned in this issue.
msg266501 - Author: Evan Jones (evan.jones@bluecore.com) Date: 2016-05-27 15:55
I have a crazy idea, but I'm not 100% sure how to implement it: If Python was able to detect and report this error in a friendly way, it would allow people to easily understand what is happening and to work around it. How can we do it?

First idea: In the implementation of os.fork(), detect if libdispatch has been used. If it has, throw an exception. I think this is probably possible using the libdispatch public APIs, but I'll need to figure out the details. In general, this could apply on Linux as well: throw an exception if the process has more than one thread running?
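A rough sketch of the generic "more than one thread" variant of this idea, written as a wrapper and not as a change to os.fork() itself. The names checked_fork and UnsafeForkError are hypothetical. Note the important limitation: threading.active_count() only counts threads the Python interpreter knows about, so worker threads created internally by libdispatch would not be seen. Detecting those would need the libdispatch-specific check, which is not attempted here.

```python
import os
import threading


class UnsafeForkError(RuntimeError):
    """Raised when fork() is requested while other Python threads are alive."""


def checked_fork():
    """Call os.fork(), but refuse if Python-level threads besides this one exist."""
    extra_threads = threading.active_count() - 1
    if extra_threads > 0:
        raise UnsafeForkError(
            "refusing to fork: %d other thread(s) are running; "
            "the child would inherit inconsistent state" % extra_threads
        )
    return os.fork()
```

A caller would use checked_fork() exactly like os.fork() and handle UnsafeForkError by falling back to a fork-plus-exec strategy such as subprocess.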

Second idea: On Mac OS X only, libdispatch is intentionally crashing the process. We could install a signal handler that attempts to detect *this specific crash* in order to throw a friendlier exception, or at worst crash with a useful message.

Third idea: Add documentation to the multiprocessing module and os.fork that they are very unsafe on Mac OS X?

Maybe there is a better way of making this crash "friendlier"?
msg273142 - Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2016-08-19 16:45
FWIW, I just came across an issue in Django's test suite that I believe is caused by the issue reported here. Some of Django's unit tests were hanging for me when run in "parallel" mode (which uses multiprocessing). Here is the ticket I filed there:
https://code.djangoproject.com/ticket/27086
msg273424 - Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2016-08-23 08:53
> We can't solve that problem; only Apple can;

> So, if we don't change _scproxy or urllib*'s use of it, only Apple can fix the problem.

In the Django ticket I mentioned in my comment above, one of the commenters said, "Just ran the tests at the mentioned commit on my macOS Sierra public beta with a fresh 3.5.2 python environment. No problems there."

(https://code.djangoproject.com/ticket/27086#comment:4 )

In other words, the problem that affected me on Mac OS X El Capitan (whose root cause, I believe, is this issue) wasn't present in Sierra.

Do you think this means Apple has addressed the issue in the next version of its OS?
History
Date User Action Args
2017-12-12 23:15:30 eitan.adler set nosy: + eitan.adler
2016-08-23 08:53:16 chris.jerdonek set messages: + msg273424
2016-08-19 16:45:36 chris.jerdonek set messages: + msg273142
2016-08-19 16:39:44 chris.jerdonek set nosy: + chris.jerdonek
2016-05-27 15:55:28 evan.jones@bluecore.com set messages: + msg266501
2016-05-27 07:40:41 ronaldoussoren set messages: + msg266483
2016-05-26 14:27:58 evan.jones@bluecore.com set messages: + msg266439
2016-05-26 03:41:22 ned.deily set type: crash
title: Mac system sqlite3 not fork safe: Bundle a version? -> Apple-supplied libsqlite3 on OS X is not fork safe; can cause crashes
messages: + msg266405
versions: + Python 3.6
2016-05-25 23:05:18 evan.jones@bluecore.com set files: + osx_python3_crash.py
2016-05-25 23:05:05 evan.jones@bluecore.com create