classification
Title: Seg fault on macOS using multiprocessing.JoinableQueue
Type: crash
Stage: resolved
Components: macOS
Versions: Python 3.9

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: davin, jacobtylerwalls, ned.deily, pitrou, ronaldoussoren
Priority: normal
Keywords:

Created on 2021-04-10 16:17 by jacobtylerwalls, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (4)
msg390721 - Author: Jacob Walls (jacobtylerwalls) Date: 2021-04-10 16:17
macOS 10.13.6
Python 3.9.2

I can consistently reproduce a seg fault while using multiprocessing.JoinableQueue in Python 3.9.2.

My use case is music21, a library for processing sheet music. My fork includes a folder of 209 files that I use to reproduce the crash on 3 cores with the script below. (These are a subset of the more than 1,000 files found here: https://github.com/MarkGotham/When-in-Rome/tree/master/Corpus/OpenScore-LiederCorpus
Using the full set of more than 1,000 files reproduces the crash nearly every time; the 209 files I committed to my fork were enough to reproduce it about 75% of the time.)

I'm a contributor to music21, so if this is an overwhelming amount of information to debug, I will gladly pare this down as much as I can or create some methods to access the multiprocessing functionality more directly. Many thanks for any assistance.


pip3 install git+https://github.com/jacobtylerwalls/music21.git@bpo-investigation

from music21 import corpus
# suggest using a unique name each attempt
lc = corpus.corpora.LocalCorpus(name='bpo-investigation')
# point to the directory of files I committed to my fork for this investigation
lc.addPath('/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/music21/bpo-files')
# parse the files using multiprocessing
# calls music21.metadata.bundles.MetadataBundle.addFromPaths()
# which calls music21.metadata.caching.process_parallel()
lc.save()

# CTRL-C to recover from seg fault
# then, wipe out the entries in .music21rc so that you can cleanly reproduce again
from music21 import environment
us = environment.UserSettings()
us['localCorporaSettings'] = {}
quit()


Process:               Python [31677]
Path:                  /Library/Frameworks/Python.framework/Versions/3.9/Resources/Python.app/Contents/MacOS/Python
Identifier:            Python
Version:               3.9.2 (3.9.2)
Code Type:             X86-64 (Native)
Parent Process:        Python [31674]
Responsible:           Python [31677]
User ID:               501

Date/Time:             2021-04-10 11:21:19.294 -0400
OS Version:            Mac OS X 10.13.6 (17G14042)
Report Version:        12
Anonymous UUID:        E7B0208A-19D6-ABDF-B3EA-3910A56B3E72

Sleep/Wake UUID:       C4B83F57-6AD1-469E-82AE-88214FAA6283

Time Awake Since Boot: 140000 seconds
Time Since Wake:       5900 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000100b3acd8
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [0]

VM Regions Near 0x100b3acd8:
--> 
    __TEXT                 00000001068bb000-00000001068bc000 [    4K] r-x/rwx SM=COW   [/Library/Frameworks/Python.framework/Versions/3.9/Resources/Python.app/Contents/MacOS/Python]

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   org.python.python             	0x0000000106944072 PyObject_RichCompare + 258
1   org.python.python             	0x0000000106943e9b PyObject_RichCompareBool + 43
2   org.python.python             	0x00000001069ce3c0 min_max + 624
3   org.python.python             	0x0000000106940bab cfunction_call + 59
4   org.python.python             	0x0000000106901cad _PyObject_MakeTpCall + 365
5   org.python.python             	0x00000001069d865c call_function + 876
6   org.python.python             	0x00000001069d5b8b _PyEval_EvalFrameDefault + 25371
7   org.python.python             	0x0000000106902478 function_code_fastcall + 104
8   org.python.python             	0x00000001069d85cc call_function + 732
9   org.python.python             	0x00000001069d5ad2 _PyEval_EvalFrameDefault + 25186
10  org.python.python             	0x00000001069d92c3 _PyEval_EvalCode + 2611
11  org.python.python             	0x0000000106902401 _PyFunction_Vectorcall + 289
12  org.python.python             	0x00000001069d85cc call_function + 732
13  org.python.python             	0x00000001069d5ad2 _PyEval_EvalFrameDefault + 25186
14  org.python.python             	0x00000001069d92c3 _PyEval_EvalCode + 2611
15  org.python.python             	0x0000000106902401 _PyFunction_Vectorcall + 289
16  org.python.python             	0x0000000106901b05 _PyObject_FastCallDictTstate + 293
17  org.python.python             	0x00000001069026e8 _PyObject_Call_Prepend + 152
18  org.python.python             	0x000000010695be85 slot_tp_init + 165
19  org.python.python             	0x00000001069573d9 type_call + 345
20  org.python.python             	0x0000000106901cad _PyObject_MakeTpCall + 365
21  org.python.python             	0x00000001069d865c call_function + 876
22  org.python.python             	0x00000001069d5af3 _PyEval_EvalFrameDefault + 25219
23  org.python.python             	0x0000000106902478 function_code_fastcall + 104
24  org.python.python             	0x00000001069044ba method_vectorcall + 202
25  org.python.python             	0x00000001069d85cc call_function + 732
26  org.python.python             	0x00000001069d5af3 _PyEval_EvalFrameDefault + 25219
27  org.python.python             	0x0000000106902478 function_code_fastcall + 104
28  org.python.python             	0x00000001069d85cc call_function + 732
29  org.python.python             	0x00000001069d5ad2 _PyEval_EvalFrameDefault + 25186
30  org.python.python             	0x0000000106902478 function_code_fastcall + 104
31  org.python.python             	0x00000001069d85cc call_function + 732
32  org.python.python             	0x00000001069d5ad2 _PyEval_EvalFrameDefault + 25186
33  org.python.python             	0x0000000106902478 function_code_fastcall + 104
34  org.python.python             	0x00000001069d85cc call_function + 732
35  org.python.python             	0x00000001069d5ad2 _PyEval_EvalFrameDefault + 25186
36  org.python.python             	0x00000001069d92c3 _PyEval_EvalCode + 2611
37  org.python.python             	0x0000000106902401 _PyFunction_Vectorcall + 289
38  org.python.python             	0x00000001069d85cc call_function + 732
39  org.python.python             	0x00000001069d5ad2 _PyEval_EvalFrameDefault + 25186
40  org.python.python             	0x0000000106902478 function_code_fastcall + 104
41  org.python.python             	0x00000001069d85cc call_function + 732
42  org.python.python             	0x00000001069d5b8b _PyEval_EvalFrameDefault + 25371
43  org.python.python             	0x00000001069d92c3 _PyEval_EvalCode + 2611
44  org.python.python             	0x0000000106902401 _PyFunction_Vectorcall + 289
45  org.python.python             	0x00000001069d85cc call_function + 732
46  org.python.python             	0x00000001069d5c21 _PyEval_EvalFrameDefault + 25521
47  org.python.python             	0x00000001069d92c3 _PyEval_EvalCode + 2611
48  org.python.python             	0x00000001069cf74b PyEval_EvalCode + 139
49  org.python.python             	0x0000000106a21fc4 PyRun_StringFlags + 356
50  org.python.python             	0x0000000106a21e15 PyRun_SimpleStringFlags + 69
51  org.python.python             	0x0000000106a3e367 Py_RunMain + 1047
52  org.python.python             	0x0000000106a3eaef pymain_main + 223
53  org.python.python             	0x0000000106a3eceb Py_BytesMain + 43
54  libdyld.dylib                 	0x00007fff5a148015 start + 1
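
For readers following the crashed thread: the top three frames (PyObject_RichCompare, PyObject_RichCompareBool, min_max) are the path CPython takes when the builtin min() or max() compares elements. Below is a minimal sketch of how min() drives an object's rich comparison method. The CountingValue class and count_comparisons() helper are hypothetical illustrations and are not part of music21.

```python
class CountingValue:
    """Hypothetical wrapper that counts how often it is compared."""

    calls = 0  # class-level counter shared by all instances

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # min() asks "is this item less than the current minimum?"
        CountingValue.calls += 1
        return self.value < other.value


def count_comparisons(values):
    """Return the minimum of `values` and the number of __lt__ calls made."""
    CountingValue.calls = 0
    smallest = min(CountingValue(v) for v in values)
    return smallest.value, CountingValue.calls
```

Each of those __lt__ calls goes through PyObject_RichCompare, so a crash in that frame may point at the code calling min() or max(), or at the objects being compared, rather than at multiprocessing itself.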
msg390753 - Author: Ned Deily (ned.deily) (Python committer) Date: 2021-04-10 22:45
Thanks for providing a detailed and relatively simple-to-run test case for such a complicated failure. Not totally surprisingly for what appears likely to be a race condition, I have been unable to reproduce it in several macOS environments, including a 10.13.6 VM with multiple cores and macOS 11.2.3. I'm not sure whether this would be expected to affect the results, but I did receive multiple "WARNING: Could not import wedge: Error in getting DynamicWedges" messages when running the test case.

From a quick examination of the installed setup, it appears that there is no attempt to use multiprocessing's "fork" start method, which is known to be problematic on macOS, so that's a plus. There also don't appear to be any extension modules, so the test case is pure Python, which eliminates other likely suspects.
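
To check which start method is actually in effect, something like the following sketch could be run in the same interpreter. The two helper names are hypothetical; the multiprocessing calls are standard library. Since Python 3.8 the default on macOS is "spawn", so "fork" would have to be requested explicitly.

```python
import multiprocessing


def start_method_report():
    """Return the start method in effect and the ones this platform supports."""
    return {
        "current": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


def make_spawn_context():
    """Request "spawn" explicitly without changing the global default."""
    return multiprocessing.get_context("spawn")
```

A context returned by get_context() exposes the same API as the multiprocessing module (Process, Pool, JoinableQueue), so a library can pin its start method without affecting other code in the process.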

One question that does come to mind is exactly which version of Python 3.9.2 you are testing with; can you provide the results of: /path/to/python3.9 -c 'import sys;print(sys.version)' ?

Searching bugs.python.org, I see a few open issues with segfaults in PyObject_RichCompare, but nothing that leaps out as being obviously similar. If it were possible to reproduce the segfault in other environments, such as with 3.9.4, on newer versions of macOS, or on a current Linux platform, that would help to confirm the issue. Even better would be to reproduce the issue while running a current Python 3.9 built with --with-pydebug; unfortunately, we don't normally provide pre-built debug binaries on python.org. And, of course, running in debug mode could affect the race condition, if that is indeed the issue.
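
Short of a debug build, one option that needs no rebuild is the standard library's faulthandler module, which dumps the Python-level traceback when the interpreter receives a signal such as SIGSEGV. A minimal sketch follows; the helper name and the log path are hypothetical, and in a multiprocessing program the helper would presumably need to run inside each worker process as well.

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Write Python tracebacks for fatal signals (e.g. SIGSEGV) to log_path.

    The returned file object must stay open for as long as the handler
    is enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The same handler can be turned on without code changes by starting the interpreter with python3 -X faulthandler, in which case the traceback goes to stderr.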
msg390765 - Author: Jacob Walls (jacobtylerwalls) Date: 2021-04-11 03:38
Thanks for this detailed reply. I reproduced the crash on Python 3.9.4 on the same iMac from my original report, running macOS 10.13.6, but much less frequently (I wouldn't use the word "consistently" anymore).

I tried on a MacBook Pro with worn-out hardware running a newer OS (10.15.4) and could not reproduce the issue there. I also built CPython (Python 3.10.0a7+ (heads/master:ac05f82ad4, Apr 10 2021, 20:16:36) [Clang 10.0.0 (clang-1000.10.44.4)] on darwin) using --with-pydebug and ran the test case a few times on the good-hardware iMac. File parsing (predictably) slowed to a crawl, but the segfault did not reproduce. This leads me to believe that, yes, this is a race condition I'm encountering on fast hardware.

Possibly related to issue-25769, since music21 makes heavy use of weakrefs and since music21.metadata.caching.MetadataCachingJob.run() calls gc.collect(). Perhaps I can look into engineering a minimal test case based on that discussion, involving a deliberately expensive __eq__() call.

To answer your original question: my first report on 3.9.2 was on this specific version:
3.9.2 (v3.9.2:1a79785e3e, Feb 19 2021, 09:06:10) \n[Clang 6.0 (clang-600.0.57)]
msg391345 - Author: Jacob Walls (jacobtylerwalls) Date: 2021-04-18 20:45
Unfortunately, at the outset I should have tested this without multiprocessing. I can reproduce without multiprocessing[1], which meant I could more easily pinpoint the failure. There is an expensive O(nm) algorithm[2] in the music21 library that is overflowing.

I appreciate your time looking into this. Closing.

Regards, Jacob


[1] In the provided script, after one call to lc.save(), call lc.rebuildMetadataCache(useMultiprocessing=False).

[2] music21.analysis.discrete.Ambitus.getPitchRanges(), and I plan to do something about it.
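
The isolation step described in [1] generalizes: give the parallel entry point a switch that runs the same per-item work serially, so a crash can be checked without multiprocessing in the picture. This is a generic sketch of that pattern, not music21's implementation; parse_one() and process_all() are hypothetical names, and parse_one() is only a stand-in for the real per-file work.

```python
from multiprocessing import Pool


def parse_one(path):
    """Hypothetical stand-in for the per-file work done by each job."""
    return len(path)


def process_all(paths, use_multiprocessing=True, processes=3):
    """Run parse_one over paths, serially or in a worker pool."""
    if not use_multiprocessing:
        # Same work, same order, one process: if the crash still
        # happens here, multiprocessing is not the cause.
        return [parse_one(path) for path in paths]
    with Pool(processes=processes) as pool:
        return pool.map(parse_one, paths)
```

Calling process_all(paths, use_multiprocessing=False) first is a cheap check before investigating queues or worker processes. When using the pool branch from a script, the call should sit under an if __name__ == "__main__": guard, since the "spawn" start method re-imports the main module in each worker.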
History
Date                 User             Action  Args
2022-04-11 14:59:44  admin            set     github: 87968
2021-04-18 20:45:25  jacobtylerwalls  set     status: open -> closed; resolution: not a bug; messages: + msg391345; stage: resolved
2021-04-11 03:38:25  jacobtylerwalls  set     messages: + msg390765
2021-04-10 22:45:59  ned.deily        set     nosy: + pitrou, davin; messages: + msg390753
2021-04-10 16:17:02  jacobtylerwalls  create