msg245597 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-21 15:08 |
I'm writing a library on top of multiprocessing. As part of the test suite, I create and destroy dozens of processes repeatedly. About once in 50 runs, the tests complete successfully but the program crashes with:
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 286, in _exit_function
_run_finalizers(0)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 246, in _run_finalizers
items = [x for x in list(_finalizer_registry.items()) if f(x)]
RuntimeError: dictionary changed size during iteration
(this is on OS X 10.10.3, Python 3.4.3; I've been unable to replicate the behavior on Python 2.7.9, 3.5.0b2, or pypy despite hundreds of runs)
The problematic code is available at https://github.com/jherskovic/mpetl and the tests should be run with nosetests
|
msg245629 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-22 13:47 |
This happens reliably between 1-5% of the time on my home Mac (a 4.0 GHz i7). My work Mac Pro, a lot slower, doesn't exhibit this behavior on the same 3.4.3 interpreter.
Could it be related to a concurrency issue in the interpreter?
We're attempting to replicate on other systems.
|
msg245687 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-23 13:01 |
More data:
I've been unable to replicate this behavior on Ubuntu 15.04 on Python 3.4.3 over ~20 thousand test runs. (Same machine, running in a VM)
An overnight repeated run on the original machine on OS X, 3.4.3, official distribution gave an actual frequency of 0.1% (7 crashes in 6980 runs).
Approximately 3,300 runs on Python 2.7.9 on the same machine have not resulted in any crashes.
I now built 3.4.3 from source using the latest clang and am trying it again, on the same machine, instead of using the official distribution. So far, 0 crashes in approximately 3,300 runs so far.
At the moment, then, the problem seems localized to the official 3.4.3 binaries for OS X.
|
msg245704 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2015-06-24 02:14 |
Can you show the version info from the Python 3.4 where you see the failures?
/usr/local/bin/python3.4 -c 'import sys;print(sys.version)'
|
msg245705 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2015-06-24 02:54 |
Exactly how are you building the Python 3.4 that does not fail for you? FWIW, I am able to reproduce the failure easily on a 10.10.3 system using python.org 64-bit/32-bit 3.4.3, using a current MacPorts source-built 3.4.3 (which is built with clang), and repeatedly using the current head of the 3.4 branch (post-3.4.3) built with clang, both with and without pydebug enabled.
|
msg245707 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-24 03:27 |
The failing Python:
3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
The non-failing Python:
3.4.3 (default, Jun 23 2015, 06:33:02)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]
The non-failing python, I just downloaded the tarball and ran ./configure --prefix=/opt/python343, make, sudo make install, and then created a virtualenv using that binary.
Given what you're telling me, I'll run some more loops with the "non-failing" python. I'm grateful you've been able to reproduce the problem, FWIW.
|
msg245713 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2015-06-24 03:52 |
FWIW, building 3.4.3 from source as you described results in failures about 30% of the time. With some of the earlier tests, it failed even more frequently. This is running on a 2.2GHz I7.
|
msg245731 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-24 11:45 |
20,000+ iterations of the test suite with my homebuilt 3.4.3 throw no exceptions.
Weirder and weirder.
Any suggestions? Without any actual knowledge ("unencumbered by the thought process"), a concurrency issue in the interpreter itself is my guess. Especially since I can't reproduce it at all on other versions.
I'll see if I can come up with a shorter, self-contained test that can illustrate the issue too.
|
msg245738 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-24 13:17 |
Finally able to repro on my old Mac Pro at work (Universal 64/32bit 3.4.3 from python.org, OS X 10.10.3). For some reason I can't quite fathom, there were two exceptions this time.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 286, in _exit_function
_run_finalizers(0)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 246, in _run_finalizers
items = [x for x in list(_finalizer_registry.items()) if f(x)]
RuntimeError: dictionary changed size during iteration
Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/util.py", line 185, in __call__
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/queues.py", line 207, in _finalize_close
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 342, in notify
TypeError: 'NoneType' object is not callable
|
msg245739 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-24 13:28 |
Ok, sorry, one more important thing I omitted in the previous comment.
^^^ The second exception above seems to be related to a new test I added, one that includes a daemonic process which itself relies on daemonic threads.
The cleanup of that seems to be FUBARed, and it makes the original exception much more likely to pop up as well.
|
msg245810 - (view) |
Author: Jorge Herskovic (Jorge Herskovic) |
Date: 2015-06-25 15:06 |
Ok, I *think* I tracked it down to Connections getting cleaned up in a non-orderly fashion, sometimes while _run_finalizers is executing. The following patch to lib/python3.4/multiprocessing/connection.py cures the problem on my machine.
@@ -124,10 +124,10 @@
self._handle = handle
self._readable = readable
self._writable = writable
-
+ self._shutdown = util.Finalize(self, self._finalizer)
# XXX should we use util.Finalize instead of a __del__?
-
- def __del__(self):
+
+ def _finalizer(self):
if self._handle is not None:
self._close()
|
msg250137 - (view) |
Author: Davin Potts (davin) *  |
Date: 2015-09-07 23:09 |
The 'crash' type is reserved for core dump type situations; 'behavior' is more appropriate for the misbehavior your describe.
|
msg294546 - (view) |
Author: Marcin Słowik (Marandil) |
Date: 2017-05-26 11:09 |
I can confirm, I have managed to accidentally reproduce this issue on a custom build of 3.6.0 on RHEL.
sys.version:
'3.6.0 (default, Jan 18 2017, 11:23:11)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]'
The issue popped out of nowhere, on a code that was working w/out any hiccups on multiple systems; after an insignificant change was made to an unrelated part of code.
I managed to "patch it up" by forcingly closing and joining all Queues before letting the main process to stop.
Unfortunately, I cannot share the repro code (it's a) too large, b) closed), but I can confirm the issue was still open in 3.6.0.
|
msg295876 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 09:25 |
The simple answer here is _run_finalizers() is not thread-safe, but finalizers can be registered or cancelled from any thread.
I could not write a simple synthetic script to reproduce the issue, but it should nevertheless be fixable.
|
msg295880 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 09:32 |
Marcin, Jorge, Ned, could one of you apply the following simple patch and see if that seems to solve the issue?
|
msg295886 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 10:03 |
Ok, I got to write a reproducer, which fails reliably here.
|
msg295927 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:11 |
New changeset 1eb6c0074d17f4fd425cacfdda893d65f5f77f0a by Antoine Pitrou in branch 'master':
bpo-24484: Avoid race condition in multiprocessing cleanup (#2159)
https://github.com/python/cpython/commit/1eb6c0074d17f4fd425cacfdda893d65f5f77f0a
|
msg295929 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:32 |
Python 2.7 doesn't suffer from this bug, so no actual fix is needed. Still, I'll backport the additional test.
|
msg295931 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:51 |
New changeset 6fd03459957ee53941183212457bba19f977679f by Antoine Pitrou in branch '3.6':
[3.6] bpo-24484: Avoid race condition in multiprocessing cleanup (GH-2159) (#2166)
https://github.com/python/cpython/commit/6fd03459957ee53941183212457bba19f977679f
|
msg295933 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:52 |
New changeset d09f1674d70dd84c5be37b5fe11bca4913146c6b by Antoine Pitrou in branch '2.7':
[2.7] bpo-24484: Avoid race condition in multiprocessing cleanup (GH-2159) (#2168)
https://github.com/python/cpython/commit/d09f1674d70dd84c5be37b5fe11bca4913146c6b
|
msg295934 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:54 |
New changeset a0ecaab006849ade08fbcf8413fe1e92d9c8d71c by Antoine Pitrou in branch '3.5':
[3.5] bpo-24484: Avoid race condition in multiprocessing cleanup (GH-2159) (#2167)
https://github.com/python/cpython/commit/a0ecaab006849ade08fbcf8413fe1e92d9c8d71c
|
msg295935 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-06-13 15:54 |
This should be fixed now. Please reopen if not.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:18 | admin | set | github: 68672 |
2017-06-13 15:54:48 | pitrou | set | status: open -> closed resolution: fixed messages:
+ msg295935
stage: commit review -> resolved |
2017-06-13 15:54:12 | pitrou | set | messages:
+ msg295934 |
2017-06-13 15:52:31 | pitrou | set | messages:
+ msg295933 |
2017-06-13 15:51:30 | pitrou | set | messages:
+ msg295931 |
2017-06-13 15:32:48 | pitrou | set | stage: needs patch -> commit review messages:
+ msg295929 versions:
- Python 2.7 |
2017-06-13 15:32:20 | pitrou | set | pull_requests:
+ pull_request2219 |
2017-06-13 15:23:00 | pitrou | set | pull_requests:
+ pull_request2218 |
2017-06-13 15:15:15 | pitrou | set | pull_requests:
+ pull_request2217 |
2017-06-13 15:11:25 | pitrou | set | messages:
+ msg295927 |
2017-06-13 10:31:20 | pitrou | set | pull_requests:
+ pull_request2210 |
2017-06-13 10:03:26 | pitrou | set | files:
+ finalizestress.py
messages:
+ msg295886 |
2017-06-13 09:32:05 | pitrou | set | files:
+ run_finalizers.patch keywords:
+ patch messages:
+ msg295880
|
2017-06-13 09:25:11 | pitrou | set | versions:
+ Python 2.7, Python 3.5, Python 3.6, Python 3.7, - Python 3.4 nosy:
+ pitrou
messages:
+ msg295876
stage: needs patch |
2017-05-26 11:09:31 | Marandil | set | nosy:
+ Marandil messages:
+ msg294546
|
2015-09-07 23:09:25 | davin | set | type: crash -> behavior messages:
+ msg250137 |
2015-06-25 15:06:21 | Jorge Herskovic | set | messages:
+ msg245810 |
2015-06-24 13:28:58 | Jorge Herskovic | set | messages:
+ msg245739 |
2015-06-24 13:17:03 | Jorge Herskovic | set | messages:
+ msg245738 |
2015-06-24 11:45:46 | Jorge Herskovic | set | messages:
+ msg245731 |
2015-06-24 03:52:42 | ned.deily | set | messages:
+ msg245713 |
2015-06-24 03:27:28 | Jorge Herskovic | set | messages:
+ msg245707 |
2015-06-24 02:54:33 | ned.deily | set | messages:
+ msg245705 |
2015-06-24 02:14:05 | ned.deily | set | nosy:
+ ned.deily messages:
+ msg245704
|
2015-06-23 13:01:52 | Jorge Herskovic | set | messages:
+ msg245687 |
2015-06-22 13:47:57 | Jorge Herskovic | set | messages:
+ msg245629 |
2015-06-21 22:08:37 | ned.deily | set | nosy:
+ sbt, davin
|
2015-06-21 15:08:26 | Jorge Herskovic | create | |