Message 342791 - Python tracker

Author	shprotx
Recipients	Johan Dahlin, db3l, emilyemorehouse, eric.snow, koobs, nascheme, ncoghlan, pmpp, serhiy.storchaka, shprotx, steve.dower, vstinner, yselivanov
Date	2019-05-18.10:19:49
Message-id	<1558174789.82.0.425192454364.issue33608@roundup.psfhosted.org>
In-reply-to

Content

I was able to reproduce the error with revision f13c5c8b9401a9dc19e95d8b420ee100ac022208 in a FreeBSD 12.0 VM. The error seems to be caused not by those changes but by a lack of synchronization in multiprocessing.managers.Server.
The failure happens when running "test_shared_memory_SharedMemoryManager_basics" under high CPU load with frequent interrupts, e.g. while moving a window around during the test. I mostly used the command "python -m test --fail-env-changed test_multiprocessing_spawn -m 'WithProcessesTestS[hu]*' -F" to reproduce the crash.
By analyzing core dumps I deduced that the crash happens during this call from the parent test process:

class BaseManager(object):
    def _finalize_manager(process, address, authkey, state, _Client):
        ...
            try:
                conn = _Client(address, authkey=authkey)
                try:
                    dispatch(conn, None, 'shutdown')
                finally:
                    conn.close()
            except Exception:
                pass

Main thread in the multiprocessing child:

class Server(object):
    def serve_forever(self):
        ...
        try:
            accepter = threading.Thread(target=self.accepter)
            accepter.daemon = True
            accepter.start()
            try:
                while not self.stop_event.is_set():
                    self.stop_event.wait(1)
            except (KeyboardInterrupt, SystemExit):
                pass
        finally:
            ...
            sys.exit(0)  << main thread has finished and destroyed the interpreter

Worker thread in the multiprocessing child:
Locals:
File "/usr/home/user/cpython/Lib/multiprocessing/managers.py", line 214, in handle_request
    c.send(msg)
        self = <SharedMemoryServer(....)>
        funcname = 'shutdown'
        result = None
        request = (None, 'shutdown', (), {})
        ignore = None
        args = ()
        kwds = {}
        msg = ('#RETURN', None)

Listing:
class Server(object):
    def handle_request(self, c):
        ...
            try:
                result = func(c, *args, **kwds)  << calls Server.shutdown method
            except Exception:
                msg = ('#TRACEBACK', format_exc())
            else:
                msg = ('#RETURN', result)
        try:
            c.send(msg)  << crashes with SIGBUS in _send_bytes -> write -> take_gil -> SET_GIL_DROP_REQUEST(tstate->interp)
        except Exception as e:
            try:
                c.send(('#TRACEBACK', format_exc()))
            except Exception:
                pass
    ...
    def shutdown(self, c):
        ...
        try:
            util.debug('manager received shutdown message')
            c.send(('#RETURN', None))
        except:
            import traceback
            traceback.print_exc()
        finally:
            self.stop_event.set()

The worker thread is daemonic and is not terminated during interpreter finalization, so it may still be running at that point and is killed silently when the process exits. The connection (c) is implemented differently on different platforms, so we cannot be sure whether the connection is closed during shutdown, or whether the last "c.send(msg)" blocks until the end of the process, returns instantly, or fails inconsistently.
The error has been there for a long time, but for two reasons it didn't cause much trouble:
- the race condition is hard to trigger;
- SET_GIL_DROP_REQUEST used to ignore the erroneous state of the interpreter, but the introduction of the tstate->interp argument by Eric made it manifest as SIGBUS on FreeBSD.
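
The race described above can be illustrated with threads alone. The following is only a minimal sketch and not the attached patch: TinyServer, serve_until_stopped and run_demo are hypothetical names, the list append stands in for c.send(msg), and the short sleep merely widens the window between stop_event.set() and the final send. It shows one possible form of synchronization, where the main thread waits until in-flight handlers have finished before it goes on to exit.

```python
import threading
import time


class TinyServer:
    """Toy stand-in for a manager server (hypothetical, not the real code)."""

    def __init__(self):
        self.stop_event = threading.Event()
        self._active = 0                      # number of handlers in flight
        self._idle = threading.Condition()    # guards _active
        self.sent = []                        # stands in for the connection

    def handle_request(self, request):
        # Register the handler before it can set stop_event, so the main
        # thread always sees it as active once the event is set.
        with self._idle:
            self._active += 1
        try:
            if request == 'shutdown':
                self.stop_event.set()
            time.sleep(0.05)                  # widen the race window
            self.sent.append(('#RETURN', None))   # stands in for c.send(msg)
        finally:
            with self._idle:
                self._active -= 1
                self._idle.notify_all()

    def serve_until_stopped(self, timeout=5.0):
        # Wait for the shutdown request, then for in-flight handlers to
        # drain, instead of exiting while a daemon thread is still sending.
        self.stop_event.wait()
        with self._idle:
            return self._idle.wait_for(lambda: self._active == 0, timeout)


def run_demo():
    server = TinyServer()
    worker = threading.Thread(target=server.handle_request,
                              args=('shutdown',), daemon=True)
    worker.start()
    drained = server.serve_until_stopped()
    return drained, list(server.sent)
```

Without the wait in serve_until_stopped, the main thread could return (and, in the real server, call sys.exit(0)) while the worker is still between stop_event.set() and its last send, which is the window the core dumps point at.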

I haven't managed to find a nice clean test that reproduces the bug automatically. I suggest the changes to multiprocessing/managers.py in the attachment.

History
Date	User	Action	Args
2019-05-18 10:19:49	shprotx	set	recipients: + shprotx, nascheme, db3l, ncoghlan, vstinner, pmpp, eric.snow, serhiy.storchaka, yselivanov, koobs, steve.dower, emilyemorehouse, Johan Dahlin
2019-05-18 10:19:49	shprotx	set	messageid: <1558174789.82.0.425192454364.issue33608@roundup.psfhosted.org>
2019-05-18 10:19:49	shprotx	link	issue33608 messages
2019-05-18 10:19:49	shprotx	create