classification
Title: pthread_exit & PyThread_exit_thread from PyEval_RestoreThread etc. are harmful
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: gregory.p.smith, izbyshev
Priority: normal Keywords: 3.2regression

Created on 2021-01-19 19:35 by gregory.p.smith, last changed 2021-01-21 18:29 by izbyshev.

Messages (2)
msg385288 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-01-19 19:35
## BACKGROUND

`PyThread_exit_thread()` calls `pthread_exit(`) and is in turn called from a variety of APIs as documented in the C-API doc update from https://bugs.python.org/issue36427.

The `pthread_exit()` call was originally introduced as a way "resolve" a crashes from daemon threads during shutdown https://bugs.python.org/issue1856.  It did that.  That fix to that even landed in 2.7.8 but was rolled back before 2.7.9 due to a bug in an existing application it exposed at the time (did we miss the implications of that? maybe).  It remained in 3.x.

## PROBLEM

`pthread_exit()` cannot be used blindly by any application.  All code in the threaded application needs to be on board with it and always prepared for any API call they make to potentially lead to thread termination.  Quoting a colleague: "pthread_exit() does not result in stack unwind or local variable destruction".  This means that any code up the stack from the ultimate `pthread_exit()` call that has process-wide state implications that did not go out of its way to register cleanup with `pthread_cleanup_push()` could lead to deadlocks or lost resources. Something implausible to assume that code does.

We're seeing this happen with C/C++ code.  Our C++ builds with `-fno-exceptions` so uncatchable exception based stack unwinding as some pthread_exit implementations may trigger does not happen (and cannot be guaranteed anyways, see bpo-42888).  Said C/C++ code is calling back into Python from a thread and thus must use `PyEval_RestoreThread()` or similar APIs before performing Python C API calls.  If the interpreter is being finalized from another thread... these enter a codepath that ultimately calls `pthread_exit()` leaving corrupt state in the process.  In this case that unexpected thread simply disappearing can lead to a deadlock in our process.

Fundamentally I do not believe the CPython VM should ever call `pthread_exit()` when non-CPython frames are anywhere in the C stack.  This may mean we should never call `pthread_exit()` at all (unsure; but it'd be ideal).

The documentation suggests that all callers in user code of the four C-APIs with the documented `pthread_exit()` caveats need auditing and pre-call `_Py_IsFinalizing()` API checks.  But... I do not believe that would fix anything even if it were done.  `_Py_IsFinalizing()` called without the GIL held means that it could change by the time the `PyEval_RestoreThreads()` API calls it internally do determine if it should exit the thread.  Thus the race condition window would merely be narrowed, not eliminated.  Not good enough.

## CURRENT WORKAROUND (Big Hammer)

Change CPython to call abort() instead of pthread_exit() as that situation is unresolvable and the process dying is better than hanging, partially alive.  That solution isn't friendly, but is better than being silent and allowing deadlock.  A failing process is always better than a hung process, especially a partially hung process.

## SEMI RELATED WORK

https://bugs.python.org/issue42888 - appears to be avoiding some `PyThread_exit_thread()` calls to stop some crashes due to `libgcc_s` being loaded on demand upon thread exit.
msg385289 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-01-19 19:58
C-APIs such as `PyEval_RestoreThreads()` are insufficient for the task they are asked to do.  They return void, yet have a failure mode.

They call pthread_exit() on failure today.

Instead, they need to return an error to the calling application to indicate that "The Python runtime is no longer available."

Callers need to act on that in whatever way is most appropriate to them.
History
Date User Action Args
2021-01-21 18:29:11izbyshevsetnosy: + izbyshev
2021-01-19 19:58:51gregory.p.smithsettitle: pthread_exit & PyThread_exit_thread are harmful -> pthread_exit & PyThread_exit_thread from PyEval_RestoreThread etc. are harmful
2021-01-19 19:58:22gregory.p.smithsetmessages: + msg385289
2021-01-19 19:35:46gregory.p.smithcreate