This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author jbms
Recipients gregory.p.smith, izbyshev, jbms, vstinner
Date 2021-09-16.15:25:43
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1631805944.33.0.468389984989.issue42969@roundup.psfhosted.org>
In-reply-to
Content
Regarding your suggestion of adding a hook like `Py_SetThreadExitCallback`, it seems like there are 4 plausible behaviors that such a callback may implement:

1. Abort the process immediately with an error.

2. Exit immediately with the original exit code specified by the user.

3. Hang the thread.

4. Attempt to unwind the thread, like `pthread_exit`, calling pthread thread cleanup functions and C++ destructors.

5. Terminate the thread immediately without any cleanup or C++ destructor calls.

The current behavior is (4) on POSIX platforms (`pthread_exit`), and (5) on Windows (`_endthreadex`).

In general, achieving a clean shutdown will require the cooperation of all relevant code in the program, particularly code using the Python C API.  Commonly the Python C API is used more by library code rather than application code, while it would presumably be the application that is responsible for setting this callback.  Writing a library that supports multiple different thread shutdown behaviors would be particularly challenging.

I think the callback is useful, but we would still need to discuss what the default behavior should be (hopefully different from the current behavior), and what guidance would be provided as far as what the callback is allowed to do.

Option (1) is highly likely to result in a user-visible error --- a lot of Python programs that previously exited successfully will now, possibly only some of the time, exit with an error.  The advantage is the user is alerted to the fact that some threads were not cleanly exited, but a lot of previously working code is now broken.  This seems like a reasonable policy for a given application to impose (effectively requiring the use of an atexit handler to terminate all daemon threads), but does not seem like a reasonable default given the existing use of daemon threads.

Option (2) would likely do the right thing in many cases, but main thread cleanup that was previously run would now be silently skipped.  This again seems like a reasonable policy for a given application to impose, but does not seem like a reasonable default.

Option (3) avoids the possibility of crashes and memory corruption.  Since the thread stack remains allocated, any pointers to the thread stack held in global data structures or by other threads remain valid.  There is a risk that the thread may be holding a lock, or otherwise block progress of the main thread, resulting in silent deadlock.  That can be mitigated by registering an atexit handler.

Option (4) in theory would allow cleanup handlers to be registered in order to avoid deadlock due to locks held.  In practice, though, it causes a lot of problems:
 - The CPython codebase itself contains no such cleanup handlers, and I expect the vast majority of existing C extensionss are also not designed to properly handle the stack unwind triggered by `pthread_exit`.  Without proper cleanup handlers, this option reverts to option (5), where there is a risk of memory corruption due to other threads accessing pointers to the freed thread stack.  There is also the same risk of deadlock as in option (3).
 - Stack unwinding interacts particularly badly with common C++ usage because the very first thing most people want to do when using the Python C API from C++ is create a "smart pointer" type for holding a `PyObject` pointer that handles the reference counting automatically (calls `Py_INCREF` when copied, `Py_DECREF` in the destructor).  When the stack unwinds due to `pthread_exit`, the current thread will NOT hold the GIL, and these `Py_DECREF` calls result in a crash / memory corruption.  We would need to either create a new finalizing-safe version of Py_DECREF, that is a noop when called from a non-main thread if `_Py_IsFinalizing()` is true (and then existing C++ libraries like pybind11 would need to be changed to use it), or modify the existing `Py_DECREF` to always have that additional check.  Other calls to Python C APIs in destructors are also common.
 - When writing code that attempts to be safe in the presence of stack unwinding due to `pthread_exit`, it is not merely explicitly GIL-related calls that are a concern.  Virtually any Python C API function can transitively release and acquire the GIL and therefore you must defend against unwind from virtually all Python C API functions.
 - Some C++ functions in the call stack may unintentionally catch the exception thrown by `pthread_exit` and then return normally.  If they return back to a CPython stack frame, memory corruption/crashing is likely.
 - Alternatively, some C++ functions in the call stack may be marked `noexcept`.  If the unwinding reaches such a function, then we end up with option (1).
  - In general this option seems to require auditing and fixing a very large amount of existing code, and introduces a lot of complexity.  For that reasons, I think this option should be avoided.  Even on a per-application basis, this option should not be used because it requires that every C extension specifically support it.

Option (5) has the risk of memory corruption due to other threads accessing pointers to the freed thread stack.  There is also the same risk of deadlock as in option (3).  It avoids the problem of calls to Python C APIs in C++ destructors.  I would consider this options strictly worse than option (3), since there is the same risk of deadlock, but the additional risk of memory corruption.  We free the thread stack slightly sooner, but since the program is exiting soon anyway that is not really advantageous.

The fact that the current behavior differs between POSIX and Windows is particularly unfortunate.

I would strongly urge that the default behavior be changed to (3).  If `Py_SetThreadExitCallback` is added, the documentation could indicate that the callback is allowed to terminate the process or hang, but must not attempt to terminate the thread.
History
Date User Action Args
2021-09-16 15:25:44jbmssetrecipients: + jbms, gregory.p.smith, vstinner, izbyshev
2021-09-16 15:25:44jbmssetmessageid: <1631805944.33.0.468389984989.issue42969@roundup.psfhosted.org>
2021-09-16 15:25:44jbmslinkissue42969 messages
2021-09-16 15:25:43jbmscreate