classification
Title: _thread module: Remove redundant PyThread_exit_thread() call to avoid glibc fatal error: libgcc_s.so.1 must be installed for pthread_cancel to work
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: erlendaasland, miss-islington, vstinner
Priority: normal Keywords: patch

Created on 2021-06-16 14:19 by vstinner, last changed 2021-07-08 09:25 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
pthread_cancel_bug.py vstinner, 2021-06-16 14:19
pthread_cancel_emfile.py vstinner, 2021-06-16 14:19
Pull Requests
URL Status Linked Edit
PR 26758 merged vstinner, 2021-06-16 14:54
PR 26824 merged miss-islington, 2021-06-21 11:16
PR 26825 merged vstinner, 2021-06-21 11:18
PR 26943 merged vstinner, 2021-06-28 23:24
Messages (18)
msg395924 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-16 14:19
The glibc pthread_exit() functions loads an unwind function from libgcc_s.so.1 using dlopen(). dlopen() can fail to open libgcc_s.so.1 file to various reasons, but the most likely seems to be that the process is out of available file descriptor (EMFILE error).

If the glibc pthread_exit() fails to open libgcc_s.so.1, it aborts the process. Extract of pthread_cancel():

  /* Trigger an error if libgcc_s cannot be loaded.  */
  {
    struct unwind_link *unwind_link = __libc_unwind_link_get ();
    if (unwind_link == NULL)
      __libc_fatal (LIBGCC_S_SO
		    " must be installed for pthread_cancel to work\n");
  }

Sometimes, libgcc_s.so.1 library is loaded early in Python startup. Sometimes, it only loaded when the first Python thread exits.

Hitting in a multithreaded real world application, dlopen() failing with EMFILE is not deterministic. It depends on precise timing and in which order threads are running. It is unlikely in a small application, but it is more likely on a network server which has thousands of open sockets (file descriptors).

--

Attached scripts reproduces the issue. You may need to run the scripts (especially pthread_cancel_emfile.py) multiple times to trigger the issue. Sometimes libgcc_s library is loaded early for an unknown reason, it works around the issue.

(1) pthread_cancel_bug.py 

$ python3.10 pthread_cancel_bug.py 
libgcc_s.so.1 must be installed for pthread_cancel to work
Abandon (core dumped)


(2) pthread_cancel_emfile.py:

$ python3.10 ~/pthread_cancel_emfile.py 
spawn thread
os.open failed: OSError(24, 'Too many open files')
FDs open by the thread: 2 (max FD: 4)
fd 0 valid? True
fd 1 valid? True
fd 2 valid? True
fd 3 valid? True
fd 4 valid? True
libgcc_s.so.1 must be installed for pthread_cancel to work
Abandon (core dumped)

--

Example of real world issue on RHEL8:
https://bugzilla.redhat.com/show_bug.cgi?id=1972293

The RHEL reproducer uses a very low RLIMIT_NOFILE (5 file descriptors) to trigger the bug faster. It simulates a busy server application.

--

There are different options:

(*) Modify thread_run() of Modules/_threadmodule.c to remove the *redundant* PyThread_exit_thread() call.

This is the most simple option and it sounds perfectly safe to me. I'm not sure why PyThread_exit_thread() is called explicitly. We don't pass any parameter to the function.


(*) Link the Python _thread extension on libgcc_s.so if Python it built with the glibc.

Checking if Python is linked to the glibc is non trivial and we have hardcode the "libgcc_s" library name. I expect painful maintenance burden with this option.

(*) Load explicitly the libgcc_s.so library in _thread.start_new_thread(): when the first thread is created.

We need to detect that we are running the glibc at runtime, by calling confstr('CS_GNU_LIBC_VERSION') for example. The problem is that "libgcc_s.so.1" filename may change depending on the Linux distribution. It will likely have a different filename on macOS (".dynlib"). In short, it's tricky to get it right.

(*) Fix the glibc!

I discussed with glibc developers who explained me that there are good reasons to keep the unwind code in the compiler (GCC), and so load it dynamically in the glibc. In short, this is not going to change.

--

Attached PR implements the most straightforward option: remove the redundant PyThread_exit_thread() call in thread_run().
msg395925 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-16 14:27
See also bpo-18748 "io.IOBase destructor silence I/O error on close() by default" which was caused by a bug in an application, the application closed the libgcc_s file descriptor by mistake. It closed the same file decriptor twice, whereas the FD was reused by dlopen() in the meanwhile. But the result was the same, the process aborted with this error message:

"libgcc_s.so.1 must be installed for pthread_cancel to work"
msg395930 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-16 15:11
PyThread_exit_thread() was modified in 2011 to fix daemon threads:

commit 0d5e52d3469a310001afe50689f77ddba6d554d1
Author: Antoine Pitrou <solipsis@pitrou.net>
Date:   Wed May 4 20:02:30 2011 +0200

    Issue #1856: Avoid crashes and lockups when daemon threads run while the
    interpreter is shutting down; instead, these threads are now killed when
    they try to take the GIL.

 PyThread_exit_thread(void)
 {
     dprintf(("PyThread_exit_thread called\n"));
-    if (!initialized) {
+    if (!initialized)
         exit(0);
-    }
+    pthread_exit(0);
 }


This change remains important for Python/ceval.c. When a daemon thread tries to acquire the GIL, it calls PyThread_exit_thread() if Python already exited to exit immediately the thread. Example from take_gil():

    if (tstate_must_exit(tstate)) {
        /* bpo-39877: If Py_Finalize() has been called and tstate is not the
           thread which called Py_Finalize(), exit immediately the thread.

           This code path can be reached by a daemon thread after Py_Finalize()
           completes. In this case, tstate is a dangling pointer: points to
           PyThreadState freed memory. */
        PyThread_exit_thread();
    }

See also my articles on daemon threads fixes:

* https://vstinner.github.io/gil-bugfixes-daemon-threads-python39.html
* https://vstinner.github.io/daemon-threads-python-finalization-python32.html
msg395931 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-16 15:14
_thread.start_new_thread() always called "exit thread", since the function was added to Python:

commit 1984f1e1c6306d4e8073c28d2395638f80ea509b
Author: Guido van Rossum <guido@python.org>
Date:   Tue Aug 4 12:41:02 1992 +0000

    * Makefile adapted to changes below.
    * split pythonmain.c in two: most stuff goes to pythonrun.c, in the library.
    * new optional built-in threadmodule.c, build upon Sjoerd's thread.{c,h}.
    * new module from Sjoerd: mmmodule.c (dynamically loaded).
    * new module from Sjoerd: sv (svgen.py, svmodule.c.proto).
    * new files thread.{c,h} (from Sjoerd).
    * new xxmodule.c (example only).
    * myselect.h: bzero -> memset
    * select.c: bzero -> memset; removed global variable

static void
t_bootstrap(args_raw)
        void *args_raw;
{
        object *args = (object *) args_raw;
        object *func, *arg, *res;

        restore_thread((void *)NULL);
        func = gettupleitem(args, 0);
        arg = gettupleitem(args, 1);
        res = call_object(func, arg);
        DECREF(arg); /* Matches the INCREF(arg) in thread_start_new_thread */
        if (res == NULL) {
                fprintf(stderr, "Unhandled exception in thread:\n");
                print_error(); /* From pythonmain.c */
                fprintf(stderr, "Exiting the entire program\n");
                goaway(1);
        }
        (void) save_thread();
        exit_thread();
}

exit_thread() was partially replaced with PyThread_exit_thread() in:

commit bcc207484a0f8f27a684e11194e7430c0710f66d
Author: Guido van Rossum <guido@python.org>
Date:   Tue Aug 4 22:53:56 1998 +0000

    Changes for BeOS, QNX and long long, by Chris Herborth.
msg395937 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-16 15:37
Unix pthread_create() manual page.
https://man7.org/linux/man-pages/man3/pthread_create.3.html

The new thread terminates in one of the following ways:

(...)
* It  returns  from start_routine().  This is equivalent to calling pthread_exit(3) with the value supplied in
         the return statement.
(...)

Calling pthread_exit(0) is optional.

--


MSDN _beginthreadex() documentation:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex?view=msvc-160

"When the thread returns from that routine, it is terminated automatically."

Calling _endthreadex(0) is optional.
msg395941 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-16 15:44
See also bpo-44436 "[Windows] _thread.start_new_thread() should close the thread handle".
msg396229 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-21 11:16
New changeset 45a78f906d2d5fe5381d78466b11763fc56d57ba by Victor Stinner in branch 'main':
bpo-44434: Don't call PyThread_exit_thread() explicitly (GH-26758)
https://github.com/python/cpython/commit/45a78f906d2d5fe5381d78466b11763fc56d57ba
msg396236 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-21 12:23
New changeset 83ad40efc3e299d1e94692d958111a63c2fd6775 by Victor Stinner in branch '3.9':
bpo-44434: Don't call PyThread_exit_thread() explicitly (GH-26758) (GH-26825)
https://github.com/python/cpython/commit/83ad40efc3e299d1e94692d958111a63c2fd6775
msg396239 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-21 12:29
New changeset 6614eac843c5dc0f4c2664ef610b81e556e44923 by Miss Islington (bot) in branch '3.10':
bpo-44434: Don't call PyThread_exit_thread() explicitly (GH-26758) (GH-26824)
https://github.com/python/cpython/commit/6614eac843c5dc0f4c2664ef610b81e556e44923
msg396240 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-21 12:29
Ok, the issue is now fixed in 3.9, 3.10 and main branches.
msg396672 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-28 23:31
I marked bpo-42888 as a duplicate of this issue.

I created PR 26943 based on Alexey's PR 24241 to complete my fix (remove two calls in two tests). 

Copy of his interesting PR commit message:
---
bpo-42888: Remove PyThread_exit_thread() calls from top-level thread functions

PyThread_exit_thread() uses pthread_exit() on POSIX systems. In glibc,
pthread_exit() is implemented in terms of pthread_cancel(), requiring
the stack unwinder implemented in libgcc. Further, in dynamically
linked applications, calls of pthread_exit() in source code do not
make libgcc_s.so a startup dependency: instead, it's lazily loaded
by glibc via dlopen() when pthread_exit() is called the first time[1].
All of this makes otherwise finely working CPython fail in multithreaded
applications on thread exit if dlopen() fails for any reason.

While providing libgcc_s.so is the reponsibility of the user
(or their package manager), this hidden dependency has been
the source of countless frustrations(e.g. [2]) and, further,
dlopen() may fail for other reasons([3]). But most calls
to PyThread_exit_thread() in CPython are useless because they're done
from the top-level thread function and hence are equivalent
to simply returning. So remove all such calls, thereby avoiding
the glibc cancellation machinery.

The only exception are calls in take_gil() (Python/ceval_gil.h)
which serve as a safety net for daemon threads attempting
to acquire the GIL after Py_Finalize(). Unless a better model for
daemon threads is devised or support for them is removed,
those calls have to be preserved since we need to terminate
the thread right now without touching any interpreter state.

Of course, since PyThread_exit_thread() is a public API,
any extension module can still call it and trip over the same issue.

[1] https://sourceware.org/legacy-ml/libc-help/2014-07/msg00000.html
[2] https://stackoverflow.com/questions/64797838/libgcc-s-so-1-must-be-installed-for-pthread-cancel-to-work
[3] https://www.sourceware.org/bugzilla/show_bug.cgi?id=13119
---
msg396676 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-28 23:45
Se also bpo-35866 which looks like a duplicate.
msg396678 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-28 23:47
I marked bpo-37395 "Core interpreter should be linked with libgcc_s.so on Linux" as a duplicate of this issue.
msg396682 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-29 00:03
New changeset 48e3a1d95aee013974121fcafe19816c0e9a41da by Victor Stinner in branch 'main':
bpo-44434: Remove useless calls to PyThread_exit_thread() (GH-26943)
https://github.com/python/cpython/commit/48e3a1d95aee013974121fcafe19816c0e9a41da
msg396683 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-29 00:07
On Linux, there is a workaround for Python versions which don't include this fix:

$ LD_PRELOAD=/usr/lib64/libgcc_s.so.1 python3 ...

To preload the libgcc_s.so.1 library in the Python process when running Python.
msg396928 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-07-03 21:52
Good news: this change fixed bpo-35866 "concurrent.futures deadlock".
msg397129 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-07-08 09:20
I started "Does anyone use threading debug PYTHONTHREADDEBUG=1 env var? Can I remove it?" thread on python-dev:
https://mail.python.org/archives/list/python-dev@python.org/thread/NMLGCDRUKLZSTK4UICJTKR54WRXU2ZGJ/
msg397132 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-07-08 09:25
I created bpo-44584: "Deprecate thread debugging PYTHONTHREADDEBUG=1".
History
Date User Action Args
2021-07-08 09:25:04vstinnersetmessages: + msg397132
2021-07-08 09:20:09vstinnersetmessages: + msg397129
2021-07-03 21:52:49vstinnersetmessages: + msg396928
2021-06-29 00:07:41vstinnersetmessages: + msg396683
2021-06-29 00:03:42vstinnersetmessages: + msg396682
2021-06-28 23:47:22vstinnersetmessages: + msg396678
2021-06-28 23:47:01vstinnerlinkissue37395 superseder
2021-06-28 23:45:43vstinnersetmessages: + msg396676
2021-06-28 23:31:18vstinnersetmessages: + msg396672
2021-06-28 23:29:24vstinnerlinkissue42888 superseder
2021-06-28 23:24:13vstinnersetpull_requests: + pull_request25510
2021-06-21 12:29:46vstinnersetstatus: open -> closed
versions: + Python 3.9, Python 3.10
messages: + msg396240

resolution: fixed
stage: patch review -> resolved
2021-06-21 12:29:20vstinnersetmessages: + msg396239
2021-06-21 12:23:07vstinnersetmessages: + msg396236
2021-06-21 11:18:23vstinnersetpull_requests: + pull_request25406
2021-06-21 11:16:27miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request25405
2021-06-21 11:16:21vstinnersetmessages: + msg396229
2021-06-16 15:55:46erlendaaslandsetnosy: + erlendaasland
2021-06-16 15:44:14vstinnersetmessages: + msg395941
2021-06-16 15:37:59vstinnersetmessages: + msg395937
2021-06-16 15:14:52vstinnersetmessages: + msg395931
2021-06-16 15:11:08vstinnersetmessages: + msg395930
2021-06-16 14:54:14vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request25343
2021-06-16 14:27:04vstinnersetmessages: + msg395925
2021-06-16 14:19:29vstinnersetfiles: + pthread_cancel_emfile.py
2021-06-16 14:19:11vstinnercreate