PyGILState_Ensure on non-Python thread causes fatal error #65090

zooba · 2014-03-11T18:40:13Z

BPO	20891
Nosy	@gvanrossum, @warsaw, @ncoghlan, @pitrou, @vstinner, @ericsnowcurrently, @zooba
PRs	bpo-20891: Fix PyGILState_Ensure() #4650 [3.6] bpo-20891: Fix PyGILState_Ensure() (#4650) #4655 [2.7] bpo-20891: Fix PyGILState_Ensure() (#4650) #4657 bpo-20891: Py_Initialize() now creates the GIL #4700 bpo-20891: Skip test_embed.test_bpo20891() #4967 [3.6] bpo-20891: Skip test_embed.test_bpo20891() (#4967) #4969 bpo-20891: Reenable test_embed.test_bpo20891() #5420 [3.6] bpo-20891: Py_Initialize() now creates the GIL (#4700) #5421 [3.6] bpo-20891: Remove test_capi.test_bpo20891() #5425
Files	test.c ptest.c PyGILState_Ensure.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2018-01-29.13:07:27.074>
created_at = <Date 2014-03-11.18:40:13.148>
labels = ['extension-modules', '3.7']
title = 'PyGILState_Ensure on non-Python thread causes fatal error'
updated_at = <Date 2018-01-29.15:35:52.778>
user = 'https://github.com/zooba'

bugs.python.org fields:

activity = <Date 2018-01-29.15:35:52.778>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2018-01-29.13:07:27.074>
closer = 'vstinner'
components = ['Extension Modules']
creation = <Date 2014-03-11.18:40:13.148>
creator = 'steve.dower'
dependencies = []
files = ['34358', '42173', '42174']
hgrepos = []
issue_num = 20891
keywords = ['patch']
message_count = 31.0
messages = ['213160', '213161', '213398', '261820', '261821', '307038', '307330', '307331', '307342', '307349', '307350', '307351', '307550', '307551', '307552', '307553', '307554', '307571', '307583', '307586', '307628', '308588', '308914', '308917', '311090', '311091', '311095', '311099', '311124', '311125', '311143']
nosy_count = 10.0
nosy_names = ['gvanrossum', 'barry', 'ncoghlan', 'pitrou', 'vstinner', 'Mekk', 'kchen', 'eric.snow', 'steve.dower', 'xcombelle']
pr_nums = ['4650', '4655', '4657', '4700', '4967', '4969', '5420', '5421', '5425']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue20891'
versions = ['Python 2.7', 'Python 3.6', 'Python 3.7']

zooba · 2014-03-11T18:40:12Z

In Python 3.4rc3, calling PyGILState_Ensure() from a thread that was not created by Python and without any calls to PyEval_InitThreads() will cause a fatal exit:

Fatal Python error: take_gil: NULL tstate

I believe this was added in http://hg.python.org/cpython/rev/dc4e805ec68a. The attached file has a minimal repro (for Windows, sorry, but someone familiar with low-level *nix threading APIs should be able to translate it easily).

Basically, the sequence looks like this:

main() calls Py_Initialize()
main() starts new C-level thread
main() calls Py_BEGIN_ALLOW_THREADS and waits for the new thread
New thread calls PyGILState_Ensure()
PyGILState_Ensure() sees that there is no thread state and calls PyEval_InitThreads() (this is what changed in the linked changeset)
PyEval_InitThreads() creates the GIL
PyEval_InitThreads() calls take_gil(PyThreadState_GET()) (this is what aborts - there is still no thread state)
(expected) PyGILState_Ensure() calls PyThreadState_New() and PyEval_RestoreThread().

The attached repro has two workarounds (commented out). The first is to call PyThreadState_New()/PyEval_RestoreThread() from the new thread, and the second is to call PyEval_InitThreads() from the main thread.

The latter is inappropriate for my use, since I am writing a debugger that can inject itself into a running CPython process - there is no way to require that PyEval_InitThreads() has been called and I can't reliably inject it onto the main thread. The first workaround is okay, but I'd rather not have to special-case 3.4 by rewriting PyGILState_Ensure() so that I can safely call PyGILState_Ensure().

zooba · 2014-03-11T18:49:59Z

Should have linked to bpo-19576 as well, which is the issue associated with that changeset.

vstinner · 2014-03-13T13:48:45Z

IMO it's a bug in PyEval_InitThreads().

vstinner · 2016-03-15T16:55:58Z

ptest.c: Portable example using Python API.

vstinner · 2016-03-15T16:56:16Z

Attached patch should fix the issue.

Mekk · 2017-11-27T08:56:13Z

Is this fix released? I can't find it in the changelog…

(I faced this bug on 3.5.2, released a couple of months after this bug was closed…)

vstinner · 2017-11-30T17:20:06Z

Is this fix released? I can't find it in the changelog…

Oops, I lost track of this issue, but it wasn't fixed, no.

I just proposed my old fix as a pull requets: PR 4650.

(I faced this bug on 3.5.2, released a couple of months after this bug was closed…)

The workaround is to call PyEval_InitThreads() before spawning your first non-Python thread.
https://docs.python.org/dev/c-api/init.html#c.PyEval_InitThreads

vstinner · 2017-11-30T17:20:45Z

I have to check if Python 2.7 is impacted as well.

vstinner · 2017-11-30T21:05:03Z

New changeset b4d1e1f by Victor Stinner in branch 'master':
bpo-20891: Fix PyGILState_Ensure() (bpo-4650)
b4d1e1f

vstinner · 2017-11-30T22:35:17Z

New changeset be6b74c by Victor Stinner in branch '2.7':
bpo-20891: Fix PyGILState_Ensure() (bpo-4650) (bpo-4657)
be6b74c

vstinner · 2017-11-30T22:36:51Z

New changeset e10c9de by Victor Stinner in branch '3.6':
bpo-20891: Fix PyGILState_Ensure() (bpo-4650) (bpo-4655)
e10c9de

vstinner · 2017-11-30T22:46:58Z

Ok, the bug is now fixed in Python 2.7, 3.6 and master (future 3.7). On 3.6 and master, the fix comes with an unit test.

Thanks Steve Dower for the bug report, sorry for the long delay, I completely forgot this old bug!

Marcin Kasperski: Thanks for the reminder. Sadly, we will have to wait for the next release to get the fix. In the meanwhile, you can workaround the bug by calling PyEval_InitThreads() after Py_Initialize() but before running actual code.

vstinner · 2017-12-04T11:51:47Z

"./Program/_testembed.exe bpo-20891" fails randomly on macOS:
---
macbook:master haypo$ while true; do ./Programs/_testembed bpo-20891 ||break; date; done
Lun 4 déc 2017 12:46:34 CET
Lun 4 déc 2017 12:46:34 CET
Lun 4 déc 2017 12:46:34 CET
Fatal Python error: PyEval_SaveThread: NULL tstate

Current thread 0x00007fffa5dff3c0 (most recent call first):
Abort trap: 6
---

In test_bpo20891() of Program/_testembed.c, Py_BEGIN_ALLOW_THREADS calls PyEval_SaveThread() which fails with a fatal error:

    PyThreadState *tstate = PyThreadState_Swap(NULL);
    if (tstate == NULL)
        Py_FatalError("PyEval_SaveThread: NULL tstate"); <~~~ HERE

I'm not sure that it's safe to create the GIL in PyGILState_Ensure() in a "non-Python" thread, while the main thread (which is a Python thread) "is running".

I found a working fix: call PyEval_InitThreads() in PyThread_start_new_thread(). So the GIL is created as soon as a second thread is spawned. The GIL cannot be created anymore while two threads are running. At least, with the "python" binary. It doesn't fix the issue if a thread is not spawned by Python, but this thread calls PyGILState_Ensure().

Maybe we need to better document how threads have to be initialized to prevent this race condition / bug?

pitrou · 2017-12-04T11:52:38Z

Why not *always* call PyEval_InitThreads() at interpreter initialization? Are there any downsides?

vstinner · 2017-12-04T11:59:07Z

Why not *always* call PyEval_InitThreads() at interpreter initialization? Are there any downsides?

The code is quite old:

commit 1984f1e
Author: Guido van Rossum <guido@python.org>
Date: Tue Aug 4 12:41:02 1992 +0000

* Makefile adapted to changes below.
* split pythonmain.c in two: most stuff goes to pythonrun.c, in the library.
* new optional built-in threadmodule.c, build upon Sjoerd's thread.{c,h}.
* new module from Sjoerd: mmmodule.c (dynamically loaded).
* new module from Sjoerd: sv (svgen.py, svmodule.c.proto).
* new files thread.{c,h} (from Sjoerd).
* new xxmodule.c (example only).
* myselect.h: bzero -> memset
* select.c: bzero -> memset; removed global variable

(...)

+void
+init_save_thread()
+{
+#ifdef USE_THREAD
+ if (interpreter_lock)
+ fatal("2nd call to init_save_thread");
+ interpreter_lock = allocate_lock();
+ acquire_lock(interpreter_lock, 1);
+#endif
+}
+#endif

Current version of the code:

void
PyEval_InitThreads(void)
{
    if (gil_created())
        return;
    create_gil();
    take_gil(PyThreadState_GET());
    _PyRuntime.ceval.pending.main_thread = PyThread_get_thread_ident();
    if (!_PyRuntime.ceval.pending.lock)
        _PyRuntime.ceval.pending.lock = PyThread_allocate_lock();
}

vstinner · 2017-12-04T12:05:04Z

The "if (gil_created())" check can be found in the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros which call PyEval_SaveThread() / PyEval_RestoreThread():

PyThreadState *
PyEval_SaveThread(void)
{
    PyThreadState *tstate = PyThreadState_Swap(NULL);
    if (tstate == NULL)
        Py_FatalError("PyEval_SaveThread: NULL tstate");
    if (gil_created()) <~~~~ HERE
        drop_gil(tstate);
    return tstate;
}

void
PyEval_RestoreThread(PyThreadState *tstate)
{
    if (tstate == NULL)
        Py_FatalError("PyEval_RestoreThread: NULL tstate");
    if (gil_created()) { <~~~~ HERE
        int err = errno;
        take_gil(tstate);
        /* _Py_Finalizing is protected by the GIL */
        if (_Py_IsFinalizing() && !_Py_CURRENTLY_FINALIZING(tstate)) {
            drop_gil(tstate);
            PyThread_exit_thread();
            Py_UNREACHABLE();
        }
        errno = err;
    }
    PyThreadState_Swap(tstate);
}

I guess that the intent of dynamically created GIL is to reduce the "overhead" of the GIL when 100% of the code is run in single thread.

I understand that this potential overhead must be measured to decide if it's ok to always create the GIL.

This bug was reported once in 2014, then Marcin Kasperski asked for an update last month: 3 years later. It's not a common bug impacting many users. So if there is an overhead, I'm not sure that it's a good idea to change Python to fix this rare use case.

pitrou · 2017-12-04T12:14:32Z

I guess that the intent of dynamically created GIL is to reduce the "overhead" of the GIL when 100% of the code is run in single thread.

I'm not sure it is ok to guess here. The original code was written in 1992, when system threading APIs were buggy and not always available. Also the original GIL logic released the lock every N opcodes, which is a fragile heuristic. The current GIL shouldn't have any visible performance cost in single-threaded mode, and in even in multi-threaded mode it's been a long time since anyone complained about a real-world performance degradation due to the GIL's overhead.

Perhaps Guido remembers the original rationale?

This bug was reported once in 2014, then Marcin Kasperski asked for an update last month: 3 years later. It's not a common bug impacting many users.

From a maintenance POV, initializing the GIL eagerly also makes maintenance easier for us, instead of making all those dynamic checks.

gvanrossum · 2017-12-04T15:38:01Z

Yeah, the original reasoning was that threads were something esoteric and
not used by most code, and at the time we definitely felt that always using
the GIL would cause a (tiny) slowdown and increase the risk of crashes due
to bugs in the GIL code. I'd be happy to learn that we no longer need to
worry about this and can just always initialize it.

(Note: I haven't read the entire thread, just the first and last message.)

vstinner · 2017-12-04T17:36:30Z

I wrote the PR 4700 to create the GIL in Py_Initialize().

Would you be ok to backport this fix to Python 2.7 and 3.6 as well?

gvanrossum · 2017-12-04T17:53:16Z

For 2.7 I hesitate to OK this, who knows what skeletons are still in that
closet. It doesn't sound like a security fix to me. For 3.6 I'm fine with
it as a bugfix.

ncoghlan · 2017-12-05T03:39:53Z

+1 for making this change 3.6+ only.

Victor, could you run your patch through the performance benchmarks? While I suspect Antoine is right that our current GIL management heuristics will mean we don't need the lazy initialisation optimisation any more, it's still preferable to have the specific numbers before merging it.

vstinner · 2017-12-18T22:06:51Z

I ran pyperformance on my PR 4700. Differences of at least 5%:

haypo@speed-python$ python3 -m perf compare_to ~/json/uploaded/2017-12-18_12-29-master-bd6ec4d79e85.json.gz /home/haypo/json/patch/2017-12-18_12-29-master-bd6ec4d79e85-patch-4700.json.gz --table --min-speed=5

+----------------------+--------------------------------------+-------------------------------------------------+
| Benchmark | 2017-12-18_12-29-master-bd6ec4d79e85 | 2017-12-18_12-29-master-bd6ec4d79e85-patch-4700 |
+======================+======================================+=================================================+
| pathlib | 41.8 ms | 44.3 ms: 1.06x slower (+6%) |
+----------------------+--------------------------------------+-------------------------------------------------+
| scimark_monte_carlo | 197 ms | 210 ms: 1.07x slower (+7%) |
+----------------------+--------------------------------------+-------------------------------------------------+
| spectral_norm | 243 ms | 269 ms: 1.11x slower (+11%) |
+----------------------+--------------------------------------+-------------------------------------------------+
| sqlite_synth | 7.30 us | 8.13 us: 1.11x slower (+11%) |
+----------------------+--------------------------------------+-------------------------------------------------+
| unpickle_pure_python | 707 us | 796 us: 1.13x slower (+13%) |
+----------------------+--------------------------------------+-------------------------------------------------+

Not significant (55): 2to3; chameleon; chaos; (...)

vstinner · 2017-12-21T23:05:07Z

New changeset 550ee05 by Victor Stinner in branch 'master':
bpo-20891: Skip test_embed.test_bpo20891() (bpo-4967)
550ee05

vstinner · 2017-12-21T23:32:30Z

New changeset 2e1ef00 by Victor Stinner in branch '3.6':
bpo-20891: Skip test_embed.test_bpo20891() (bpo-4967) (bpo-4969)
2e1ef00

vstinner · 2018-01-29T10:57:21Z

I ran pyperformance on my PR 4700. Differences of at least 5%: (...)

I tested again these 5 benchmarks were Python was slower with my PR. I ran these benchmarks manually on my laptop using CPU isolation. Result:

vstinner@apu$ python3 -m perf compare_to ref.json patch.json --table
Not significant (5): unpickle_pure_python; sqlite_synth; spectral_norm; pathlib; scimark_monte_carlo

Ok, that was expected: no significant difference.

vstinner · 2018-01-29T10:57:47Z

New changeset 2914bb3 by Victor Stinner in branch 'master':
bpo-20891: Py_Initialize() now creates the GIL (bpo-4700)
2914bb3

vstinner · 2018-01-29T11:24:30Z

I proposed PR 5421 to fix Python 3.6: modify Py_Initialize() to call PyEval_InitThreads().

I'm not sure of Python 2.7. Not only the backport is not straighforward, but I'm not sure that I want to touch the super-stable 2.7 branch. Maybe for 2.7, I should just remove the currently skipped test_bpo20891() test from test_capi.

What do you think?

vstinner · 2018-01-29T11:42:54Z

Antoine Pitrou considers that my PR 5421 for Python 3.6 should not be merged:
"@vstinner I don't think so. People can already call PyEval_InitThreads."
#5421 (comment)

vstinner · 2018-01-29T13:04:46Z

New changeset d951157 by Victor Stinner in branch 'master':
bpo-20891: Reenable test_embed.test_bpo20891() (GH-5420)
d951157

vstinner · 2018-01-29T13:07:27Z

Antoine Pitrou:

@vstinner I don't think so. People can already call PyEval_InitThreads.

Since only two users complained about https://bugs.python.org/issue20891 in 3 years, I agree that it's ok to not fix Python 2.7 and 3.6. The workaround is to call PyEval_InitThreads() before starting the first thread.

I wasn't excited to make such stable in stable 2.7 and 3.6 anyway :-)

test_embed.test_bpo20891() is enabled again on master. I now close the issue.

Thanks Steve Dower for the bug report!

vstinner · 2018-01-29T15:35:53Z

New changeset 0cecc22 by Victor Stinner in branch '3.6':
bpo-20891: Remove test_capi.test_bpo20891() (bpo-5425)
0cecc22

zooba added the extension-modules C modules in the Modules dir label Mar 11, 2014

vstinner added the 3.7 (EOL) end of life label Nov 30, 2017

vstinner closed this as completed Nov 30, 2017

vstinner reopened this Dec 4, 2017

vstinner closed this as completed Jan 29, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

erlend-aasland mentioned this issue Apr 10, 2022

Fatal Python Error in sqlite3 Python 3.10 #91175

Closed

ericsnowcurrently mentioned this issue Jul 21, 2022

gh-90815: Fix test_embed for Windows PGO build with mimalloc #94790

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PyGILState_Ensure on non-Python thread causes fatal error #65090

PyGILState_Ensure on non-Python thread causes fatal error #65090

zooba commented Mar 11, 2014

zooba commented Mar 11, 2014

zooba commented Mar 11, 2014

vstinner commented Mar 13, 2014

vstinner commented Mar 15, 2016

vstinner commented Mar 15, 2016

Mekk mannequin commented Nov 27, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Dec 4, 2017

pitrou commented Dec 4, 2017

vstinner commented Dec 4, 2017

vstinner commented Dec 4, 2017

pitrou commented Dec 4, 2017

gvanrossum commented Dec 4, 2017

vstinner commented Dec 4, 2017

gvanrossum commented Dec 4, 2017

ncoghlan commented Dec 5, 2017

vstinner commented Dec 18, 2017

vstinner commented Dec 21, 2017

vstinner commented Dec 21, 2017

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

PyGILState_Ensure on non-Python thread causes fatal error #65090

PyGILState_Ensure on non-Python thread causes fatal error #65090

Comments

zooba commented Mar 11, 2014

zooba commented Mar 11, 2014

zooba commented Mar 11, 2014

vstinner commented Mar 13, 2014

vstinner commented Mar 15, 2016

vstinner commented Mar 15, 2016

Mekk mannequin commented Nov 27, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Nov 30, 2017

vstinner commented Dec 4, 2017

pitrou commented Dec 4, 2017

vstinner commented Dec 4, 2017

vstinner commented Dec 4, 2017

pitrou commented Dec 4, 2017

gvanrossum commented Dec 4, 2017

vstinner commented Dec 4, 2017

gvanrossum commented Dec 4, 2017

ncoghlan commented Dec 5, 2017

vstinner commented Dec 18, 2017

vstinner commented Dec 21, 2017

vstinner commented Dec 21, 2017

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018

vstinner commented Jan 29, 2018