msg415212 - (view) |
Author: (hydroflask) |
Date: 2022-03-15 00:18 |
_destructor in connection.c in Python 3.10+ now calls `PyGILState_Ensure()`, this is a problem because if the destructor is being called while the thread is being torn down it will cause an unbalanced/erroneous call to "PyEval_RestoreThread" in PyGILState_Ensure which will eventually trigger a Fatal Python Error. A perfect repro has been attached, should be run on Linux.
My recommended fix is to call sqlite3_close() within Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS, and manually Py_DECREF all connection-related functions afterward.
|
msg415216 - (view) |
Author: (hydroflask) |
Date: 2022-03-15 01:37 |
If you connect to the sqlite3 database in the example without using the factory, it will simply segfault.
|
msg415227 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2022-03-15 08:58 |
I'm removing 3.11 from the versions field:
$ python3.11 sqlite3_fatal_python_error.py
Exception ignored in: <function deleting_conn.__del__ at 0x10af55580>
Traceback (most recent call last):
File "/Users/erlendaasland/src/cpython-build/sqlite3_fatal_python_error.py", line 10, in __del__
self.close()
^^^^^^^^^^^^
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 123145443913728 and this is thread id 4506211840.
|
msg415230 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2022-03-15 10:08 |
> A perfect repro has been attached, should be run on Linux.
FTR, the repo works on macOS 12.2.1 as well:
$ python3.10 ./sqlite3_fatal_python_error.py
Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: initialized
Thread 0x00000001126e7600 (most recent call first):
File "/Users/erlendaasland/src/cpython-build/./sqlite3_fatal_python_error.py", line 52 in <module>
zsh: abort python3.10 ./sqlite3_fatal_python_error.py
|
msg415231 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2022-03-15 10:12 |
> My recommended fix is to call sqlite3_close() within Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS, and manually Py_DECREF all connection-related functions afterward.
3.11 already drops the GIL before sqlite3_close_v2(); backporting this change to 3.10 results in the following assertion error:
$ ./python.exe sqlite3_fatal_python_error.py
Assertion failed: (!_PyMem_IsPtrFreed(tstate->interp)), function is_tstate_valid, file ceval.c, line 155.
|
msg415277 - (view) |
Author: (hydroflask) |
Date: 2022-03-15 19:10 |
If you connect with check_same_thread=False, I believe the issue may still present itself on 3.11+
|
msg415278 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2022-03-15 19:12 |
> If you connect with check_same_thread=False, I believe the issue may still present itself on 3.11+
Nope, works swell on my Mac. Do you see this on Linux with latest 3.11?
|
msg415279 - (view) |
Author: (hydroflask) |
Date: 2022-03-15 19:25 |
I don't see it immediately but I think it's still possible to happen since all the same offending code is in place. There are two reasosn why it probably doesn't happen in 3.11+:
1) because something is deferring calling the finalizer for the zero-ref object to another thread instead on immediately on the thread that is dying as in <=3.10
2) In 3.11+ it is fixed so that PyGILState_Ensure can be called in a sequence started by PyGILState_Release. @vstinner is this correct?
If the reason it doesn't happen in 3.11+ is because of 1) then I don't think that is specified to happen anywhere and changing the GC in future versions could theoretically trigger the same bug. If it is 2) then it is fixed in a more robust location.
IMO, the most robust fix is to destroy the function callbacks in connection_close() and avoid using the destructor_callback and calling PyGILState_Ensure() altogether.
|
msg415280 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2022-03-15 19:29 |
All callbacks triggered from external libraries must protect Python C API calls. You may call it offending, but that is the reality; a callback may trigger at any point in time, so we need to make sure the GIL is held before calling any Py API. That is also the case for the destructor.
|
msg415282 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2022-03-15 19:32 |
See also bpo-44304
|
msg415283 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2022-03-15 19:34 |
See also Petrs msg400205:
> [...] I recommend that any function passed to SQLite (and only those) should
> - be named `*_callback`, for clarity
> - acquire the GIL at the very start
> - release the GIL at the very end
|
msg415284 - (view) |
Author: (hydroflask) |
Date: 2022-03-15 19:36 |
> All callbacks triggered from external libraries must protect Python C API calls. You may call it offending, but that is the reality; a callback may trigger at any point in time, so we need to make sure the GIL is held before calling any Py API. That is also the case for the destructor.
Sure but I'm suggesting sidestepping and not using the destructor_callback entirely. You can DECREF the callbacks manually in connection_close() after calling sqlite3_close_v2(). Doing this makes sense to me since the destructor may be called under special conditions.
|
msg415285 - (view) |
Author: (hydroflask) |
Date: 2022-03-15 19:43 |
If PyGILState_Ensure() has been fixed to become re-entrant during PyGILState_Release() in 3.11+ then I believe that change should be backported to 3.10. @vstinner would know
|
msg415287 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2022-03-15 19:49 |
FYI, PyGILState_Ensure may be called by a thread as many times as you want, and it has been so for a very long time.
|
msg415288 - (view) |
Author: (hydroflask) |
Date: 2022-03-15 19:54 |
So what is causing this bug in 3.10 is the following stack:
thread_entry_point()
PyGILState_Ensure()
PyGILState_Release()
...
PyGILState_Ensure()
The core issue is that PyGILState_Ensure is getting called while the tstate itself is in the process of being destroyed. This causes an invalid state and eventually results in a segfault or "Fatal Python Error."
I think the most robust fix is to allow re-entrancy to PyGILState_Release/PyGILState_Ensure. If that is prohibitively complex and/or is not specified to be allowed, I believe the most robust fix is to avoid using sqlite3 destructor callbacks to DECREF Python objects.
|
msg415294 - (view) |
Author: Erlend E. Aasland (erlendaasland) * |
Date: 2022-03-15 20:42 |
Maybe related: bpo-20891
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:57 | admin | set | github: 91175 |
2022-03-15 20:42:59 | erlendaasland | set | messages:
+ msg415294 |
2022-03-15 19:54:57 | hydroflask | set | messages:
+ msg415288 |
2022-03-15 19:49:20 | erlendaasland | set | messages:
+ msg415287 |
2022-03-15 19:43:31 | hydroflask | set | messages:
+ msg415285 |
2022-03-15 19:36:21 | hydroflask | set | messages:
+ msg415284 |
2022-03-15 19:34:36 | erlendaasland | set | messages:
+ msg415283 |
2022-03-15 19:32:13 | erlendaasland | set | nosy:
+ pablogsal messages:
+ msg415282
|
2022-03-15 19:29:14 | erlendaasland | set | messages:
+ msg415280 |
2022-03-15 19:25:46 | hydroflask | set | nosy:
+ vstinner messages:
+ msg415279
|
2022-03-15 19:12:24 | erlendaasland | set | messages:
+ msg415278 |
2022-03-15 19:10:34 | hydroflask | set | messages:
+ msg415277 |
2022-03-15 10:12:30 | erlendaasland | set | messages:
+ msg415231 |
2022-03-15 10:08:40 | erlendaasland | set | messages:
+ msg415230 |
2022-03-15 08:58:19 | erlendaasland | set | messages:
+ msg415227 versions:
- Python 3.11 |
2022-03-15 01:37:46 | hydroflask | set | messages:
+ msg415216 |
2022-03-15 00:18:22 | hydroflask | create | |