This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Deadlock in pysqlite_connection_dealloc()
Type: behavior Stage: resolved
Components: Extension Modules, Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: erlendaasland, hydroflask
Priority: normal Keywords:

Created on 2020-12-21 00:44 by hydroflask, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test.py hydroflask, 2021-07-30 18:03
Messages (13)
msg383471 - (view) Author: (hydroflask) Date: 2020-12-21 00:44
pysqlite_connection_dealloc() calls sqlite3_close{,_v2}(). This can cause a deadlock if sqlite3_close() tries to acquire a lock that another thread  holds, due to a deadlock between the GIL and an internal sqlite3 lock.

This is especially common with sqlite3's "shared cache mode."

Since the GIL should not be released during a tp_dealloc function and python has no control over the behavior of sqlite3_close(), it is incorrect to call sqlite3_close() in pysqlite_connection_dealloc().
msg383472 - (view) Author: (hydroflask) Date: 2020-12-21 01:32
This is also a problem in pysqlite_connection_close() as currently implemented.
msg383482 - (view) Author: (hydroflask) Date: 2020-12-21 02:15
Nevermind it seems that it's legal to call Py_BEGIN_ALLOW_THREADS in tp_dealloc. The fix is then to allow threads around sqlite3_close().
msg383489 - (view) Author: (hydroflask) Date: 2020-12-21 05:26
Another comment: if calling sqlite3_close() outside of GIL, then the associated SQL function destructors must take the GIL before calling Py_DECREF
msg398543 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-07-30 12:23
hydroflask, can you provide a reproducer?
msg398594 - (view) Author: (hydroflask) Date: 2021-07-30 18:03
Unfortunately I am currently unable to write a repro for this bug but I am 100% certain it exists from analyzing the code. In our application (which is a large multithreaded application) after I made the changes consistent with this report the deadlock vanished completely. When the deadlock occurred, the thread stacks were consistent with the information in this report.

To get the deadlock to trigger, SQLite itself must take a lock on sqlite3_close() but I don't know very much about SQLite's internals to consistently provoke that behavior. I've attached my attempt at a repro which is similar enough to the pattern of our application but it was not working.
msg398801 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-02 20:04
Thanks!

I had to comment out the reg_func() call (it is not defined in test.py, and I don't know what it's supposed to do), and change "with lock.shared_context():" to "with lock:", but I'm unable get a deadlock. I've tried with Python 3.8 though 3.11 (dev) and SQLite 3.7.15 through 3.37.0 (dev).

- How long do you normally have to run this in order to force a deadlock?
- Could you attach the output of "python -m sysconfig"?
msg398802 - (view) Author: (hydroflask) Date: 2021-08-02 20:11
You did the right thing in terms of editing the code.

I was not able to get a deadlock with this code which I think should be representative of the error. In production the error was happening after 30 minutes of so.

The major problem is that I don't exactly know how to provoke SQLite to acquire an internal lock. If we assume that it does acquire internal locks and other threads release the GIL before calling into SQLite functions that acquire an internal lock, then you can reason that if sqlite3_close() acquires that lock without first releasing the GIL a deadlock will certainly occur.

This is the deadlock that my application was running into.
msg398803 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-02 20:24
> In production the error was happening after 30 minutes of so.

Great, thanks.

> The major problem is that I don't exactly know how to provoke SQLite to
> acquire an internal lock.

IIRC, you can provoke the internal SQLite lock simply by using transaction control: BEGIN (lock) => COMMIT / ROLLBACK (unlock).

> If we assume that it does acquire internal locks and other threads release
> the GIL before calling into SQLite functions that acquire an internal lock,
> then you can reason that if sqlite3_close() acquires that lock without first
> releasing the GIL a deadlock will certainly occur.

True. For the record, I'm not doubting the existence of this deadlock, nor the correctness of the proposed solution (allow threads around sqlite3_close()) :)

I'll see if I can come up with a compact repro.
msg398804 - (view) Author: (hydroflask) Date: 2021-08-02 20:29
>> The major problem is that I don't exactly know how to provoke SQLite to
>> acquire an internal lock.

> IIRC, you can provoke the internal SQLite lock simply by using transaction control: BEGIN (lock) => COMMIT / ROLLBACK (unlock).

Ah okay, so the sequence would have to be this:

thread1: pysqlite.some_operation()
thread1: release gil
thread1: sqlite3_some_procedure()
thread1: acquire sqlite lock
<switch threads>
thread2: acquire gil
thread2: pysqlite.close()
thread2: sqlite3_close()
thread2: acquire sqlite lock

> I'll see if I can come up with a compact repro.

It should be possible, good luck!
msg413052 - (view) Author: (hydroflask) Date: 2022-02-11 09:29
Any update on this? I know you wanted to repro but even in the absence of the repro, I think calling sqlite3_close() without releasing the GIL is error-prone. If there is no immediate plan to make this change you may close the issue :)
msg413053 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-02-11 09:35
> Any update on this?

I was unable to reproduce this last time we visited this issue, but I'll give it another try.

> If there is no immediate plan to make this change you may close the issue :)

I do not want to close it until I can a) either prove that it is _not_ an issue, or b) prove that it _is_ an issue _and_ we've got a fix :)
msg415289 - (view) Author: (hydroflask) Date: 2022-03-15 19:58
Closing this issue since sqlite3_close{,_v2}() is no longer called with the GIL held in 3.11+
History
Date User Action Args
2022-04-11 14:59:39adminsetgithub: 86864
2022-03-15 19:58:08hydroflasksetstatus: open -> closed

messages: + msg415289
stage: resolved
2022-02-11 09:35:22erlendaaslandsetmessages: + msg413053
2022-02-11 09:29:59hydroflasksetmessages: + msg413052
2021-08-02 20:29:41hydroflasksetmessages: + msg398804
2021-08-02 20:24:12erlendaaslandsetmessages: + msg398803
2021-08-02 20:11:42hydroflasksetmessages: + msg398802
2021-08-02 20:04:50erlendaaslandsetmessages: + msg398801
2021-07-30 18:03:10hydroflasksetfiles: + test.py

messages: + msg398594
2021-07-30 12:23:20erlendaaslandsetnosy: + erlendaasland

messages: + msg398543
versions: + Python 3.11, - Python 3.6, Python 3.7, Python 3.8
2020-12-21 05:26:15hydroflasksetmessages: + msg383489
2020-12-21 02:15:00hydroflasksetmessages: + msg383482
2020-12-21 01:32:19hydroflasksetmessages: + msg383472
2020-12-21 00:46:20hydroflasksetcomponents: + Extension Modules
2020-12-21 00:44:44hydroflaskcreate