classification
Title: Deadlock in pysqlite_connection_dealloc()
Type: behavior Stage:
Components: Extension Modules, Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: erlendaasland, hydroflask
Priority: normal Keywords:

Created on 2020-12-21 00:44 by hydroflask, last changed 2021-08-02 20:29 by hydroflask.

Files
File name Uploaded Description Edit
test.py hydroflask, 2021-07-30 18:03
Messages (10)
msg383471 - (view) Author: (hydroflask) Date: 2020-12-21 00:44
pysqlite_connection_dealloc() calls sqlite3_close{,_v2}(). This can cause a deadlock if sqlite3_close() tries to acquire a lock that another thread  holds, due to a deadlock between the GIL and an internal sqlite3 lock.

This is especially common with sqlite3's "shared cache mode."

Since the GIL should not be released during a tp_dealloc function and python has no control over the behavior of sqlite3_close(), it is incorrect to call sqlite3_close() in pysqlite_connection_dealloc().
msg383472 - (view) Author: (hydroflask) Date: 2020-12-21 01:32
This is also a problem in pysqlite_connection_close() as currently implemented.
msg383482 - (view) Author: (hydroflask) Date: 2020-12-21 02:15
Nevermind it seems that it's legal to call Py_BEGIN_ALLOW_THREADS in tp_dealloc. The fix is then to allow threads around sqlite3_close().
msg383489 - (view) Author: (hydroflask) Date: 2020-12-21 05:26
Another comment: if calling sqlite3_close() outside of GIL, then the associated SQL function destructors must take the GIL before calling Py_DECREF
msg398543 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-07-30 12:23
hydroflask, can you provide a reproducer?
msg398594 - (view) Author: (hydroflask) Date: 2021-07-30 18:03
Unfortunately I am currently unable to write a repro for this bug but I am 100% certain it exists from analyzing the code. In our application (which is a large multithreaded application) after I made the changes consistent with this report the deadlock vanished completely. When the deadlock occurred, the thread stacks were consistent with the information in this report.

To get the deadlock to trigger, SQLite itself must take a lock on sqlite3_close() but I don't know very much about SQLite's internals to consistently provoke that behavior. I've attached my attempt at a repro which is similar enough to the pattern of our application but it was not working.
msg398801 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-02 20:04
Thanks!

I had to comment out the reg_func() call (it is not defined in test.py, and I don't know what it's supposed to do), and change "with lock.shared_context():" to "with lock:", but I'm unable get a deadlock. I've tried with Python 3.8 though 3.11 (dev) and SQLite 3.7.15 through 3.37.0 (dev).

- How long do you normally have to run this in order to force a deadlock?
- Could you attach the output of "python -m sysconfig"?
msg398802 - (view) Author: (hydroflask) Date: 2021-08-02 20:11
You did the right thing in terms of editing the code.

I was not able to get a deadlock with this code which I think should be representative of the error. In production the error was happening after 30 minutes of so.

The major problem is that I don't exactly know how to provoke SQLite to acquire an internal lock. If we assume that it does acquire internal locks and other threads release the GIL before calling into SQLite functions that acquire an internal lock, then you can reason that if sqlite3_close() acquires that lock without first releasing the GIL a deadlock will certainly occur.

This is the deadlock that my application was running into.
msg398803 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-02 20:24
> In production the error was happening after 30 minutes of so.

Great, thanks.

> The major problem is that I don't exactly know how to provoke SQLite to
> acquire an internal lock.

IIRC, you can provoke the internal SQLite lock simply by using transaction control: BEGIN (lock) => COMMIT / ROLLBACK (unlock).

> If we assume that it does acquire internal locks and other threads release
> the GIL before calling into SQLite functions that acquire an internal lock,
> then you can reason that if sqlite3_close() acquires that lock without first
> releasing the GIL a deadlock will certainly occur.

True. For the record, I'm not doubting the existence of this deadlock, nor the correctness of the proposed solution (allow threads around sqlite3_close()) :)

I'll see if I can come up with a compact repro.
msg398804 - (view) Author: (hydroflask) Date: 2021-08-02 20:29
>> The major problem is that I don't exactly know how to provoke SQLite to
>> acquire an internal lock.

> IIRC, you can provoke the internal SQLite lock simply by using transaction control: BEGIN (lock) => COMMIT / ROLLBACK (unlock).

Ah okay, so the sequence would have to be this:

thread1: pysqlite.some_operation()
thread1: release gil
thread1: sqlite3_some_procedure()
thread1: acquire sqlite lock
<switch threads>
thread2: acquire gil
thread2: pysqlite.close()
thread2: sqlite3_close()
thread2: acquire sqlite lock

> I'll see if I can come up with a compact repro.

It should be possible, good luck!
History
Date User Action Args
2021-08-02 20:29:41hydroflasksetmessages: + msg398804
2021-08-02 20:24:12erlendaaslandsetmessages: + msg398803
2021-08-02 20:11:42hydroflasksetmessages: + msg398802
2021-08-02 20:04:50erlendaaslandsetmessages: + msg398801
2021-07-30 18:03:10hydroflasksetfiles: + test.py

messages: + msg398594
2021-07-30 12:23:20erlendaaslandsetnosy: + erlendaasland

messages: + msg398543
versions: + Python 3.11, - Python 3.6, Python 3.7, Python 3.8
2020-12-21 05:26:15hydroflasksetmessages: + msg383489
2020-12-21 02:15:00hydroflasksetmessages: + msg383482
2020-12-21 01:32:19hydroflasksetmessages: + msg383472
2020-12-21 00:46:20hydroflasksetcomponents: + Extension Modules
2020-12-21 00:44:44hydroflaskcreate