Title: Convert sqlite3 to multi-phase initialisation (PEP 489)
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.10
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, corona10, erlendaasland, vstinner
Priority: normal Keywords:

Created on 2020-10-17 19:37 by erlendaasland, last changed 2020-11-03 20:19 by erlendaasland.

Messages (9)
msg378825 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2020-10-17 19:37
Porting sqlite3 to multi-phase initialisation is non-trivial and must be broken into smaller bpo's, often with multiple PR's per bpo. As far as I can see, this is the task list:

1) Heap types (PEP 384): bpo-41861
2) Argument Clinic: bpo-40956
3) Establish global module state
4) Pass `self` iso. callback object to trace, auth, and progress handlers (required for item 7)
5) Use sqlite3_create_collation_v2 iso. sqlite3_create_collation (see next list item)
6) For sqlite3_create_* functions, establish a context struct that contains the target user callback and the state. Modify callers to allocate memory, and the destructor to deallocate
7) Module state (PEP 573)
8) Multi-phase initialisation (PEP 489)

The list may be incomplete.

Separate bpo's for 1) and 2) are opened. Pr. 2020-10-17, 1) is almost done, and 2), part 1 of 5 is awaiting review. It may be convenient to open up sub-bpo's for some of the other tasks as well. For instance, using sqlite3_create_collation_v2() iso. sqlite3_create_collation(). (FYI, this imposes no new SQLite version requirements.)

I'd wait until AC is implemented with moving forward with the rest of the PR's, in order to avoid merge mess. The exception might be item 4, preparing the trace/progress/auth callbacks, which is easy to isolate from the rest.

I've prepared branches for most of these things locally, some split in multiple PR's to ease reviewing.

I've tagged Victor Stinner and Dong-hee Na (hope that's ok), since they've been helpful with reviewing and providing helpful hints thus far. Any input from Berker Peksag would be highly appreciated; after all, he is the maintainer of this module :)

Ref. bpo-1635741 (the grand multi-phase issue)
msg378829 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2020-10-17 20:30
Instead of using a custom context struct for sqlite3_create_funcion_v2(), we may just use sqlite3_context_db_handle() to get `self`, and then fetch the state using PyType_GetModuleState(Py_TYPE(self)).
msg378832 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2020-10-17 20:46
No, unless we can add `self` as a private context to a sqlite3 database handle, we can't use that shortcut. However, using a custom context struct works, and it's a minor code change.

Proof-of-concept implementation diffstat:
 2 files changed, 54 insertions(+), 10 deletions(-)
msg378905 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2020-10-18 21:51
FYI, WIP branch for this bpo is here:
This currently includes items 1 though 6. I'll update the WIP branch with a draft of items 7 and 8, hopefully tomorrow.
msg379029 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2020-10-19 21:50
Work with item 7 is progressing; see WIP branch for details. Current showstopper is how to fetch module state from Connection.__init__ when it has been overloaded (for example in /Lib/sqlite3/test/ Guessing that Py_TYPE(self) does not return the sqlite3.Connection type.
msg379172 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2020-10-20 22:59
Connection.__init__() fixed by iterating though tp_bases in order to find the correct base class to load the state from. There should be a better way to do this than to dig through type member internals.
msg379326 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2020-10-22 19:35
Sorted out item 7 (module state). Item 8 is mostly in place, although I get ref. count errors if I add traverse/clear callbacks, so I've temporarily disabled those.

Proof-of-concept of multi-phase init can be found here:
I'll try to break items 3 and 7 up in smaller pieces for the reviewer's convenience.
msg379476 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2020-10-23 21:13
I've narrowed down the traverse problems to Py_VISIT(state->CacheType). Suspecting this has to do with the cyclic references in cache/connection. Preliminary tests with GC tracking `cache->factory` look promising, but this is a part of the C API that's _completely_ unknown to me (yet), so I'm on thin ice here.
msg380299 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2020-11-03 20:19
FYI, rebased onto master, added Petr Viktorin's _PyType_GetModuleByDef() for use in item 7 (module state). I still run into problems in item 8, but I haven't devoted much time for this yet:
Modules/gcmodule.c:116: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
Enable tracemalloc to get the memory block allocation traceback

object address  : 0x7f861140c6b0
object refcount : 2
object type     : 0x109666678
object type name: type
object repr     : <class 'sqlite3.InterfaceError'>

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: finalizing (tstate=0x7f8631409000)
Date User Action Args
2020-11-03 20:19:57erlendaaslandsetmessages: + msg380299
2020-10-23 21:13:18erlendaaslandsetmessages: + msg379476
2020-10-22 19:35:03erlendaaslandsetmessages: + msg379326
2020-10-20 22:59:25erlendaaslandsetmessages: + msg379172
2020-10-19 21:50:53erlendaaslandsetmessages: + msg379029
2020-10-18 21:51:01erlendaaslandsetmessages: + msg378905
2020-10-17 20:46:09erlendaaslandsetmessages: + msg378832
2020-10-17 20:30:00erlendaaslandsetmessages: + msg378829
2020-10-17 19:37:05erlendaaslandcreate