classification
Title: Convert sqlite3 to multi-phase initialisation (PEP 489)
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: DiddiLeija, berker.peksag, corona10, erlendaasland, miss-islington, petr.viktorin
Priority: normal Keywords: patch

Created on 2020-10-17 19:37 by erlendaasland, last changed 2021-11-03 11:51 by erlendaasland. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 26537 merged erlendaasland, 2021-06-04 21:58
PR 26745 merged erlendaasland, 2021-06-15 19:37
PR 26840 merged erlendaasland, 2021-06-21 22:34
PR 26884 merged erlendaasland, 2021-06-23 20:11
PR 27155 merged erlendaasland, 2021-07-14 23:06
PR 27156 merged erlendaasland, 2021-07-15 00:02
PR 27273 merged erlendaasland, 2021-07-20 22:03
PR 27456 merged erlendaasland, 2021-07-29 19:03
PR 27931 merged petr.viktorin, 2021-08-24 14:04
PR 27940 merged erlendaasland, 2021-08-25 08:59
PR 28242 merged erlendaasland, 2021-09-08 20:59
PR 29073 merged erlendaasland, 2021-10-19 20:32
PR 29234 merged erlendaasland, 2021-10-27 11:53
Messages (34)
msg378825 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2020-10-17 19:37
Porting sqlite3 to multi-phase initialisation is non-trivial and must be broken into smaller bpo's, often with multiple PR's per bpo. As far as I can see, this is the task list:

1) Heap types (PEP 384): bpo-41861
2) Argument Clinic: bpo-40956
3) Establish global module state
4) Pass `self` iso. callback object to trace, auth, and progress handlers (required for item 7)
5) Use sqlite3_create_collation_v2 iso. sqlite3_create_collation (see next list item)
6) For sqlite3_create_* functions, establish a context struct that contains the target user callback and the state. Modify callers to allocate memory, and the destructor to deallocate
7) Module state (PEP 573)
8) Multi-phase initialisation (PEP 489)

The list may be incomplete.

Separate bpo's for 1) and 2) are opened. Pr. 2020-10-17, 1) is almost done, and 2), part 1 of 5 is awaiting review. It may be convenient to open up sub-bpo's for some of the other tasks as well. For instance, using sqlite3_create_collation_v2() iso. sqlite3_create_collation(). (FYI, this imposes no new SQLite version requirements.)

I'd wait until AC is implemented with moving forward with the rest of the PR's, in order to avoid merge mess. The exception might be item 4, preparing the trace/progress/auth callbacks, which is easy to isolate from the rest.

I've prepared branches for most of these things locally, some split in multiple PR's to ease reviewing.

I've tagged Victor Stinner and Dong-hee Na (hope that's ok), since they've been helpful with reviewing and providing helpful hints thus far. Any input from Berker Peksag would be highly appreciated; after all, he is the maintainer of this module :)

Ref. bpo-1635741 (the grand multi-phase issue)
msg378829 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2020-10-17 20:30
Instead of using a custom context struct for sqlite3_create_funcion_v2(), we may just use sqlite3_context_db_handle() to get `self`, and then fetch the state using PyType_GetModuleState(Py_TYPE(self)).
msg378832 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2020-10-17 20:46
No, unless we can add `self` as a private context to a sqlite3 database handle, we can't use that shortcut. However, using a custom context struct works, and it's a minor code change.

Proof-of-concept implementation diffstat:
 2 files changed, 54 insertions(+), 10 deletions(-)
msg378905 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2020-10-18 21:51
FYI, WIP branch for this bpo is here: https://github.com/erlend-aasland/cpython/commits/bpo-42064/all
This currently includes items 1 though 6. I'll update the WIP branch with a draft of items 7 and 8, hopefully tomorrow.
msg379029 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2020-10-19 21:50
Work with item 7 is progressing; see WIP branch for details. Current showstopper is how to fetch module state from Connection.__init__ when it has been overloaded (for example in /Lib/sqlite3/test/factory.py). Guessing that Py_TYPE(self) does not return the sqlite3.Connection type.
msg379172 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2020-10-20 22:59
Connection.__init__() fixed by iterating though tp_bases in order to find the correct base class to load the state from. There should be a better way to do this than to dig through type member internals.
msg379326 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2020-10-22 19:35
Sorted out item 7 (module state). Item 8 is mostly in place, although I get ref. count errors if I add traverse/clear callbacks, so I've temporarily disabled those.

Proof-of-concept of multi-phase init can be found here: https://github.com/erlend-aasland/cpython/commits/bpo-42064/all
I'll try to break items 3 and 7 up in smaller pieces for the reviewer's convenience.
msg379476 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2020-10-23 21:13
I've narrowed down the traverse problems to Py_VISIT(state->CacheType). Suspecting this has to do with the cyclic references in cache/connection. Preliminary tests with GC tracking `cache->factory` look promising, but this is a part of the C API that's _completely_ unknown to me (yet), so I'm on thin ice here.
msg380299 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2020-11-03 20:19
FYI, rebased https://github.com/erlend-aasland/cpython/tree/bpo-42064/all onto master, added Petr Viktorin's _PyType_GetModuleByDef() for use in item 7 (module state). I still run into problems in item 8, but I haven't devoted much time for this yet:
Modules/gcmodule.c:116: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
Enable tracemalloc to get the memory block allocation traceback

object address  : 0x7f861140c6b0
object refcount : 2
object type     : 0x109666678
object type name: type
object repr     : <class 'sqlite3.InterfaceError'>

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: finalizing (tstate=0x7f8631409000)
msg395047 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-06-03 20:29
Global module state has been established by f461a7fc3f8740b9e79e8874175115a3474e5930 (bpo-42862, GH-24203). We can safely migrate static variables into that struct as a next step.
msg395876 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-06-15 12:47
New changeset 10a5c806d4dec6c342dcc9888fbe4fa1fa9b7a1f by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Move sqlite3 types to global state (GH-26537)
https://github.com/python/cpython/commit/10a5c806d4dec6c342dcc9888fbe4fa1fa9b7a1f
msg396411 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-06-23 12:07
New changeset 019ad62afd20e80c74f879aa716e339b992a0bb9 by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Remove stale extern declarations in `sqlite3` headers (GH-26840)
https://github.com/python/cpython/commit/019ad62afd20e80c74f879aa716e339b992a0bb9
msg396416 - (view) Author: miss-islington (miss-islington) Date: 2021-06-23 12:57
New changeset a50e28377bcf37121b55c2de70d95a5386c478f8 by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Move `sqlite3` exceptions to global state, part 1 of 2 (GH-26745)
https://github.com/python/cpython/commit/a50e28377bcf37121b55c2de70d95a5386c478f8
msg397477 - (view) Author: miss-islington (miss-islington) Date: 2021-07-14 11:26
New changeset 05162993fe62e7fa3acebdd0062586b9bf63d46a by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Move `sqlite3` exceptions to global state, part 2 of 2 (GH-26884)
https://github.com/python/cpython/commit/05162993fe62e7fa3acebdd0062586b9bf63d46a
msg397866 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-07-20 10:59
New changeset 4c0deb25ac899fbe4da626ce3cb21f204cdd3aa9 by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Finalise establishing sqlite3 global state (GH-27155)
https://github.com/python/cpython/commit/4c0deb25ac899fbe4da626ce3cb21f204cdd3aa9
msg398298 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-07-27 13:54
New changeset 890e22957d427ee994b85d62dfe4d5a7cbd34ec5 by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Migrate to `sqlite3_create_collation_v2` (GH-27156)
https://github.com/python/cpython/commit/890e22957d427ee994b85d62dfe4d5a7cbd34ec5
msg398470 - (view) Author: miss-islington (miss-islington) Date: 2021-07-29 09:21
New changeset d542742128b634264d5b6796297613975211b43b by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Optimise `sqlite3` state access, part 1 (GH-27273)
https://github.com/python/cpython/commit/d542742128b634264d5b6796297613975211b43b
msg400205 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-08-24 12:22
I think the module could use a more comprehensive review for GIL handling, rather than doing it piecewise in individual PRs. I recommend that any function passed to SQLite (and only those) should
  - be named `*_callback`, for clarity
  - acquire the GIL at the very start
  - release the GIL at the very end
msg400206 - (view) Author: miss-islington (miss-islington) Date: 2021-08-24 12:24
New changeset 9ed523159c7ba840dbf403e02498eeae1b5d3ed9 by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Pass module state to `sqlite3` UDF callbacks (GH-27456)
https://github.com/python/cpython/commit/9ed523159c7ba840dbf403e02498eeae1b5d3ed9
msg400208 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-24 13:07
Petr:
> I think the module could use a more comprehensive review for GIL handling [...]

I agree. I created bpo-44991 for this.
msg400712 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-08-31 11:40
Here's a gotcha you might not be aware of:
`create_callback_context` stashes away a pointer to `state`. I don't think we can prove that the `state` will always outlive the callback_context after it'll become possible to deallocate the module

The state doesn't have a refcount, but it is owned by a module object, so callback_context should own a reference to the module object.
msg400720 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-31 12:16
Thanks, good catch! I'll add that after PR 27934 is merged.
msg400725 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-08-31 12:34
New changeset 01dea5f12b31862999217c091399a318f23b460a by Petr Viktorin in branch 'main':
bpo-42064: Offset arguments for PyObject_Vectorcall in the _sqlite module (GH-27931)
https://github.com/python/cpython/commit/01dea5f12b31862999217c091399a318f23b460a
msg400726 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-31 12:49
> The state doesn't have a refcount, but it is owned by a module object, so callback_context should own a reference to the module object.

Would it be sufficient to hold a reference to the connection object? The connection holds a reference to the connection type, and the connection type holds a reference to the module, right?
msg400733 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-08-31 13:47
> Would it be sufficient to hold a reference to the connection object?

Yes.
msg400774 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-31 19:56
>> Would it be sufficient to hold a reference to the connection object?
>
> Yes

Good, that simplifies things. I'll wait with this until we've resolved PR 27940 though.
msg401245 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-09-07 11:43
New changeset 0474d06008f8c9eba660ed20d336ffdc5c4db121 by Erlend Egeberg Aasland in branch 'main':
bpo-44991: Normalise `sqlite3` callback naming (GH-28088)
https://github.com/python/cpython/commit/0474d06008f8c9eba660ed20d336ffdc5c4db121
msg401259 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-09-07 13:06
New changeset 979336de34e3d3f40cf6e666b72a618f6330f3c1 by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Pass module state to trace, progress, and authorizer callbacks (GH-27940)
https://github.com/python/cpython/commit/979336de34e3d3f40cf6e666b72a618f6330f3c1
msg401423 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-09-08 20:49
I did some experiments using the connection object as a "backref" in the callback context, but it seems that the GC does not play well with such ref circles; I ended up with a lot of ref leaks (yes, I modified the traverse and clear slots to visit and clear the backref).

Using the module object as a "backref", however, worked swell. It ends up being a much larger PR, though.
msg404301 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-10-19 13:44
New changeset 09c04e7f0d26f0006964554b6a0caa5ef7f0bd24 by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Add module backref to `sqlite3` callback context (GH-28242)
https://github.com/python/cpython/commit/09c04e7f0d26f0006964554b6a0caa5ef7f0bd24
msg405086 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-10-27 11:12
New changeset 8f24b7dbcbd83311dad510863d8cb41f0e91b464 by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Convert `sqlite3` global state to module state (GH-29073)
https://github.com/python/cpython/commit/8f24b7dbcbd83311dad510863d8cb41f0e91b464
msg405512 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-11-02 15:36
New changeset 401272e6e660445d6556d5cd4db88ed4267a50b3 by Erlend Egeberg Aasland in branch 'main':
bpo-42064: Adapt `sqlite3` to multi-phase init (PEP 489) (GH-29234)
https://github.com/python/cpython/commit/401272e6e660445d6556d5cd4db88ed4267a50b3
msg405583 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-11-03 06:22
Hooray!
Congratulations, and thanks for your work and determination to get this done :)
msg405609 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-11-03 11:51
> Congratulations, and thanks for your work and determination to get this done :)

Thanks, and thank you for helping out, Petr :) Also a big thanks to Dong-hee, Berker, Serhiy, Pablo, and Victor for reviews and guidance with this "project".
History
Date User Action Args
2021-11-03 11:51:24erlendaaslandsetmessages: + msg405609
2021-11-03 06:22:43petr.viktorinsetmessages: + msg405583
2021-11-02 18:10:54erlendaaslandsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-11-02 15:36:07petr.viktorinsetmessages: + msg405512
2021-10-27 11:53:27erlendaaslandsetpull_requests: + pull_request27497
2021-10-27 11:12:25petr.viktorinsetmessages: + msg405086
2021-10-19 20:32:24erlendaaslandsetpull_requests: + pull_request27341
2021-10-19 13:44:54petr.viktorinsetmessages: + msg404301
2021-09-09 00:11:09DiddiLeijasetnosy: + DiddiLeija
2021-09-08 20:59:19erlendaaslandsetpull_requests: + pull_request26661
2021-09-08 20:49:37erlendaaslandsetmessages: + msg401423
2021-09-07 13:06:25petr.viktorinsetmessages: + msg401259
2021-09-07 11:43:47petr.viktorinsetmessages: + msg401245
2021-08-31 21:21:22vstinnersetnosy: - vstinner
2021-08-31 19:56:01erlendaaslandsetmessages: + msg400774
2021-08-31 13:47:04petr.viktorinsetmessages: + msg400733
2021-08-31 12:49:35erlendaaslandsetmessages: + msg400726
2021-08-31 12:34:48petr.viktorinsetmessages: + msg400725
2021-08-31 12:16:06erlendaaslandsetmessages: + msg400720
2021-08-31 11:40:23petr.viktorinsetmessages: + msg400712
2021-08-25 08:59:23erlendaaslandsetpull_requests: + pull_request26385
2021-08-24 14:04:09petr.viktorinsetpull_requests: + pull_request26380
2021-08-24 13:07:44erlendaaslandsetmessages: + msg400208
2021-08-24 12:24:16miss-islingtonsetmessages: + msg400206
2021-08-24 12:22:05petr.viktorinsetmessages: + msg400205
2021-07-29 19:03:58erlendaaslandsetpull_requests: + pull_request25982
2021-07-29 09:21:59miss-islingtonsetmessages: + msg398470
2021-07-27 13:54:34petr.viktorinsetmessages: + msg398298
2021-07-20 22:03:33erlendaaslandsetpull_requests: + pull_request25818
2021-07-20 10:59:48petr.viktorinsetmessages: + msg397866
2021-07-15 00:02:28erlendaaslandsetpull_requests: + pull_request25693
2021-07-14 23:06:15erlendaaslandsetpull_requests: + pull_request25692
2021-07-14 11:26:51miss-islingtonsetmessages: + msg397477
2021-06-23 20:11:41erlendaaslandsetpull_requests: + pull_request25462
2021-06-23 12:57:00miss-islingtonsetnosy: + miss-islington
messages: + msg396416
2021-06-23 12:07:01corona10setmessages: + msg396411
2021-06-21 22:34:11erlendaaslandsetpull_requests: + pull_request25421
2021-06-15 19:37:12erlendaaslandsetpull_requests: + pull_request25330
2021-06-15 12:47:41petr.viktorinsetnosy: + petr.viktorin
messages: + msg395876
2021-06-04 21:58:59erlendaaslandsetkeywords: + patch
stage: patch review
pull_requests: + pull_request25128
2021-06-03 20:29:27erlendaaslandsetmessages: + msg395047
2020-11-03 20:19:57erlendaaslandsetmessages: + msg380299
2020-10-23 21:13:18erlendaaslandsetmessages: + msg379476
2020-10-22 19:35:03erlendaaslandsetmessages: + msg379326
2020-10-20 22:59:25erlendaaslandsetmessages: + msg379172
2020-10-19 21:50:53erlendaaslandsetmessages: + msg379029
2020-10-18 21:51:01erlendaaslandsetmessages: + msg378905
2020-10-17 20:46:09erlendaaslandsetmessages: + msg378832
2020-10-17 20:30:00erlendaaslandsetmessages: + msg378829
2020-10-17 19:37:05erlendaaslandcreate