Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert sqlite3 to multi-phase initialisation (PEP 489) #86230

Closed
erlend-aasland opened this issue Oct 17, 2020 · 34 comments
Closed

Convert sqlite3 to multi-phase initialisation (PEP 489) #86230

erlend-aasland opened this issue Oct 17, 2020 · 34 comments
Labels
3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@erlend-aasland
Copy link
Contributor

BPO 42064
Nosy @encukou, @berkerpeksag, @corona10, @miss-islington, @erlend-aasland, @DiddiLeija
PRs
  • bpo-42064: Move sqlite3 types to global state #26537
  • bpo-42064: Move sqlite3 exceptions to global state, part 1 of 2 #26745
  • bpo-42064: Remove stale extern declarations in sqlite3 headers #26840
  • bpo-42064: Move sqlite3 exceptions to global state, part 2 of 2 #26884
  • bpo-42064: Finalise establishing sqlite3 global state #27155
  • bpo-42064: Migrate to sqlite3_create_collation_v2 #27156
  • bpo-42064: Optimise sqlite3 state access, part 1 #27273
  • bpo-42064: Pass module state to sqlite3 UDF callbacks #27456
  • bpo-42064: Offset arguments for PyObject_Vectorcall in the _sqlite module #27931
  • bpo-42064: Pass module state to trace, progress, and authorizer callbacks #27940
  • bpo-42064: Add module backref to sqlite3 callback context #28242
  • bpo-42064: Convert sqlite3 global state to module state #29073
  • bpo-42064: Adapt sqlite3 to multi-phase init (PEP 489) #29234
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-11-02.18:10:54.692>
    created_at = <Date 2020-10-17.19:37:05.423>
    labels = ['type-feature', 'library', '3.10']
    title = 'Convert sqlite3 to multi-phase initialisation (PEP 489)'
    updated_at = <Date 2021-11-03.11:51:24.093>
    user = 'https://github.com/erlend-aasland'

    bugs.python.org fields:

    activity = <Date 2021-11-03.11:51:24.093>
    actor = 'erlendaasland'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-11-02.18:10:54.692>
    closer = 'erlendaasland'
    components = ['Library (Lib)']
    creation = <Date 2020-10-17.19:37:05.423>
    creator = 'erlendaasland'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42064
    keywords = ['patch']
    message_count = 34.0
    messages = ['378825', '378829', '378832', '378905', '379029', '379172', '379326', '379476', '380299', '395047', '395876', '396411', '396416', '397477', '397866', '398298', '398470', '400205', '400206', '400208', '400712', '400720', '400725', '400726', '400733', '400774', '401245', '401259', '401423', '404301', '405086', '405512', '405583', '405609']
    nosy_count = 6.0
    nosy_names = ['petr.viktorin', 'berker.peksag', 'corona10', 'miss-islington', 'erlendaasland', 'DiddiLeija']
    pr_nums = ['26537', '26745', '26840', '26884', '27155', '27156', '27273', '27456', '27931', '27940', '28242', '29073', '29234']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue42064'
    versions = ['Python 3.10']

    @erlend-aasland
    Copy link
    Contributor Author

    Porting sqlite3 to multi-phase initialisation is non-trivial and must be broken into smaller bpo's, often with multiple PR's per bpo. As far as I can see, this is the task list:

    1. Heap types (PEP-384): bpo-41861
    2. Argument Clinic: bpo-40956
    3. Establish global module state
    4. Pass self iso. callback object to trace, auth, and progress handlers (required for item 7)
    5. Use sqlite3_create_collation_v2 iso. sqlite3_create_collation (see next list item)
    6. For sqlite3_create_* functions, establish a context struct that contains the target user callback and the state. Modify callers to allocate memory, and the destructor to deallocate
    7. Module state (PEP-573)
    8. Multi-phase initialisation (PEP-489)

    The list may be incomplete.

    Separate bpo's for 1) and 2) are opened. Pr. 2020-10-17, 1) is almost done, and 2), part 1 of 5 is awaiting review. It may be convenient to open up sub-bpo's for some of the other tasks as well. For instance, using sqlite3_create_collation_v2() iso. sqlite3_create_collation(). (FYI, this imposes no new SQLite version requirements.)

    I'd wait until AC is implemented with moving forward with the rest of the PR's, in order to avoid merge mess. The exception might be item 4, preparing the trace/progress/auth callbacks, which is easy to isolate from the rest.

    I've prepared branches for most of these things locally, some split in multiple PR's to ease reviewing.

    I've tagged Victor Stinner and Dong-hee Na (hope that's ok), since they've been helpful with reviewing and providing helpful hints thus far. Any input from Berker Peksag would be highly appreciated; after all, he is the maintainer of this module :)

    Ref. bpo-1635741 (the grand multi-phase issue)

    @erlend-aasland erlend-aasland added 3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Oct 17, 2020
    @erlend-aasland
    Copy link
    Contributor Author

    Instead of using a custom context struct for sqlite3_create_funcion_v2(), we may just use sqlite3_context_db_handle() to get self, and then fetch the state using PyType_GetModuleState(Py_TYPE(self)).

    @erlend-aasland
    Copy link
    Contributor Author

    No, unless we can add self as a private context to a sqlite3 database handle, we can't use that shortcut. However, using a custom context struct works, and it's a minor code change.

    Proof-of-concept implementation diffstat:
    2 files changed, 54 insertions(+), 10 deletions(-)

    @erlend-aasland
    Copy link
    Contributor Author

    FYI, WIP branch for this bpo is here: https://github.com/erlend-aasland/cpython/commits/bpo-42064/all
    This currently includes items 1 though 6. I'll update the WIP branch with a draft of items 7 and 8, hopefully tomorrow.

    @erlend-aasland
    Copy link
    Contributor Author

    Work with item 7 is progressing; see WIP branch for details. Current showstopper is how to fetch module state from Connection.__init__ when it has been overloaded (for example in /Lib/sqlite3/test/factory.py). Guessing that Py_TYPE(self) does not return the sqlite3.Connection type.

    @erlend-aasland
    Copy link
    Contributor Author

    Connection.__init__() fixed by iterating though tp_bases in order to find the correct base class to load the state from. There should be a better way to do this than to dig through type member internals.

    @erlend-aasland
    Copy link
    Contributor Author

    Sorted out item 7 (module state). Item 8 is mostly in place, although I get ref. count errors if I add traverse/clear callbacks, so I've temporarily disabled those.

    Proof-of-concept of multi-phase init can be found here: https://github.com/erlend-aasland/cpython/commits/bpo-42064/all
    I'll try to break items 3 and 7 up in smaller pieces for the reviewer's convenience.

    @erlend-aasland
    Copy link
    Contributor Author

    I've narrowed down the traverse problems to Py_VISIT(state->CacheType). Suspecting this has to do with the cyclic references in cache/connection. Preliminary tests with GC tracking cache->factory look promising, but this is a part of the C API that's completely unknown to me (yet), so I'm on thin ice here.

    @erlend-aasland
    Copy link
    Contributor Author

    FYI, rebased https://github.com/erlend-aasland/cpython/tree/bpo-42064/all onto master, added Petr Viktorin's _PyType_GetModuleByDef() for use in item 7 (module state). I still run into problems in item 8, but I haven't devoted much time for this yet:
    Modules/gcmodule.c:116: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
    Enable tracemalloc to get the memory block allocation traceback

    object address : 0x7f861140c6b0
    object refcount : 2
    object type : 0x109666678
    object type name: type
    object repr : <class 'sqlite3.InterfaceError'>

    Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
    Python runtime state: finalizing (tstate=0x7f8631409000)

    @erlend-aasland
    Copy link
    Contributor Author

    Global module state has been established by f461a7f (bpo-42862, #68391). We can safely migrate static variables into that struct as a next step.

    @encukou
    Copy link
    Member

    encukou commented Jun 15, 2021

    New changeset 10a5c80 by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Move sqlite3 types to global state (GH-26537)
    10a5c80

    @corona10
    Copy link
    Member

    New changeset 019ad62 by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Remove stale extern declarations in sqlite3 headers (GH-26840)
    019ad62

    @miss-islington
    Copy link
    Contributor

    New changeset a50e283 by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Move sqlite3 exceptions to global state, part 1 of 2 (GH-26745)
    a50e283

    @miss-islington
    Copy link
    Contributor

    New changeset 0516299 by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Move sqlite3 exceptions to global state, part 2 of 2 (GH-26884)
    0516299

    @encukou
    Copy link
    Member

    encukou commented Jul 20, 2021

    New changeset 4c0deb2 by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Finalise establishing sqlite3 global state (GH-27155)
    4c0deb2

    @encukou
    Copy link
    Member

    encukou commented Jul 27, 2021

    New changeset 890e229 by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Migrate to sqlite3_create_collation_v2 (GH-27156)
    890e229

    @miss-islington
    Copy link
    Contributor

    New changeset d542742 by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Optimise sqlite3 state access, part 1 (GH-27273)
    d542742

    @encukou
    Copy link
    Member

    encukou commented Aug 24, 2021

    I think the module could use a more comprehensive review for GIL handling, rather than doing it piecewise in individual PRs. I recommend that any function passed to SQLite (and only those) should

    • be named *_callback, for clarity
    • acquire the GIL at the very start
    • release the GIL at the very end

    @miss-islington
    Copy link
    Contributor

    New changeset 9ed5231 by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Pass module state to sqlite3 UDF callbacks (GH-27456)
    9ed5231

    @erlend-aasland
    Copy link
    Contributor Author

    Petr:

    I think the module could use a more comprehensive review for GIL handling [...]

    I agree. I created bpo-44991 for this.

    @encukou
    Copy link
    Member

    encukou commented Aug 31, 2021

    Here's a gotcha you might not be aware of:
    create_callback_context stashes away a pointer to state. I don't think we can prove that the state will always outlive the callback_context after it'll become possible to deallocate the module

    The state doesn't have a refcount, but it is owned by a module object, so callback_context should own a reference to the module object.

    @erlend-aasland
    Copy link
    Contributor Author

    Thanks, good catch! I'll add that after PR 27934 is merged.

    @encukou
    Copy link
    Member

    encukou commented Aug 31, 2021

    New changeset 01dea5f by Petr Viktorin in branch 'main':
    bpo-42064: Offset arguments for PyObject_Vectorcall in the _sqlite module (GH-27931)
    01dea5f

    @erlend-aasland
    Copy link
    Contributor Author

    The state doesn't have a refcount, but it is owned by a module object, so callback_context should own a reference to the module object.

    Would it be sufficient to hold a reference to the connection object? The connection holds a reference to the connection type, and the connection type holds a reference to the module, right?

    @encukou
    Copy link
    Member

    encukou commented Aug 31, 2021

    Would it be sufficient to hold a reference to the connection object?

    Yes.

    @erlend-aasland
    Copy link
    Contributor Author

    > Would it be sufficient to hold a reference to the connection object?

    Yes

    Good, that simplifies things. I'll wait with this until we've resolved PR 27940 though.

    @encukou
    Copy link
    Member

    encukou commented Sep 7, 2021

    New changeset 0474d06 by Erlend Egeberg Aasland in branch 'main':
    bpo-44991: Normalise sqlite3 callback naming (GH-28088)
    0474d06

    @encukou
    Copy link
    Member

    encukou commented Sep 7, 2021

    New changeset 979336d by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Pass module state to trace, progress, and authorizer callbacks (GH-27940)
    979336d

    @erlend-aasland
    Copy link
    Contributor Author

    I did some experiments using the connection object as a "backref" in the callback context, but it seems that the GC does not play well with such ref circles; I ended up with a lot of ref leaks (yes, I modified the traverse and clear slots to visit and clear the backref).

    Using the module object as a "backref", however, worked swell. It ends up being a much larger PR, though.

    @encukou
    Copy link
    Member

    encukou commented Oct 19, 2021

    New changeset 09c04e7 by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Add module backref to sqlite3 callback context (GH-28242)
    09c04e7

    @encukou
    Copy link
    Member

    encukou commented Oct 27, 2021

    New changeset 8f24b7d by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Convert sqlite3 global state to module state (GH-29073)
    8f24b7d

    @encukou
    Copy link
    Member

    encukou commented Nov 2, 2021

    New changeset 401272e by Erlend Egeberg Aasland in branch 'main':
    bpo-42064: Adapt sqlite3 to multi-phase init (PEP-489) (GH-29234)
    401272e

    @encukou
    Copy link
    Member

    encukou commented Nov 3, 2021

    Hooray!
    Congratulations, and thanks for your work and determination to get this done :)

    @erlend-aasland
    Copy link
    Contributor Author

    Congratulations, and thanks for your work and determination to get this done :)

    Thanks, and thank you for helping out, Petr :) Also a big thanks to Dong-hee, Berker, Serhiy, Pablo, and Victor for reviews and guidance with this "project".

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants