classification
Title: Use functools.lru_cache iso. _sqlite.Cache in sqlite3 module
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, corona10, erlendaasland, pablogsal, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2021-01-08 00:39 by erlendaasland, last changed 2021-06-23 14:04 by corona10. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 24135 erlendaasland, 2021-01-11 19:45
PR 24203 merged erlendaasland, 2021-01-12 23:24
PR 26876 merged erlendaasland, 2021-06-23 12:16
Messages (18)
msg384625 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-08 00:39
Pro: less code, less complexity, improved maintainability
Con: minor performance hit

PoC here: https://github.com/erlend-aasland/cpython/commits/sqlite-cache

$ ./python.exe
>>> import sqlite3
>>> con = sqlite3.connect(":memory:")
>>> con.execute("select * from sqlite_master")
>>> con.execute("select * from sqlite_master")
>>> c = con.cache()
>>> c.cache_info()
CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)

The test suite runs approx. 10-20 ms slower with this change. Using _functools._lru_cache_wrapper iso. functools.lru_cache almost removes this performance regression.


Berker, is it worth pursuing?
msg384627 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-08 00:51
Short diffstat: 8 files changed, 85 insertions(+), 406 deletions(-)
msg384691 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-08 21:25
$ python3.10 -m timeit -s 'import sqlite3; con = sqlite3.connect(":memory:"); query="select * from sqlite_master"' 'con.execute(query); con.execute(query)'
100000 loops, best of 5: 2.95 usec per loop
$ ./python.exe -m timeit -s 'import sqlite3; con = sqlite3.connect(":memory:"); query="select * from sqlite_master"' 'con.execute(query); con.execute(query)'
100000 loops, best of 5: 2.68 usec per loop
msg384892 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-01-12 05:45
+1 This seems reasonable to me.
msg384923 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-12 11:59
I can throw up the PoC branch as a draft PR after GH-24135 is merged. We can just close the PR if this is uninteresting or something we want to postpone.
msg384926 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2021-01-12 12:43
I don't see any reason to merge GH-24135 if we are going to remove cache.[ch] in this issue.

I was -0 before but since Raymond gave his +1, you can count me as +1 too.
msg384927 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-12 12:51
> I don't see any reason to merge GH-24135 if we are going to remove cache.[ch] in this issue.

Yes, I've thought about that myself. A small argument pro merging GH-24135 would be that if we for some reason decide to revert this change, then cache.[ch] would still be prepared for module state (which is needed for multi-phase init).
msg384928 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2021-01-12 12:54
We can always reopen GH-24135 and merge it even if we revert this one for some reason :)
msg384929 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-12 12:56
True that :) I'll close GH-24135 for now and open a PR for this later today.
msg384935 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-12 13:32
I do not like using _functools._lru_cache_wrapper. It is a deep implementation detail, private function of private module. Use functools.lru_cache. If it is few nanoseconds slower, that cost is only added at connection creation time. It is insignificant in any real application in comparison with IO and any real work with data. Thousands of short-living in-memory DB instances are only created in tests.
msg384976 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-12 20:52
> I do not like using _functools._lru_cache_wrapper. It is a deep implementation detail, private function of private module. Use functools.lru_cache.

All right, thanks.
msg385193 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-18 12:16
I need some help debugging the Windows issues. There are a handful of tests that fail because the sqlite3 is clinging on to objects or file descriptors. AFAICT, a GC bug. For example, test_open_uri fails with:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: [...]

I've tried the following, with no success:
- Remove GC from the Connection type
- Remove the clear method, and modify traverse to only visit Py_TYPE(self) (https://bugs.python.org/issue42866#msg384675)
- Add heap type GC (visit type, GC tracking) to all pysqlite types

Maybe it's best to reopen GH-24135 and continue the work with multi-phase init? That is finalise bpo-40956, add heap type GC (https://bugs.python.org/issue42866#msg384675), establish module state, implement multi-phase init, and then revisit this issue.
msg385376 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-20 22:42
Tried applying bpo-42972 to sqlite and functools, but the error persists.
msg385444 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-21 20:13
This works:

1) fully implement GC in connection (bpo-42972)
2) also visit statement_cache
3) explicitly close connections _and_ call GC in problematic tests

The first point might not be needed for this particular fix.
The last point is a workaround, not a solution. Or is it ok to call gc.collect() in the test suite?
msg385445 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-01-21 20:14
> Or is it ok to call gc.collect() in the test suite?

Seems like it's ok:

$ grep -r gc.collect Lib/test | wc -l
    366
msg393932 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2021-05-19 08:04
> Or is it ok to call gc.collect() in the test suite?

It is but in this case I'd say it's a bit weird that we need to call it. Unfortunately, I don't have much time to investigate it at the moment.
msg395039 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-03 19:59
New changeset f461a7fc3f8740b9e79e8874175115a3474e5930 by Erlend Egeberg Aasland in branch 'main':
bpo-42862: Use functools.lru_cache iso. _sqlite.Cache in sqlite3 module (GH-24203)
https://github.com/python/cpython/commit/f461a7fc3f8740b9e79e8874175115a3474e5930
msg396422 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-06-23 14:04
New changeset 34356a0a4bad0be124ae892cda6c30a38f5f1061 by Erlend Egeberg Aasland in branch 'main':
bpo-42862: Strip stale sqlite3 cache ignores from c-analyzer (GH-26876)
https://github.com/python/cpython/commit/34356a0a4bad0be124ae892cda6c30a38f5f1061
History
Date User Action Args
2021-06-23 14:04:34corona10setnosy: + corona10
messages: + msg396422
2021-06-23 12:16:07erlendaaslandsetpull_requests: + pull_request25452
2021-06-03 20:01:40pablogsalsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-06-03 19:59:36pablogsalsetnosy: + pablogsal
messages: + msg395039
2021-05-19 08:04:22berker.peksagsetmessages: + msg393932
2021-05-14 21:40:46erlendaaslandsetversions: + Python 3.11, - Python 3.10
2021-01-21 20:14:53erlendaaslandsetmessages: + msg385445
2021-01-21 20:13:10erlendaaslandsetmessages: + msg385444
2021-01-20 22:42:59erlendaaslandsetmessages: + msg385376
2021-01-18 12:16:36erlendaaslandsetmessages: + msg385193
2021-01-12 23:24:57erlendaaslandsetpull_requests: + pull_request23028
2021-01-12 20:52:02erlendaaslandsetmessages: + msg384976
2021-01-12 13:32:42serhiy.storchakasetmessages: + msg384935
2021-01-12 12:56:08erlendaaslandsetmessages: + msg384929
2021-01-12 12:54:24berker.peksagsetmessages: + msg384928
2021-01-12 12:51:55erlendaaslandsetmessages: + msg384927
2021-01-12 12:43:49berker.peksagsetmessages: + msg384926
2021-01-12 11:59:04erlendaaslandsetmessages: + msg384923
2021-01-12 05:45:12rhettingersetnosy: + rhettinger
messages: + msg384892
2021-01-11 19:45:01erlendaaslandsetkeywords: + patch
stage: patch review
pull_requests: + pull_request23021
2021-01-08 21:25:45erlendaaslandsetmessages: + msg384691
2021-01-08 00:51:09erlendaaslandsetmessages: + msg384627
2021-01-08 00:39:03erlendaaslandcreate