classification
Title: [sqlite3] remove superfluous statement weak ref list from connection object
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: erlendaasland Nosy List: berker.peksag, erlendaasland, pablogsal, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2021-05-08 19:43 by erlendaasland, last changed 2021-08-19 07:03 by erlendaasland. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 25998 merged erlendaasland, 2021-05-08 20:08
Messages (9)
msg393282 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-08 19:43
Today, the sqlite3 extension module keeps two[1] lists of statements:

 1. The statement cache[2], stored in pysqlite_Connection.statement_cache
 2. A weak ref. list (hard coded limited to 200 statements in _pysqlite_drop_unused_statement_references()), stored in pysqlite_Connection.statements

AFAICS, there's no benefit in keeping both. Suggesting to remove the latter.

Pro's:
- less lines of code to maintain
- clearer code
- one less data member in pysqlite_Connection
- improved test coverage (there are currently no coverage for _pysqlite_drop_unused_statement_references()[3])

Con's:
- the current code works / "code churn"


$ git diff --shortstat 
 1 file changed, 13 insertions(+), 60 deletions(-)


[1] Actually, there's three lists: SQLite also keeps a list, available using sqlite3_stmt_next()
[2] See bpo-42862 for suggested improvements to the current cache implementation
[3] See bpo-43553 for improving the code coverage of the sqlite3 extension module
msg393283 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-08 19:48
Berker/Serhiy, weak ref. list question:
Could also the pysqlite_Statement.in_weakreflist member go? Seems to be ok:

$ git diff --shortstat
 3 files changed, 13 insertions(+), 72 deletions(-)
$ ./python.exe -m test -R : test_sqlite
0:00:00 load avg: 0.83 Run tests sequentially
0:00:00 load avg: 0.83 [1/1] test_sqlite
beginning 9 repetitions
123456789
.........

== Tests result: SUCCESS ==

1 test OK.

Total duration: 5.7 sec
Tests result: SUCCESS
msg393936 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2021-05-19 08:39
https://github.com/ghaering/pysqlite/commit/33f99be6be5ad7d69767ff172441f1a860416e82 states the following:

    - Use a list of weak references of statements in the Connection class to keep
      track of all statements used with this connection. This is necessary to be
      able to reset all statements used in a connection.

My understanding is that their usages is a bit different: one is for performance reasons (and it can configurable by users) and the other is to keep track of statements inside a connection.

I'm not fully sure that the both use cases can be combined.
msg393942 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-19 09:53
Yes, that seems to be the intention. But, I don't think there is a need to maintain the second list:

1. Resetting statements was historically needed both for commit and rollback; pending statements would block such operations. That's no longer the case for recent SQLite versions (commits fixed in SQLite 3.6.5, rollbacks in SQLite 3.7.11). The sqlite3 module no longer reset commit statements (6ed442c48dd7f8d3097e688a36bc027df3271621), and there should be no need to reset rollbacks either (see bpo-44092).
2. A statement with ref count zero will be garbage collected. Thus, of a statement is dropped from the LRU cache, it will be sqlite3_finalize'd. No statement, no problem. It's the same effect as if _pysqlite_drop_unused_statement_references() used the same limit as the LRU cache.

If we want to explicit reset statements (for some reason), we'd be better off to use sqlite3_stmt_next(). It's faster, and it covers _all_ statements for a given connection.
msg394269 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-24 22:19
> I'm not fully sure that the both use cases can be combined.

I'm fully sure they can :) Did I miss anything in msg393942? AFAICS, there is no need for the weak ref list anymore.
msg394848 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-06-01 10:29
Also, note that if identical SQL statements are created in multiple cursors belonging to the same connection, they will currently _not_ be added to the connection weak ref list. See the if (self->statement->in_use) in the middle of _pysqlite_query_execute(). Thus, the current code actually fails to register _all_ statements in the "all statements" weak ref list. IMO, this is yet another argument to abandon that list.

Side note: This branch is not exercised in the unit test suite. Adding the following test will cover this branch:

diff --git a/Lib/sqlite3/test/dbapi.py b/Lib/sqlite3/test/dbapi.py
index c17f911fa1..8e53c657a6 100644
--- a/Lib/sqlite3/test/dbapi.py
+++ b/Lib/sqlite3/test/dbapi.py
@@ -526,6 +526,10 @@ def test_last_row_id_insert_o_r(self):
         ]
         self.assertEqual(results, expected)
 
+    def test_same_query_in_multiple_cursors(self):
+        cursors = [self.cx.execute("select 1") for _ in range(3)]
+        for cu in cursors:
+            self.assertEqual(cu.fetchall(), [(1,)])
 
 class ThreadTests(unittest.TestCase):
     def setUp(self):


See bpo-43553 for sqlite3 coverage improvements.
msg399559 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-13 20:47
Berker:
> My understanding is that their usages is a bit different:
> one is for performance reasons (and it can configurable by users) and the other is to keep track of statements inside a connection.
>
> I'm not fully sure that the both use cases can be combined.

Only unique statements are added to the weak ref list (and the LRU cache).

Case 1: len(unique statements) <= 128
Both the LRU cache and the weak ref list will be equal sets of statements.

Case 2: len(unique statements) > 128 <= 200
If more than 128 unique statements are created, they "fall off" of the LRU cache and are collected by the GC. The weak ref list will contain all the 128 cached statements, and up to 72 "lost" statements which have been reaped by the GC.
The LRU cache and the _alive_ objects in weak ref list will be equal sets of statements.

Case 3: len(unique statements) > 200
_pysqlite_drop_unused_statement_references() will remove the lost 72 refs and create a new weak ref list that will be equal to the 128 statements in the LRU cache.

Fun fact: After the number of unique statements pass 200, the weak ref list is reset to the remaining 128 "active" statements, and sqlite3.Connection.created_statements is reset to 0. Thus, when we pass the "200 limit" again, the weak ref list will contain 329 elements, not 200 elements.

Any statement that "falls off" of the LRU cache is reaped by the GC which implies that sqlite3_finalize() is called on it, and it's thus removed from the SQLite connection. This implies that the set of statements available through sqlite3_next_stmt() is equal to the set of statements in the LRU cache and the set of _alive_ statements in the weak ref list.

This implies that if we want to finalise all connection statements in sqlite3.Connection.close(), all we need to do is to clear the LRU cache; the GC will invoke stmt_dealloc(), which will finalise the SQLite statement objects. If we want to reset all connection statements (pysqlite_do_all_statements()), we can iterate through them using sqlite3_next_stmt().

However, there is a corner case for when the sqlite3.Connection.statements list can contain statements that are not in the LRU cache: if we've got more than 128 active cursors, all of them with unique statements, we can end up with statements that are _only_ owned by a cursor. However, sqlite3_next_stmt() will still find them if we need to reset them in pysqlite_do_all_statements(). On connection close, all cursors are destroyed anyway, so we will end up finalising all statements, weak ref list or not.

Can we merge PR-25998 now?

Prove me wrong :)
msg399875 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-18 23:38
New changeset 243b6c3b8fd3144450c477d99f01e31e7c3ebc0f by Erlend Egeberg Aasland in branch 'main':
bpo-44079: Strip superfluous statement cache from sqlite3.Connection (GH-25998)
https://github.com/python/cpython/commit/243b6c3b8fd3144450c477d99f01e31e7c3ebc0f
msg399888 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-08-19 07:03
Thanks Pablo for merging, and Berker for pushing me to investigate further. Highly appreciated!
History
Date User Action Args
2021-08-19 07:03:39erlendaaslandsetmessages: + msg399888
2021-08-18 23:38:02pablogsalsetnosy: + pablogsal
messages: + msg399875
2021-08-18 23:38:02pablogsalsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-08-13 20:47:05erlendaaslandsetmessages: + msg399559
2021-06-01 10:29:38erlendaaslandsetmessages: + msg394848
2021-05-24 22:19:58erlendaaslandsetmessages: + msg394269
2021-05-19 09:53:39erlendaaslandsetmessages: + msg393942
2021-05-19 08:39:29berker.peksagsettype: resource usage -> enhancement
messages: + msg393936
2021-05-08 20:12:13erlendaaslandsettitle: [sqlite3] optimisation: remove statement weak ref list from connection object -> [sqlite3] remove superfluous statement weak ref list from connection object
2021-05-08 20:08:41erlendaaslandsetkeywords: + patch
stage: patch review
pull_requests: + pull_request24651
2021-05-08 19:48:05erlendaaslandsetmessages: + msg393283
2021-05-08 19:43:24erlendaaslandcreate