classification
Title: [sqlite3] remove legacy code from pysqlite_step
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, christian.heimes, corona10, erlendaasland, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2021-02-21 23:16 by erlendaasland, last changed 2021-02-25 23:40 by berker.peksag. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 24638 merged erlendaasland, 2021-02-24 11:17
Messages (18)
msg387476 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-21 23:16
pysqlite_step() contains a NULL check needed for SQLite 3.5 and earlier. This can be removed.
msg387619 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-24 11:08
From the SQLite 3.5.3 changelog:
- sqlite3_step() returns SQLITE_MISUSE instead of crashing when called with a NULL parameter.
msg387623 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-02-24 15:02
Hmm by the way the current implementation returns SQLITE_OK if the statement is NULL, but it looks like return SQLITE_MISUSE if we apply this patch.

Does it not cause any behavior regression? if so we should add news also.
msg387627 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-24 16:02
I’ll check all uses and see if we’ve got everything covered by the test suite. This is one of the core functions of the sqlite3 module (all queries call step at least once, but often multiple times), so I’d expect coverage is pretty good.
msg387639 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-24 21:09
There are six users of pysqlite_step():

$ grep -nrE "\<pysqlite_step\>" Modules/_sqlite
Modules/_sqlite/util.c:27:int pysqlite_step(sqlite3_stmt* statement, pysqlite_Connection* connection)
Modules/_sqlite/connection.c:393:    rc = pysqlite_step(statement, self);
Modules/_sqlite/connection.c:442:        rc = pysqlite_step(statement, self);
Modules/_sqlite/connection.c:493:        rc = pysqlite_step(statement, self);
Modules/_sqlite/util.h:32:int pysqlite_step(sqlite3_stmt* statement, pysqlite_Connection* connection);
Modules/_sqlite/cursor.c:519:        rc = pysqlite_step(self->statement->st, self->connection);
Modules/_sqlite/cursor.c:715:            rc = pysqlite_step(statement, self->connection);
Modules/_sqlite/cursor.c:787:        rc = pysqlite_step(self->statement->st, self->connection);

The three users in Modules/_sqlite/connection.c — _pysqlite_connection_begin(), pysqlite_connection_commit_impl(), and pysqlite_connection_rollback_impl() – are all ok, following this pattern:
1) prepare the statement
2) verify that prepare was successful, bail if not
3) call step

pysqlite_cursor_executescript() (line 715 in Modules/_sqlite/cursor.c) is also ok:
1) prepare the statement
2) verify that prepare was successful, bail if not
3) call step until there are no more rows

I need a little bit more time to verify _pysqlite_query_execute() and pysqlite_cursor_iternext().



Note to self: pysqlite_cursor_executescript() calls sqlite3_finalize() three times. It would have been better to break out of the loop when rc != SQLITE_ROW, immediately call sqlite3_finalize() (error code is preserved if sqlite3_step() failed), and then check rc and PyErr_Occurred(). It will make the code easier to follow, IMO.
msg387641 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-24 22:52
pysqlite_cursor_iternext() has four users:
- sqlite3.Cursor.fetchone()
- sqlite3.Cursor.fetchall()
- sqlite3.Cursor.fetchmany()
- sqlite3.Cursor.__next__()

All of these methods pass self to pysqlite_cursor_iternext().

pysqlite_cursor_iternext() starts by checking the state of the cursor (is it initialised, it the database open, etc.). If there is a Statement object, pysqlite_step() is called with the sqlite3_stmt pointer from the Statement object (Cursor->statement->st).

The statement pointer of the Cursor object is set in _pysqlite_query_execute() – the last pysqlite_step() user – either from the LRU cache (line 470), or by creating a new Statement object (line 479). The latter only leaves a valid Cursor->statement->st pointer (sqlite3_stmt pointer) if the Statement object was successfully created, and the sqlite3_stmt successfully prepared. (I assume only valid Statement objects are added to the cache.) Before the main loop of _pysqlite_query_execute() starts, the statement is reset. In the loop, the next parameter set is fetched, the statement is (re)bound, and step is called. If Cursor.execute() called _pysqlite_query_execute(), the parameter list is initialised to a single-item list, and the loop is only run once. From what I can read, this function is also safe. (But it is very messy; for instance, if there's an active Statement, it is reset twice before the loop starts.)

I tried forcing an error by using an uninitialised cursor:
>>> cx = sqlite3.connect(":memory:")
>>> cu = sqlite3.Cursor.__new__(sqlite3.Cursor)
>>> sqlite3.Cursor.fetchone(cu)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: Base Cursor.__init__ not called.
>>> next(cu)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.ProgrammingError: Base Cursor.__init__ not called.

Ditto for fetchmany() and fetchall(). This is consistent with current behaviour.

Calling fetch*() without first executing a statement:
>>> cu = cx.cursor()
>>> cu.fetchone()
>>> cu.fetchmany()
[]
>>> cu.fetchall()
[]
>>> next(cu)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

This is consistent with current behaviour.



I might have missed something, but from what I can see, there are no paths that lead to pysqlite_step() being called with a NULL pointer.

Berker, Serhiy, please correct me if I'm wrong.
msg387643 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-24 23:23
Regarding test coverage:

_pysqlite_query_execute() has one coverage gap: lines 478-488 (if (self->statement->in_use)) are never executed. Except from that, the only missing are the hard-to-trigger goto error's (for example PyList_New and PyList_Append failures). I'm not sure if it's possible to trigger this, but I cannot see that it is relevant for our case here.

Coverage for the other users of pysqlite_step is as complete as it can be; only the standard corner error cases are missing.
msg387648 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-25 08:08
> _pysqlite_query_execute() has one coverage gap: lines 478-488 (if (self->statement->in_use)) are never executed.

I wonder if it is possible at all to reach this branch. If it is not, then I'm pretty sure Cursor.in_use is redundant, and all code relating to it can be removed.
msg387657 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-25 11:12
> I wonder if it is possible at all to reach this branch. If it is not, then I'm pretty sure Cursor.in_use is redundant

Typo: should be Statement.in_use

Corner error cases may cause the _pysqlite_query_execute() loop to exit without pysqlite_statement_reset() being called, thus leaving Statement.in_use == 1, but when _pysqlite_query_execute() is called again and if there's an active statement, pysqlite_statement_reset() will reset in_use to zero, before we ever reach the if (self->statement->in_use) { ... } statement.

I can open a separate issue for considering removing Statement.in_use.


> (I assume only valid Statement objects are added to the cache.)

AFAICS, this is true.


We can wait for Berker and/or Serhiy to confirm/correct my assumptions and findings.
msg387662 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-02-25 12:07
Look at issue in which the workaround was added. Does it contain some examples?
msg387663 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-25 12:08
> Look at issue in which the workaround was added. Does it contain some examples?

I'll check. Thanks.
msg387664 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-25 12:28
Introduced by Christian Heimes 2007-12-14 in commit 380532117c2547bb0dedf6f85efa66d18a9abb88, which is a merge from SVN (?) checkin r59471 by Gerhard Häring (https://mail.python.org/pipermail/python-checkins/2007-December/063968.html)

This corresponds to https://github.com/ghaering/pysqlite/commit/61b3cd9381750bdd96f8f8fc2c1b19f1dc4acef7, with a simple test added two commits later, https://github.com/ghaering/pysqlite/commit/7b0faed4ababbf1053caa2f5b2efc15929f66c4f That test was added to CPython in 2008-03-29 by Gerhard in commit e7ea7451a84636655927da4b9731d2eb37d1cf34, and it's still here.
msg387665 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-02-25 12:32
Back in the day I was of several core devs that took care of syncing code between Python 2 and 3 branches with a tool called "svnmerge". Commit 380532117c2547bb0dedf6f85efa66d18a9abb88 is a svnmerge commit. The tool synced changesets in batches, which makes it harder to bisect changes.
msg387668 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-25 12:58
Ah, at last I found the source of the confusion: The SQLite changelog.

Quoting from msg387619 and https://sqlite.org/changes.html:

> From the SQLite 3.5.3 changelog:
> - sqlite3_step() returns SQLITE_MISUSE instead of crashing when called with a NULL parameter.

I assumed this was correct without even trying it. This short snippet shows something else:

    int rc = sqlite3_reset(NULL);
    printf("reset with NULL: %d %s\n", rc, sqlite3_errstr(rc));

$ ./a.out
reset with NULL: 0 not an error


Gerhard's comment was right and the workaround was right. I'll adjust the comment.

Dong-hee Na:
> Hmm by the way the current implementation returns SQLITE_OK if the statement is NULL, but it looks like return SQLITE_MISUSE if we apply this patch.
> Does it not cause any behavior regression? if so we should add news also.

Behaviour stays the same; no regressions introduced. I learned a lot about the sqlite3 module, and I relearned I should not trust changelogs/documentation without trying stuff myself first.

I'll adjust the erroneous comment and re-request a review, Dong-hee.
msg387669 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-25 13:09
>     int rc = sqlite3_reset(NULL);
>    printf("reset with NULL: %d %s\n", rc, sqlite3_errstr(rc));

Sorry, wrong test:

    int rc = sqlite3_step(NULL);
    printf("step with NULL: %d %s\n", rc, sqlite3_errstr(rc));

$ ./a.out
step with NULL: 21 bad parameter or other API misuse
msg387670 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-25 13:22
msg387668 was a little bit hasty. I'll try again:

Dong-hee Na:
> Hmm by the way the current implementation returns SQLITE_OK if the statement is NULL, but it looks like return SQLITE_MISUSE if we apply this patch.
> Does it not cause any behavior regression? if so we should add news also.

No, behaviour still stays the same; no regressions introduced. The bug is triggered by executing an empty statement. This will pass an empty statement to pysqlite_step() in line 519 of Modules/_sqlite/cursor.c. Earlier, this returned SQLITE_OK. sqlite3_column_count(NULL) returns 0, so we slide through the rest of the loop without much happening. Now, pysqlite_step() returns SQLITE_MISUSE, which only results in the statement being reset in line 529, and the error cleared in line 530. Then we bail out of the loop.

So, the current comment is correct, the SQLite changelog was correct, the workaround and old comment was wrong.

I'm done :)
msg387688 - (view) Author: Erlend Egeberg Aasland (erlendaasland) * Date: 2021-02-25 21:43
Addendum to msg387641:
> The latter only leaves a valid Cursor->statement->st pointer (sqlite3_stmt pointer) if the Statement object was successfully created, and the sqlite3_stmt successfully prepared.

sqlite3_prepare_v2() actually returns SQLITE_OK but sets the statement pointer to NULL, if given an empty string or a comment. Only the sqlite3_prepare_v2() return code is checked in order to determine if pysqlite_statement_create() was successful or not.
msg387694 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2021-02-25 23:39
New changeset 91ea37c84af2dd5ea92802a4c2adad47861ac067 by Erlend Egeberg Aasland in branch 'master':
bpo-43290: Remove workaround from pysqlite_step() (GH-24638)
https://github.com/python/cpython/commit/91ea37c84af2dd5ea92802a4c2adad47861ac067
History
Date User Action Args
2021-02-25 23:40:16berker.peksagsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-02-25 23:39:48berker.peksagsetmessages: + msg387694
2021-02-25 21:43:57erlendaaslandsetmessages: + msg387688
2021-02-25 13:22:09erlendaaslandsetmessages: + msg387670
2021-02-25 13:09:19erlendaaslandsetmessages: + msg387669
2021-02-25 12:58:16erlendaaslandsetmessages: + msg387668
2021-02-25 12:32:39christian.heimessetnosy: + christian.heimes
messages: + msg387665
2021-02-25 12:28:23erlendaaslandsetmessages: + msg387664
2021-02-25 12:08:37erlendaaslandsetmessages: + msg387663
2021-02-25 12:07:53serhiy.storchakasetmessages: + msg387662
2021-02-25 11:12:54erlendaaslandsetmessages: + msg387657
2021-02-25 08:08:45erlendaaslandsetmessages: + msg387648
2021-02-24 23:23:25erlendaaslandsetmessages: + msg387643
2021-02-24 22:52:56erlendaaslandsetmessages: + msg387641
2021-02-24 21:09:34erlendaaslandsetmessages: + msg387639
2021-02-24 16:02:23erlendaaslandsetmessages: + msg387627
2021-02-24 15:02:47corona10setnosy: + corona10
messages: + msg387623
2021-02-24 11:17:42erlendaaslandsetkeywords: + patch
stage: patch review
pull_requests: + pull_request23423
2021-02-24 11:08:07erlendaaslandsetmessages: + msg387619
2021-02-21 23:16:09erlendaaslandcreate