classification
Title: [sqlite3] move set lastrowid out of the query loop and enable it for executemany()
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.11
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: erlendaasland Nosy List: corona10, erlendaasland, felixxm, lemburg
Priority: normal Keywords: patch

Created on 2022-01-03 21:14 by erlendaasland, last changed 2022-01-22 09:47 by corona10. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 30371 closed erlendaasland, 2022-01-03 21:15
PR 30380 closed erlendaasland, 2022-01-04 00:01
PR 30489 merged erlendaasland, 2022-01-08 21:13
Messages (19)
msg409620 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-03 21:14
The query loop in _pysqlite_query_execute() is run only once for ordinary execute()'s, but multiple times for executemany(). We use the 'multiple' variable to differ between the two execute methods; for execute(), multiple is false, for executemany(), multiple is true. At the end of the loop, the 'lastrowid' connection attribute is set, if multiple is false. We can safely move this part out of the loop; it is irrelevant for executemany(), and it will only be run once for execute().
msg409642 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-03 23:49
I see that PEP 249 does not define the semantics of lastrowid for executemany(). What's the precedence here, MAL? IMO, it would be nice to be able to fetch the last row id after you've done a 1000 inserts using executemany().

So, another option would be to keep "set-lastrowid" in the query loop, and just remove the condition; we set it every time (but of course only build a PyObject of it when the loop is done).
msg409643 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-04 00:08
> So, another option would be to keep "set-lastrowid" in the query loop, and
> just remove the condition; we set it every time (but of course only build a
> PyObject of it when the loop is done).

I made a competing PR (GH-30380) for this, just a little better (IMO); the Py object is only built on demand :)
msg409660 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2022-01-04 09:48
On 04.01.2022 00:49, Erlend E. Aasland wrote:
> 
> Erlend E. Aasland <erlend.aasland@innova.no> added the comment:
> 
> I see that PEP 249 does not define the semantics of lastrowid for executemany(). What's the precedence here, MAL? IMO, it would be nice to be able to fetch the last row id after you've done a 1000 inserts using executemany().

The DB-API deliberately leaves this undefined, since there are many ways you
could implement this, e.g.

- return the last row id for the last entry in the array passed to .executemany()
- return the last row id of the last actually modified/inserted row after
running .executemany()
- return an array of row ids, one for each row modified/inserted
- return a row id of one of the modified/inserted rows, without defining which
- always return None for .executemany()

Note that in some cases, the order of actions taken by the database is not
predefined (e.g. some databases run such inserts in chunks across
a cluster), so even the "last" semantics are not clear.

> So, another option would be to keep "set-lastrowid" in the query loop, and just remove the condition; we set it every time (but of course only build a PyObject of it when the loop is done).

Since the DB-API leaves this undefined, you are free to add your own
meaning, which makes most sense for SQLite, to always return None or
not implement it at all.

DB-API extensions such as Cursor.lastrowid are optional and don't need
to be implemented if they don't make sense for a particular use case:

https://www.python.org/dev/peps/pep-0249/#optional-db-api-extensions
msg409661 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-04 10:02
Thank you for your input Marc-André.

For SQLite, it's pretty simple: we use an API called sqlite3_last_insert_rowid() which takes the database connection as it's argument, not a statement pointer. This function returns "the rowid of the most recent successful INSERT into a rowid table or virtual table on database connection" (quote from SQLite docs). IMO, it would make sense to also use this post executemany().
msg409668 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2022-01-04 10:45
On 04.01.2022 11:02, Erlend E. Aasland wrote:
> 
> Erlend E. Aasland <erlend.aasland@innova.no> added the comment:
> 
> Thank you for your input Marc-André.
> 
> For SQLite, it's pretty simple: we use an API called sqlite3_last_insert_rowid() which takes the database connection as it's argument, not a statement pointer. This function returns "the rowid of the most recent successful INSERT into a rowid table or virtual table on database connection" (quote from SQLite docs). IMO, it would make sense to also use this post executemany().

Sounds like a plan.

If possible, it's usually better to have the .executemany() create a
cursor with an output array providing the row ids, e.g. using "INSERT ...
RETURNING ..." (PostgreSQL). That way you can access all row ids and
can also provide the needed detail in case the INSERTs happen out of
order to map them to the input data.

For cases where you don't need sequence IDs, it's often better to
not rely on auto-increment columns for IDs, but instead use random
pre-generated IDs. Saves roundtrips to the database and works nicely
with cluster databases as well.
msg409706 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-04 20:02
> If possible, it's usually better to have the .executemany() create a
> cursor with an output array providing the row ids, e.g. using "INSERT ...
> RETURNING ..." (PostgreSQL). That way you can access all row ids and
> can also provide the needed detail in case the INSERTs happen out of
> order to map them to the input data.

Hm, maybe so. But it will add a lot of overhead and complexity to executemany(), and there haven't been requests for this feature for sqlite3. AFAIK, there hasn't been request for lastrowid for executemany() at all. OTOH, my proposal of modifying lastrowid to always show the rowid of the actual last inserted row is a very cheap operation, _and_ it simplifies the code (=> increased maintainability), so I think I'll go for that.

> For cases where you don't need sequence IDs, it's often better to
> not rely on auto-increment columns for IDs, but instead use random
> pre-generated IDs. Saves roundtrips to the database and works nicely
> with cluster databases as well.

Yes, but in those cases you keep track of the row id yourself, so you probably won't need lastrowid ;)
msg409707 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2022-01-04 20:23
On 04.01.2022 21:02, Erlend E. Aasland wrote:
> 
> Erlend E. Aasland <erlend.aasland@innova.no> added the comment:
> 
>> If possible, it's usually better to have the .executemany() create a
>> cursor with an output array providing the row ids, e.g. using "INSERT ...
>> RETURNING ..." (PostgreSQL). That way you can access all row ids and
>> can also provide the needed detail in case the INSERTs happen out of
>> order to map them to the input data.
> 
> Hm, maybe so. But it will add a lot of overhead and complexity to executemany(), and there haven't been requests for this feature for sqlite3. AFAIK, there hasn't been request for lastrowid for executemany() at all. OTOH, my proposal of modifying lastrowid to always show the rowid of the actual last inserted row is a very cheap operation, _and_ it simplifies the code (=> increased maintainability), so I think I'll go for that.

Sorry, I wasn't suggesting this for SQLite; it's just a better
and more flexible option than using cursor.lastrowid where
available. Sadly, the PG extension is not standards conform SQL.

>> For cases where you don't need sequence IDs, it's often better to
>> not rely on auto-increment columns for IDs, but instead use random
>> pre-generated IDs. Saves roundtrips to the database and works nicely
>> with cluster databases as well.
> 
> Yes, but in those cases you keep track of the row id yourself, so you probably won't need lastrowid ;)

Indeed, and that's the point :-)

Those auto-increment ID fields are
not really such a good idea to begin with. Either you know that you
will need to manipulate the rows after inserting them (in which case
you can set an ID) or you don't care about the individual rows and
only want to aggregate or search in them based on other fields.
msg409708 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-04 20:27
True that :) I'll close GH-30371 and prepare GH-30380 for review. I'll request your review if you don't mind.
msg409712 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-04 21:27
There are some inaccuracies in the docs I need to fix first, since those changes must be backported, and I want the updated docs applied upon the corrected docs; I'll open a separate issue/PR for that.
msg410116 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-08 20:56
Marc-André: since Python 3.6, the sqlite3.Cursor.lastrowid attribute does no longer comply with the recommendations of PEP 249:

Previously, lastrowid was set to None for operations other than INSERT or REPLACE. This changed with ab994ed8b97e1b0dac151ec827c857f5e7277565 (in Python 3.6), so that lastrowid is _unchanged_ for operations other than INSERT or REPLACE, and it is set to 0 after the first valid SQL (that is not INSERT/REPLACE) is executed on the cursor.

Now, PEP 249 only _recommends_ that lastrowid is set to None for operations that do not modify a row, so it's probably not a big deal. No-one has ever mentioned this change in behaviour; there have been no bug reports.

FTR, here is the relevant quote from PEP 249:

    If the operation does not set a rowid or if the database does not support
    rowids, this attribute should be set to None.

(I interpret "should" as understood by RFC 2119.)

So, my follow-up question becomes:
I see no point in reverting to pre Python 3.6 behaviour. I would rather change the default value to be 0 (to get rid of the dirty flag in GH-30380), and to make the behaviour more consistent with how the actual SQLite API behaves.


Do you have an opinion about such a change (in behaviour)?
msg410117 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-08 21:12
I see now that GH-30380 has grown a little bit out of scope, as it changes too many things at once:

1. Simplify the `execute()`/`executemany()` query loop
2. Create the lastrowid PyObject on demand, simplifying cursor GC
3. `lastrowid` is now available for all `execute*()` methods
4. The default value of `lastrowid` is now `0`

It will be easier to reason about and review each change separately.
msg410305 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2022-01-11 15:04
On 08.01.2022 21:56, Erlend E. Aasland wrote:
>  
> Marc-André: since Python 3.6, the sqlite3.Cursor.lastrowid attribute does no longer comply with the recommendations of PEP 249:
> 
> Previously, lastrowid was set to None for operations other than INSERT or REPLACE. This changed with ab994ed8b97e1b0dac151ec827c857f5e7277565 (in Python 3.6), so that lastrowid is _unchanged_ for operations other than INSERT or REPLACE, and it is set to 0 after the first valid SQL (that is not INSERT/REPLACE) is executed on the cursor.
> 
> Now, PEP 249 only _recommends_ that lastrowid is set to None for operations that do not modify a row, so it's probably not a big deal. No-one has ever mentioned this change in behaviour; there have been no bug reports.
> 
> FTR, here is the relevant quote from PEP 249:
> 
>     If the operation does not set a rowid or if the database does not support
>     rowids, this attribute should be set to None.
> 
> (I interpret "should" as understood by RFC 2119.)

Well, it may be a little stronger than the SHOULD in the RFC, but then
again the whole DB-API is about conventions and if they don't make sense
for a database backend, it is possible to deviate from the spec, esp. for
optional extensions such as .lastrowid.

> So, my follow-up question becomes:
> I see no point in reverting to pre Python 3.6 behaviour. I would rather change the default value to be 0 (to get rid of the dirty flag in GH-30380), and to make the behaviour more consistent with how the actual SQLite API behaves.
> 
> 
> Do you have an opinion about such a change (in behaviour)?

Is 0 a valid row ID in SQLite ? If not, then I guess this would be
an alternative to None as suggested by the DB-API.

If it is a valid row ID, I'd suggest to go back to resetting to None,
since otherwise code might get confused: if an UPDATE does not get
applied (e.g. a condition is false), code could then still take
.lastrowid as referring to the UPDATE and not a previous
operation, since code will now know whether the condition was met
or not.
 --
Marc-Andre Lemburg
eGenix.com
msg410323 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-11 19:46
> Is 0 a valid row ID in SQLite ? If not, then I guess this would be
> an alternative to None as suggested by the DB-API.

Yes. Any 64 bit signed integer value is a valid row id in SQLite.

> If it is a valid row ID, I'd suggest to go back to resetting to None,
> since otherwise code might get confused: if an UPDATE does not get
> applied (e.g. a condition is false), code could then still take
> .lastrowid as referring to the UPDATE and not a previous
> operation, since code will now know whether the condition was met
> or not.

True. OTOH, we've had no requests for reverting to pre Python 3.6 behaviour.

If we are to revert to this behaviour, we'll have to start examining the SQL we are given (search for INSERT and REPLACE keywords, determine if they are valid (i.e. not a comment, not part of a column or table name, etc.), which will lead to a noticeable performance hit for every new statement (not for statements reused via the LRU cache though). I'm not sure this is a good idea. However I will give it a good thought.

My first thought now, is that it would be better for the sqlite3 module to align lastrowid with the behaviour of the C API sqlite3_last_insert_rowid() (also available as an SQL function: last_insert_rowid). OTOH, the SQLite API is tied to the _connection_ object, so it may not make sense to align it with lastrowid which is a _cursor_ attribute.

Perhaps the Right Thing To Do™ is to be conservative and just leave it as it is. I still want to apply the optimisation, though. It does not alter the behaviour in any kind of way, and it speeds up executemany().
msg410324 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-11 19:51
> OTOH, the SQLite API is tied to the _connection_ object, so it may not make sense to align it with lastrowid which is a _cursor_ attribute.

That came out weird; let's try again:

The SQLite API is tied to the _connection_ object, so it may not make sense to force that behaviour onto lastrowid which is a _cursor_ attribute.
msg410326 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2022-01-11 20:17
On 11.01.2022 20:46, Erlend E. Aasland wrote:
> 
> If we are to revert to this behaviour, we'll have to start examining the SQL we are given (search for INSERT and REPLACE keywords, determine if they are valid (i.e. not a comment, not part of a column or table name, etc.), which will lead to a noticeable performance hit for every new statement (not for statements reused via the LRU cache though). I'm not sure this is a good idea. However I will give it a good thought.
>
> My first thought now, is that it would be better for the sqlite3 module to align lastrowid with the behaviour of the C API sqlite3_last_insert_rowid() (also available as an SQL function: last_insert_rowid). OTOH, the SQLite API is tied to the _connection_ object, so it may not make sense to align it with lastrowid which is a _cursor_ attribute.

I've had a look at the API description and find it less than useful,
to be honest:

https://sqlite.org/c3ref/last_insert_rowid.html

You don't know on which cursor the last row was inserted, it's
possible that this was or is done by a trigger and the last row
is not updated in case the INSERT does not succeed for some reason,
leaving it unchanged - without the user getting a notification of
this failure, since the .execute() call itself will succeed for
e.g. "INSERT INTO table SELECT ...;".

It also seems that the function really only works for INSERTs and
not for UPDATEs.

> Perhaps the Right Thing To Do™ is to be conservative and just leave it as it is. I still want to apply the optimisation, though. It does not alter the behaviour in any kind of way, and it speeds up executemany().

I'd suggest to deprecate the cursor.lastrowid attribute and
instead point people to the much more useful

"INSERT INTO t (name) VALUES ('two'), ('three') RETURNING ROWID;"

https://sqlite.org/lang_insert.html
https://sqlite.org/forum/forumpost/058ac49cc3

(good to know that SQLite has adopted this PostgreSQL variant as
well)

RETURNING is also available for UPDATES:

https://sqlite.org/lang_update.html

If people really want to use the sqlite3_last_insert_rowid()
functionality, they can use the SQL function of the same name:

https://www.sqlite.org/lang_corefunc.html#last_insert_rowid

which then has known semantics and doesn't conflict with the DB-API
specs.

But this is your call :-)
msg410327 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-11 20:30
> You don't know on which cursor the last row was inserted [...]

SQLite has no concept of cursors :)

> It also seems that the function really only works for INSERTs and
> not for UPDATEs.

Yes, hence it's name: sqlite3_last_insert_rowid() !

> I'd suggest to deprecate the cursor.lastrowid attribute and
> instead point people to the much more useful [...]

Yes, I think mentioning the RETURNING ROWID trick in the sqlite3 docs is a very nice improvement. Mentioning the last_insert_rowid SQL function is probably also worth consideration.

I'm reluctant to deprecate cursor.lastrowid, though. ATM, I'm leaning towards just keeping the current behaviour.

> But this is your call :-)

I hear you are saying that, but strictly speaking I'm _not_ the sqlite3 module maintainer; I'm just a very eager contributor :)
msg410396 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2022-01-12 13:56
On 11.01.2022 21:30, Erlend E. Aasland wrote:
> 
>> I'd suggest to deprecate the cursor.lastrowid attribute and
>> instead point people to the much more useful [...]
> 
> Yes, I think mentioning the RETURNING ROWID trick in the sqlite3 docs is a very nice improvement. Mentioning the last_insert_rowid SQL function is probably also worth consideration.
> 
> I'm reluctant to deprecate cursor.lastrowid, though. ATM, I'm leaning towards just keeping the current behaviour.

Fair enough :-)

Perhaps just documenting that the value is not necessarily what
people may expect, when coming from other databases due to the
different semantics with SQLite, is enough.
 --
Marc-Andre Lemburg
eGenix.com
msg411242 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2022-01-22 09:40
New changeset 38afeb1a336f0451c0db86df567ef726f49f6438 by Erlend Egeberg Aasland in branch 'main':
bpo-46249: Move set lastrowid out of the sqlite3 query loop (GH-30489)
https://github.com/python/cpython/commit/38afeb1a336f0451c0db86df567ef726f49f6438
History
Date User Action Args
2022-01-22 09:47:38corona10setstatus: open -> closed
stage: patch review -> resolved
2022-01-22 09:40:29corona10setnosy: + corona10
messages: + msg411242
2022-01-12 13:56:10lemburgsetmessages: + msg410396
2022-01-11 20:30:03erlendaaslandsetmessages: + msg410327
2022-01-11 20:17:21lemburgsetmessages: + msg410326
2022-01-11 19:51:42erlendaaslandsetmessages: + msg410324
2022-01-11 19:46:51erlendaaslandsetmessages: + msg410323
2022-01-11 15:04:07lemburgsetmessages: + msg410305
title: [sqlite3] lastrowid improvements -> [sqlite3] move set lastrowid out of the query loop and enable it for executemany()
2022-01-09 19:14:39felixxmsetnosy: + felixxm
2022-01-08 21:31:36erlendaaslandsettitle: [sqlite3] move set lastrowid out of the query loop and enable it for executemany() -> [sqlite3] lastrowid improvements
2022-01-08 21:13:25erlendaaslandsetpull_requests: + pull_request28692
2022-01-08 21:12:12erlendaaslandsetmessages: + msg410117
2022-01-08 20:56:53erlendaaslandsetmessages: + msg410116
2022-01-04 21:27:23erlendaaslandsetmessages: + msg409712
2022-01-04 20:28:27erlendaaslandsettitle: [sqlite3] move set lastrowid out of the query loop -> [sqlite3] move set lastrowid out of the query loop and enable it for executemany()
2022-01-04 20:27:53erlendaaslandsetmessages: + msg409708
2022-01-04 20:23:02lemburgsetmessages: + msg409707
2022-01-04 20:02:26erlendaaslandsetmessages: + msg409706
2022-01-04 10:45:50lemburgsetmessages: + msg409668
2022-01-04 10:02:55erlendaaslandsetmessages: + msg409661
2022-01-04 09:48:54lemburgsetmessages: + msg409660
2022-01-04 00:08:17erlendaaslandsetmessages: + msg409643
2022-01-04 00:01:27erlendaaslandsetpull_requests: + pull_request28590
2022-01-03 23:49:11erlendaaslandsetnosy: + lemburg
messages: + msg409642
2022-01-03 21:15:49erlendaaslandsetkeywords: + patch
stage: patch review
pull_requests: + pull_request28583
2022-01-03 21:14:27erlendaaslandcreate