Issue 46249: [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/90407

classification

Title:	[sqlite3] move set lastrowid out of the query loop and enable it for executemany()
Type:	enhancement	Stage:	resolved
Components:	Extension Modules	Versions:	Python 3.11

process

Status:	closed	Resolution:
Dependencies:		Superseder:
Assigned To:	erlendaasland	Nosy List:	corona10, erlendaasland, felixxm, lemburg
Priority:	normal	Keywords:	patch

Created on 2022-01-03 21:14 by erlendaasland, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 30371	closed	erlendaasland, 2022-01-03 21:15
PR 30380	closed	erlendaasland, 2022-01-04 00:01
PR 30489	merged	erlendaasland, 2022-01-08 21:13

Messages (19)
msg409620 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-03 21:14
The query loop in _pysqlite_query_execute() is run only once for ordinary execute()'s, but multiple times for executemany(). We use the 'multiple' variable to differ between the two execute methods; for execute(), multiple is false, for executemany(), multiple is true. At the end of the loop, the 'lastrowid' connection attribute is set, if multiple is false. We can safely move this part out of the loop; it is irrelevant for executemany(), and it will only be run once for execute().
msg409642 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-03 23:49
I see that PEP 249 does not define the semantics of lastrowid for executemany(). What's the precedence here, MAL? IMO, it would be nice to be able to fetch the last row id after you've done a 1000 inserts using executemany(). So, another option would be to keep "set-lastrowid" in the query loop, and just remove the condition; we set it every time (but of course only build a PyObject of it when the loop is done).
msg409643 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-04 00:08
> So, another option would be to keep "set-lastrowid" in the query loop, and > just remove the condition; we set it every time (but of course only build a > PyObject of it when the loop is done). I made a competing PR (GH-30380) for this, just a little better (IMO); the Py object is only built on demand :)
msg409660 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2022-01-04 09:48
On 04.01.2022 00:49, Erlend E. Aasland wrote: > > Erlend E. Aasland <erlend.aasland@innova.no> added the comment: > > I see that PEP 249 does not define the semantics of lastrowid for executemany(). What's the precedence here, MAL? IMO, it would be nice to be able to fetch the last row id after you've done a 1000 inserts using executemany(). The DB-API deliberately leaves this undefined, since there are many ways you could implement this, e.g. - return the last row id for the last entry in the array passed to .executemany() - return the last row id of the last actually modified/inserted row after running .executemany() - return an array of row ids, one for each row modified/inserted - return a row id of one of the modified/inserted rows, without defining which - always return None for .executemany() Note that in some cases, the order of actions taken by the database is not predefined (e.g. some databases run such inserts in chunks across a cluster), so even the "last" semantics are not clear. > So, another option would be to keep "set-lastrowid" in the query loop, and just remove the condition; we set it every time (but of course only build a PyObject of it when the loop is done). Since the DB-API leaves this undefined, you are free to add your own meaning, which makes most sense for SQLite, to always return None or not implement it at all. DB-API extensions such as Cursor.lastrowid are optional and don't need to be implemented if they don't make sense for a particular use case: https://www.python.org/dev/peps/pep-0249/#optional-db-api-extensions
msg409661 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-04 10:02
Thank you for your input Marc-André. For SQLite, it's pretty simple: we use an API called sqlite3_last_insert_rowid() which takes the database connection as it's argument, not a statement pointer. This function returns "the rowid of the most recent successful INSERT into a rowid table or virtual table on database connection" (quote from SQLite docs). IMO, it would make sense to also use this post executemany().
msg409668 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2022-01-04 10:45
On 04.01.2022 11:02, Erlend E. Aasland wrote: > > Erlend E. Aasland <erlend.aasland@innova.no> added the comment: > > Thank you for your input Marc-André. > > For SQLite, it's pretty simple: we use an API called sqlite3_last_insert_rowid() which takes the database connection as it's argument, not a statement pointer. This function returns "the rowid of the most recent successful INSERT into a rowid table or virtual table on database connection" (quote from SQLite docs). IMO, it would make sense to also use this post executemany(). Sounds like a plan. If possible, it's usually better to have the .executemany() create a cursor with an output array providing the row ids, e.g. using "INSERT ... RETURNING ..." (PostgreSQL). That way you can access all row ids and can also provide the needed detail in case the INSERTs happen out of order to map them to the input data. For cases where you don't need sequence IDs, it's often better to not rely on auto-increment columns for IDs, but instead use random pre-generated IDs. Saves roundtrips to the database and works nicely with cluster databases as well.
msg409706 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-04 20:02
> If possible, it's usually better to have the .executemany() create a > cursor with an output array providing the row ids, e.g. using "INSERT ... > RETURNING ..." (PostgreSQL). That way you can access all row ids and > can also provide the needed detail in case the INSERTs happen out of > order to map them to the input data. Hm, maybe so. But it will add a lot of overhead and complexity to executemany(), and there haven't been requests for this feature for sqlite3. AFAIK, there hasn't been request for lastrowid for executemany() at all. OTOH, my proposal of modifying lastrowid to always show the rowid of the actual last inserted row is a very cheap operation, _and_ it simplifies the code (=> increased maintainability), so I think I'll go for that. > For cases where you don't need sequence IDs, it's often better to > not rely on auto-increment columns for IDs, but instead use random > pre-generated IDs. Saves roundtrips to the database and works nicely > with cluster databases as well. Yes, but in those cases you keep track of the row id yourself, so you probably won't need lastrowid ;)
msg409707 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2022-01-04 20:23
On 04.01.2022 21:02, Erlend E. Aasland wrote: > > Erlend E. Aasland <erlend.aasland@innova.no> added the comment: > >> If possible, it's usually better to have the .executemany() create a >> cursor with an output array providing the row ids, e.g. using "INSERT ... >> RETURNING ..." (PostgreSQL). That way you can access all row ids and >> can also provide the needed detail in case the INSERTs happen out of >> order to map them to the input data. > > Hm, maybe so. But it will add a lot of overhead and complexity to executemany(), and there haven't been requests for this feature for sqlite3. AFAIK, there hasn't been request for lastrowid for executemany() at all. OTOH, my proposal of modifying lastrowid to always show the rowid of the actual last inserted row is a very cheap operation, _and_ it simplifies the code (=> increased maintainability), so I think I'll go for that. Sorry, I wasn't suggesting this for SQLite; it's just a better and more flexible option than using cursor.lastrowid where available. Sadly, the PG extension is not standards conform SQL. >> For cases where you don't need sequence IDs, it's often better to >> not rely on auto-increment columns for IDs, but instead use random >> pre-generated IDs. Saves roundtrips to the database and works nicely >> with cluster databases as well. > > Yes, but in those cases you keep track of the row id yourself, so you probably won't need lastrowid ;) Indeed, and that's the point :-) Those auto-increment ID fields are not really such a good idea to begin with. Either you know that you will need to manipulate the rows after inserting them (in which case you can set an ID) or you don't care about the individual rows and only want to aggregate or search in them based on other fields.
msg409708 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-04 20:27
True that :) I'll close GH-30371 and prepare GH-30380 for review. I'll request your review if you don't mind.
msg409712 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-04 21:27
There are some inaccuracies in the docs I need to fix first, since those changes must be backported, and I want the updated docs applied upon the corrected docs; I'll open a separate issue/PR for that.
msg410116 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-08 20:56
Marc-André: since Python 3.6, the sqlite3.Cursor.lastrowid attribute does no longer comply with the recommendations of PEP 249: Previously, lastrowid was set to None for operations other than INSERT or REPLACE. This changed with ab994ed8b97e1b0dac151ec827c857f5e7277565 (in Python 3.6), so that lastrowid is _unchanged_ for operations other than INSERT or REPLACE, and it is set to 0 after the first valid SQL (that is not INSERT/REPLACE) is executed on the cursor. Now, PEP 249 only _recommends_ that lastrowid is set to None for operations that do not modify a row, so it's probably not a big deal. No-one has ever mentioned this change in behaviour; there have been no bug reports. FTR, here is the relevant quote from PEP 249: If the operation does not set a rowid or if the database does not support rowids, this attribute should be set to None. (I interpret "should" as understood by RFC 2119.) So, my follow-up question becomes: I see no point in reverting to pre Python 3.6 behaviour. I would rather change the default value to be 0 (to get rid of the dirty flag in GH-30380), and to make the behaviour more consistent with how the actual SQLite API behaves. Do you have an opinion about such a change (in behaviour)?
msg410117 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-08 21:12
I see now that GH-30380 has grown a little bit out of scope, as it changes too many things at once: 1. Simplify the `execute()`/`executemany()` query loop 2. Create the lastrowid PyObject on demand, simplifying cursor GC 3. `lastrowid` is now available for all `execute*()` methods 4. The default value of `lastrowid` is now `0` It will be easier to reason about and review each change separately.
msg410305 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2022-01-11 15:04
On 08.01.2022 21:56, Erlend E. Aasland wrote: > > Marc-André: since Python 3.6, the sqlite3.Cursor.lastrowid attribute does no longer comply with the recommendations of PEP 249: > > Previously, lastrowid was set to None for operations other than INSERT or REPLACE. This changed with ab994ed8b97e1b0dac151ec827c857f5e7277565 (in Python 3.6), so that lastrowid is _unchanged_ for operations other than INSERT or REPLACE, and it is set to 0 after the first valid SQL (that is not INSERT/REPLACE) is executed on the cursor. > > Now, PEP 249 only _recommends_ that lastrowid is set to None for operations that do not modify a row, so it's probably not a big deal. No-one has ever mentioned this change in behaviour; there have been no bug reports. > > FTR, here is the relevant quote from PEP 249: > > If the operation does not set a rowid or if the database does not support > rowids, this attribute should be set to None. > > (I interpret "should" as understood by RFC 2119.) Well, it may be a little stronger than the SHOULD in the RFC, but then again the whole DB-API is about conventions and if they don't make sense for a database backend, it is possible to deviate from the spec, esp. for optional extensions such as .lastrowid. > So, my follow-up question becomes: > I see no point in reverting to pre Python 3.6 behaviour. I would rather change the default value to be 0 (to get rid of the dirty flag in GH-30380), and to make the behaviour more consistent with how the actual SQLite API behaves. > > > Do you have an opinion about such a change (in behaviour)? Is 0 a valid row ID in SQLite ? If not, then I guess this would be an alternative to None as suggested by the DB-API. If it is a valid row ID, I'd suggest to go back to resetting to None, since otherwise code might get confused: if an UPDATE does not get applied (e.g. a condition is false), code could then still take .lastrowid as referring to the UPDATE and not a previous operation, since code will now know whether the condition was met or not. -- Marc-Andre Lemburg eGenix.com
msg410323 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-11 19:46
> Is 0 a valid row ID in SQLite ? If not, then I guess this would be > an alternative to None as suggested by the DB-API. Yes. Any 64 bit signed integer value is a valid row id in SQLite. > If it is a valid row ID, I'd suggest to go back to resetting to None, > since otherwise code might get confused: if an UPDATE does not get > applied (e.g. a condition is false), code could then still take > .lastrowid as referring to the UPDATE and not a previous > operation, since code will now know whether the condition was met > or not. True. OTOH, we've had no requests for reverting to pre Python 3.6 behaviour. If we are to revert to this behaviour, we'll have to start examining the SQL we are given (search for INSERT and REPLACE keywords, determine if they are valid (i.e. not a comment, not part of a column or table name, etc.), which will lead to a noticeable performance hit for every new statement (not for statements reused via the LRU cache though). I'm not sure this is a good idea. However I will give it a good thought. My first thought now, is that it would be better for the sqlite3 module to align lastrowid with the behaviour of the C API sqlite3_last_insert_rowid() (also available as an SQL function: last_insert_rowid). OTOH, the SQLite API is tied to the _connection_ object, so it may not make sense to align it with lastrowid which is a _cursor_ attribute. Perhaps the Right Thing To Do™ is to be conservative and just leave it as it is. I still want to apply the optimisation, though. It does not alter the behaviour in any kind of way, and it speeds up executemany().
msg410324 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-11 19:51
> OTOH, the SQLite API is tied to the _connection_ object, so it may not make sense to align it with lastrowid which is a _cursor_ attribute. That came out weird; let's try again: The SQLite API is tied to the _connection_ object, so it may not make sense to force that behaviour onto lastrowid which is a _cursor_ attribute.
msg410326 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2022-01-11 20:17
On 11.01.2022 20:46, Erlend E. Aasland wrote: > > If we are to revert to this behaviour, we'll have to start examining the SQL we are given (search for INSERT and REPLACE keywords, determine if they are valid (i.e. not a comment, not part of a column or table name, etc.), which will lead to a noticeable performance hit for every new statement (not for statements reused via the LRU cache though). I'm not sure this is a good idea. However I will give it a good thought. > > My first thought now, is that it would be better for the sqlite3 module to align lastrowid with the behaviour of the C API sqlite3_last_insert_rowid() (also available as an SQL function: last_insert_rowid). OTOH, the SQLite API is tied to the _connection_ object, so it may not make sense to align it with lastrowid which is a _cursor_ attribute. I've had a look at the API description and find it less than useful, to be honest: https://sqlite.org/c3ref/last_insert_rowid.html You don't know on which cursor the last row was inserted, it's possible that this was or is done by a trigger and the last row is not updated in case the INSERT does not succeed for some reason, leaving it unchanged - without the user getting a notification of this failure, since the .execute() call itself will succeed for e.g. "INSERT INTO table SELECT ...;". It also seems that the function really only works for INSERTs and not for UPDATEs. > Perhaps the Right Thing To Do™ is to be conservative and just leave it as it is. I still want to apply the optimisation, though. It does not alter the behaviour in any kind of way, and it speeds up executemany(). I'd suggest to deprecate the cursor.lastrowid attribute and instead point people to the much more useful "INSERT INTO t (name) VALUES ('two'), ('three') RETURNING ROWID;" https://sqlite.org/lang_insert.html https://sqlite.org/forum/forumpost/058ac49cc3 (good to know that SQLite has adopted this PostgreSQL variant as well) RETURNING is also available for UPDATES: https://sqlite.org/lang_update.html If people really want to use the sqlite3_last_insert_rowid() functionality, they can use the SQL function of the same name: https://www.sqlite.org/lang_corefunc.html#last_insert_rowid which then has known semantics and doesn't conflict with the DB-API specs. But this is your call :-)
msg410327 - (view)	Author: Erlend E. Aasland (erlendaasland) *	Date: 2022-01-11 20:30
> You don't know on which cursor the last row was inserted [...] SQLite has no concept of cursors :) > It also seems that the function really only works for INSERTs and > not for UPDATEs. Yes, hence it's name: sqlite3_last_insert_rowid() ! > I'd suggest to deprecate the cursor.lastrowid attribute and > instead point people to the much more useful [...] Yes, I think mentioning the RETURNING ROWID trick in the sqlite3 docs is a very nice improvement. Mentioning the last_insert_rowid SQL function is probably also worth consideration. I'm reluctant to deprecate cursor.lastrowid, though. ATM, I'm leaning towards just keeping the current behaviour. > But this is your call :-) I hear you are saying that, but strictly speaking I'm _not_ the sqlite3 module maintainer; I'm just a very eager contributor :)
msg410396 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2022-01-12 13:56
On 11.01.2022 21:30, Erlend E. Aasland wrote: > >> I'd suggest to deprecate the cursor.lastrowid attribute and >> instead point people to the much more useful [...] > > Yes, I think mentioning the RETURNING ROWID trick in the sqlite3 docs is a very nice improvement. Mentioning the last_insert_rowid SQL function is probably also worth consideration. > > I'm reluctant to deprecate cursor.lastrowid, though. ATM, I'm leaning towards just keeping the current behaviour. Fair enough :-) Perhaps just documenting that the value is not necessarily what people may expect, when coming from other databases due to the different semantics with SQLite, is enough. -- Marc-Andre Lemburg eGenix.com
msg411242 - (view)	Author: Dong-hee Na (corona10) *	Date: 2022-01-22 09:40
New changeset 38afeb1a336f0451c0db86df567ef726f49f6438 by Erlend Egeberg Aasland in branch 'main': bpo-46249: Move set lastrowid out of the sqlite3 query loop (GH-30489) https://github.com/python/cpython/commit/38afeb1a336f0451c0db86df567ef726f49f6438

History
Date	User	Action	Args
2022-04-11 14:59:54	admin	set	github: 90407
2022-01-22 09:47:38	corona10	set	status: open -> closed stage: patch review -> resolved
2022-01-22 09:40:29	corona10	set	nosy: + corona10 messages: + msg411242
2022-01-12 13:56:10	lemburg	set	messages: + msg410396
2022-01-11 20:30:03	erlendaasland	set	messages: + msg410327
2022-01-11 20:17:21	lemburg	set	messages: + msg410326
2022-01-11 19:51:42	erlendaasland	set	messages: + msg410324
2022-01-11 19:46:51	erlendaasland	set	messages: + msg410323
2022-01-11 15:04:07	lemburg	set	messages: + msg410305 title: [sqlite3] lastrowid improvements -> [sqlite3] move set lastrowid out of the query loop and enable it for executemany()
2022-01-09 19:14:39	felixxm	set	nosy: + felixxm
2022-01-08 21:31:36	erlendaasland	set	title: [sqlite3] move set lastrowid out of the query loop and enable it for executemany() -> [sqlite3] lastrowid improvements
2022-01-08 21:13:25	erlendaasland	set	pull_requests: + pull_request28692
2022-01-08 21:12:12	erlendaasland	set	messages: + msg410117
2022-01-08 20:56:53	erlendaasland	set	messages: + msg410116
2022-01-04 21:27:23	erlendaasland	set	messages: + msg409712
2022-01-04 20:28:27	erlendaasland	set	title: [sqlite3] move set lastrowid out of the query loop -> [sqlite3] move set lastrowid out of the query loop and enable it for executemany()
2022-01-04 20:27:53	erlendaasland	set	messages: + msg409708
2022-01-04 20:23:02	lemburg	set	messages: + msg409707
2022-01-04 20:02:26	erlendaasland	set	messages: + msg409706
2022-01-04 10:45:50	lemburg	set	messages: + msg409668
2022-01-04 10:02:55	erlendaasland	set	messages: + msg409661
2022-01-04 09:48:54	lemburg	set	messages: + msg409660
2022-01-04 00:08:17	erlendaasland	set	messages: + msg409643
2022-01-04 00:01:27	erlendaasland	set	pull_requests: + pull_request28590
2022-01-03 23:49:11	erlendaasland	set	nosy: + lemburg messages: + msg409642
2022-01-03 21:15:49	erlendaasland	set	keywords: + patch stage: patch review pull_requests: + pull_request28583
2022-01-03 21:14:27	erlendaasland	create