This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Allow incremental I/O to blobs in sqlite3
Type: enhancement Stage: patch review
Components: Extension Modules Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, eamanu, erlendaasland, ghaering, jim_minter, palaviv, rmast, serhiy.storchaka, waldhol
Priority: normal Keywords: patch

Created on 2015-08-21 08:05 by jim_minter, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
blob.patch palaviv, 2016-02-03 21:36 review
blob2.patch palaviv, 2016-08-18 18:17 review
blob3.patch palaviv, 2017-02-05 22:20 review
blob4.patch palaviv, 2017-02-06 18:16 review
Pull Requests
URL Status Linked Edit
PR 271 open palaviv, 2017-02-24 10:37
PR 30356 open erlendaasland, 2022-01-03 12:39
PR 30680 open erlendaasland, 2022-01-19 00:42
Messages (28)
msg248945 - (view) Author: Jim Minter (jim_minter) Date: 2015-08-21 08:05
SQLite supports incremental I/O to blobs, i.e. the capability to stream reads and writes to blobs without having to load the entire blob into memory first.  See https://www.sqlite.org/c3ref/blob_open.html for more details on the C API.

It'd be nice if it were possible to do this in Python using sqlite3 (it is already possible with apsw).
msg259262 - (view) Author: Aviv Palivoda (palaviv) * Date: 2016-01-30 17:25
I opened a pull request for blob support in the pysqlite github repository:
https://github.com/ghaering/pysqlite/pull/93

I will do the needed changes for python3 and will post a patch soon.
msg259520 - (view) Author: Aviv Palivoda (palaviv) * Date: 2016-02-03 21:36
I did the needed changes for the pull request at pysqlite for porting to python3. I will continue updating the patch if there will be changes in pysqlite and vice versa.
msg272873 - (view) Author: Aviv Palivoda (palaviv) * Date: 2016-08-16 18:47
Pinging as mentioned in the devguide.
msg272883 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-16 20:24
Added comments on Rietveld. The patch contains typos and violates PEP 7 and PEP 12. And there are questions about API.
msg273052 - (view) Author: Aviv Palivoda (palaviv) * Date: 2016-08-18 18:17
Thanks for the review Serhiy. Attached is the updated patch after the changes.
msg286957 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-04 11:31
Pinging again. I think this would be a great enhancement to the sqlite module.
msg287040 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-05 08:00
Sorry Aviv, I just forget about this issue.

Added new comments on Rietveld. Many lines in sqlite3.rst still are too long.

It would be worth to ask other developers about wanted interface.
msg287065 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-05 22:20
Thanks for the CR Serhiy. Attached is a new patch after the fixes from the CR.

What other developers should I ask? The interface is file like and is the same as apsw.
msg287081 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-06 06:36
apsw has different order of arguments for blobopen(), all arguments are mandatory, and the parameter for read/write mode is called "writeable" rather than "readonly". This part of API needs to be discussed. Ask for discussion on mailing lists: Python-Ideas and maybe Python-List.
msg287162 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-06 18:16
Uploading patch after fixes from berker CR.

The `blob_open` API can can have the following options:

1. The table, column and row must be mandatory parameters.
2. The read/write permissions can have the following options:
   a. No default (mandatory parameter).
   b. default read-only
   c. default write-only
3. The dbname can be without a default of "main" and then it will be a mandatory parameter.

I don't think that there is enough differences between the possible API's to justify sending a message to the mailing lists.
msg288515 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-24 10:40
I opened a PR in github. I tagged some other developers if that will not start a discussion on the API I will post in the python-ideas.
msg288517 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-24 10:48
Is there similar API in other databases?
msg288519 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-24 11:14
I am not a DB expert but from a quick search I couldn't find a similar API. I did find a few API's in other programming languages to sqlite blob:
1. https://jgallagher.github.io/rusqlite/rusqlite/blob/index.html
2. https://godoc.org/github.com/mxk/go-sqlite/sqlite3
3. http://www.ch-werner.de/javasqlite/SQLite/Blob.html

It seems like most of them give the same API as apsw.
msg288724 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-28 20:29
Small discussion is started at pull request [1]. There are doubts about the usefulness of incremental I/O API. SQLite is rarely used for storing blobs of the size of hundreds of megabytes. For smaller blobs it may be enough to read or write all data at once. There are also questions about the support of len(), since other file-like objects don't support len().

This discussion remembered me about mmap objects. mmap objects implement two protocols: file protocol and sequence protocol (including the support of len()). The BLOB object looks similar to the mmap object. sqlite3_blob_write() may only modify the contents of the BLOB; it is not possible to increase the size of a BLOB using this API. Maybe implement the sequence protocol in the BLOB object? Or implement only the sequence protocol and drop away the file protocol?

[1] https://github.com/python/cpython/pull/271
msg288725 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-28 20:49
Just to make sure when you say "sequence protocol" you thin about the doing blob[4:6] = "ab"?

I actually think this is a nice feature. The Blob and the mmap object has a lot in common so I think that making the same API will be best. I think that adding the sequence protocol in addition to the file protocol is best.
msg297205 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-06-28 18:28
Pinging. As I mentioned in the PR I need a little help with __contains__.
msg315450 - (view) Author: Aviv Palivoda (palaviv) * Date: 2018-04-18 13:24
As I wrote in the PR:
I think that the contains operation should not be supported for blobs. As blobs can be very large, looking for a subset of bytes inside them will be a very inefficient process in memory or in compute.

I believe that this is a very good feature for the sqlite module. I know that there is no active core developer that is currently working on sqlite module but this patch is already waiting 2 years. Could I do anything to help this patch merged?
msg329457 - (view) Author: Holger Waldmann (waldhol) Date: 2018-11-08 06:54
I am looking forward for this to be included.

My main use case is on restricted IoT devices, where I need to handle BLOBs of 2MB size.
As this is larger than my available RAM, I store need to store them in the filesystem today.

This is ugly because it is not part of the atomic commit of the database.

I would very much appreciate if I could get a file-like API for BLOBs in sqlite.
msg372225 - (view) Author: Emmanuel Arias (eamanu) * Date: 2020-06-24 02:03
Hi everybody, I let a comment on github.

I was asking about if are there some plans with this PR?
Was open 3 years ago and there are not updating 


Cheers
msg399813 - (view) Author: Ryan Mast (nightlark) (rmast) * Date: 2021-08-18 01:01
It looks like this has a PR that just needs rebasing, then it will be ready for another review.
msg409494 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-02 11:30
FYI, I'm maintaining a rebased version on my fork:

https://github.com/erlend-aasland/cpython/tree/sqlite-blob

Here's the changes I've applied upon PR 271:

- Use Argument Clinic
- Convert to heap types
- Adopt new CPython C APIs (Py_NewRef, Py_Is*, new slice API, etc.)
- Lint code (PEP 7, use CPython idioms, normalise variable and helper namings)
- Lint docs
- Normalise error messages
- Refactor very large functions, apply simplifications, and remove unneeded code
- Expand test suite to increase code coverage

I don't think it is ready for inclusion yet; the test suite needs to be restructured, or maybe rewritten from scratch. Pr. now most of the tests maintain a single connection with a prepared database. IMO, it's a fragile design and it is hard to read, debug, and verify the test code. I'll see if I can get to it in a couple of weeks.
msg409580 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-03 12:44
I've submitted my changes as a separate PR.

I believe we should consider duplicating the apsw API, for users convenience. Pr. now, they are very similar, except for the "open blob" API: apsw uses `blobopen`, we currently use `open_blob`. I suggest changing it to `blobopen`.
msg409584 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-03 14:13
The diff is pretty heavy, here's the diffstat:
 17 files changed, 1362 insertions(+), 7 deletions(-)

It will be hard to find reviewers for such a large PR. I suggest to remove mapping support and context manager support for now, and then add those features in separate PR's.
msg409591 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-03 15:25
Slimmed PR diff (excluding clinic), without context manager and mapping protocol:

$  git diff main Modules/_sqlite/*.[ch] Lib Doc Misc PC* setup.py
 Doc/includes/sqlite3/blob.py                                      |  12 +
 Doc/includes/sqlite3/blob_with.py                                 |  12 +
 Doc/library/sqlite3.rst                                           |  73 ++++++
 Doc/whatsnew/3.11.rst                                             |   4 +
 Lib/test/test_sqlite3/test_dbapi.py                               | 165 +++++++++++++-
 bpo-24905.jYqjYx.rst">Misc/NEWS.d/next/Library/2018-04-18-16-15-55.bpo-24905.jYqjYx.rst |   3 +
 Modules/_sqlite/blob.c                                            | 342 +++++++++++++++++++++++++++++
 Modules/_sqlite/blob.h                                            |  24 ++
 Modules/_sqlite/connection.c                                      |  83 ++++++-
 Modules/_sqlite/connection.h                                      |   5 +-
 Modules/_sqlite/module.c                                          |   6 +-
 Modules/_sqlite/module.h                                          |   1 +
 PCbuild/_sqlite3.vcxproj                                          |   2 +
 PCbuild/_sqlite3.vcxproj.filters                                  |   6 +
 setup.py                                                          |   1 +
 15 files changed, 733 insertions(+), 6 deletions(-)


Looks promising.
msg409592 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-03 15:26
(the bpo bot is obviously confused by the News entry path, sigh)
msg410920 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-19 00:45
PR without mapping protocol and context manager submitted: GH-30680. I've also changed the open API to reflect that of apsw.
msg411021 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2022-01-20 12:42
If you are ok with the proposed API in GH-30680, I'll mark it as ready for review.
History
Date User Action Args
2022-04-11 14:58:20adminsetgithub: 69093
2022-01-20 12:42:10erlendaaslandsetmessages: + msg411021
2022-01-19 00:45:57erlendaaslandsetmessages: + msg410920
2022-01-19 00:42:44erlendaaslandsetpull_requests: + pull_request28878
2022-01-03 15:26:23erlendaaslandsetmessages: + msg409592
2022-01-03 15:25:25erlendaaslandsetmessages: + msg409591
2022-01-03 15:24:52erlendaaslandsetmessages: - msg409590
2022-01-03 15:23:18erlendaaslandsetmessages: + msg409590
2022-01-03 14:13:51erlendaaslandsetmessages: + msg409584
2022-01-03 12:44:22erlendaaslandsetmessages: + msg409580
2022-01-03 12:39:29erlendaaslandsetpull_requests: + pull_request28570
2022-01-02 11:30:40erlendaaslandsetmessages: + msg409494
2021-08-18 01:01:04rmastsetnosy: + rmast
messages: + msg399813
2021-02-22 19:32:21erlendaaslandsetnosy: + erlendaasland
2020-06-24 02:03:38eamanusetnosy: + eamanu
messages: + msg372225
2018-11-08 07:07:24serhiy.storchakasetversions: + Python 3.8, - Python 3.7
2018-11-08 06:54:29waldholsetnosy: + waldhol
messages: + msg329457
2018-04-18 13:24:11palavivsetmessages: + msg315450
2017-06-28 18:28:29palavivsetmessages: + msg297205
2017-02-28 20:49:01palavivsetmessages: + msg288725
2017-02-28 20:29:33serhiy.storchakasetmessages: + msg288724
2017-02-24 11:14:51palavivsetmessages: + msg288519
2017-02-24 10:48:18serhiy.storchakasetmessages: + msg288517
2017-02-24 10:40:29palavivsetmessages: + msg288515
2017-02-24 10:37:59palavivsetpull_requests: + pull_request244
2017-02-06 18:16:35palavivsetfiles: + blob4.patch

messages: + msg287162
2017-02-06 06:36:07serhiy.storchakasetmessages: + msg287081
2017-02-06 01:48:43berker.peksagsetnosy: + berker.peksag
2017-02-05 22:20:19palavivsetfiles: + blob3.patch

messages: + msg287065
2017-02-05 08:00:25serhiy.storchakasetstage: patch review
messages: + msg287040
versions: + Python 3.7, - Python 3.6
2017-02-04 11:31:27palavivsetmessages: + msg286957
2016-08-18 18:17:58palavivsetfiles: + blob2.patch

messages: + msg273052
2016-08-16 20:24:38serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg272883
2016-08-16 18:47:28palavivsetmessages: + msg272873
2016-02-03 21:36:10palavivsetfiles: + blob.patch
keywords: + patch
messages: + msg259520
2016-01-30 17:25:23palavivsetmessages: + msg259262
2016-01-26 22:50:28palavivsetnosy: + palaviv
2016-01-16 08:28:31SilentGhostsetnosy: + ghaering
components: + Extension Modules
2015-08-21 08:05:21jim_mintercreate