classification
Title: Allow incremental I/O to blobs in sqlite3
Type: enhancement Stage: patch review
Components: Extension Modules Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, ghaering, jim_minter, palaviv, serhiy.storchaka, waldhol
Priority: normal Keywords: patch

Created on 2015-08-21 08:05 by jim_minter, last changed 2018-11-08 07:07 by serhiy.storchaka.

Files
File name Uploaded Description Edit
blob.patch palaviv, 2016-02-03 21:36 review
blob2.patch palaviv, 2016-08-18 18:17 review
blob3.patch palaviv, 2017-02-05 22:20 review
blob4.patch palaviv, 2017-02-06 18:16 review
Pull Requests
URL Status Linked Edit
PR 271 open palaviv, 2017-02-24 10:37
Messages (19)
msg248945 - (view) Author: Jim Minter (jim_minter) Date: 2015-08-21 08:05
SQLite supports incremental I/O to blobs, i.e. the capability to stream reads and writes to blobs without having to load the entire blob into memory first.  See https://www.sqlite.org/c3ref/blob_open.html for more details on the C API.

It'd be nice if it were possible to do this in Python using sqlite3 (it is already possible with apsw).
msg259262 - (view) Author: Aviv Palivoda (palaviv) * Date: 2016-01-30 17:25
I opened a pull request for blob support in the pysqlite github repository:
https://github.com/ghaering/pysqlite/pull/93

I will do the needed changes for python3 and will post a patch soon.
msg259520 - (view) Author: Aviv Palivoda (palaviv) * Date: 2016-02-03 21:36
I did the needed changes for the pull request at pysqlite for porting to python3. I will continue updating the patch if there will be changes in pysqlite and vice versa.
msg272873 - (view) Author: Aviv Palivoda (palaviv) * Date: 2016-08-16 18:47
Pinging as mentioned in the devguide.
msg272883 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-16 20:24
Added comments on Rietveld. The patch contains typos and violates PEP 7 and PEP 12. And there are questions about API.
msg273052 - (view) Author: Aviv Palivoda (palaviv) * Date: 2016-08-18 18:17
Thanks for the review Serhiy. Attached is the updated patch after the changes.
msg286957 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-04 11:31
Pinging again. I think this would be a great enhancement to the sqlite module.
msg287040 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-05 08:00
Sorry Aviv, I just forget about this issue.

Added new comments on Rietveld. Many lines in sqlite3.rst still are too long.

It would be worth to ask other developers about wanted interface.
msg287065 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-05 22:20
Thanks for the CR Serhiy. Attached is a new patch after the fixes from the CR.

What other developers should I ask? The interface is file like and is the same as apsw.
msg287081 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-06 06:36
apsw has different order of arguments for blobopen(), all arguments are mandatory, and the parameter for read/write mode is called "writeable" rather than "readonly". This part of API needs to be discussed. Ask for discussion on mailing lists: Python-Ideas and maybe Python-List.
msg287162 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-06 18:16
Uploading patch after fixes from berker CR.

The `blob_open` API can can have the following options:

1. The table, column and row must be mandatory parameters.
2. The read/write permissions can have the following options:
   a. No default (mandatory parameter).
   b. default read-only
   c. default write-only
3. The dbname can be without a default of "main" and then it will be a mandatory parameter.

I don't think that there is enough differences between the possible API's to justify sending a message to the mailing lists.
msg288515 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-24 10:40
I opened a PR in github. I tagged some other developers if that will not start a discussion on the API I will post in the python-ideas.
msg288517 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-24 10:48
Is there similar API in other databases?
msg288519 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-24 11:14
I am not a DB expert but from a quick search I couldn't find a similar API. I did find a few API's in other programming languages to sqlite blob:
1. https://jgallagher.github.io/rusqlite/rusqlite/blob/index.html
2. https://godoc.org/github.com/mxk/go-sqlite/sqlite3
3. http://www.ch-werner.de/javasqlite/SQLite/Blob.html

It seems like most of them give the same API as apsw.
msg288724 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-28 20:29
Small discussion is started at pull request [1]. There are doubts about the usefulness of incremental I/O API. SQLite is rarely used for storing blobs of the size of hundreds of megabytes. For smaller blobs it may be enough to read or write all data at once. There are also questions about the support of len(), since other file-like objects don't support len().

This discussion remembered me about mmap objects. mmap objects implement two protocols: file protocol and sequence protocol (including the support of len()). The BLOB object looks similar to the mmap object. sqlite3_blob_write() may only modify the contents of the BLOB; it is not possible to increase the size of a BLOB using this API. Maybe implement the sequence protocol in the BLOB object? Or implement only the sequence protocol and drop away the file protocol?

[1] https://github.com/python/cpython/pull/271
msg288725 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-02-28 20:49
Just to make sure when you say "sequence protocol" you thin about the doing blob[4:6] = "ab"?

I actually think this is a nice feature. The Blob and the mmap object has a lot in common so I think that making the same API will be best. I think that adding the sequence protocol in addition to the file protocol is best.
msg297205 - (view) Author: Aviv Palivoda (palaviv) * Date: 2017-06-28 18:28
Pinging. As I mentioned in the PR I need a little help with __contains__.
msg315450 - (view) Author: Aviv Palivoda (palaviv) * Date: 2018-04-18 13:24
As I wrote in the PR:
I think that the contains operation should not be supported for blobs. As blobs can be very large, looking for a subset of bytes inside them will be a very inefficient process in memory or in compute.

I believe that this is a very good feature for the sqlite module. I know that there is no active core developer that is currently working on sqlite module but this patch is already waiting 2 years. Could I do anything to help this patch merged?
msg329457 - (view) Author: Holger Waldmann (waldhol) Date: 2018-11-08 06:54
I am looking forward for this to be included.

My main use case is on restricted IoT devices, where I need to handle BLOBs of 2MB size.
As this is larger than my available RAM, I store need to store them in the filesystem today.

This is ugly because it is not part of the atomic commit of the database.

I would very much appreciate if I could get a file-like API for BLOBs in sqlite.
History
Date User Action Args
2018-11-08 07:07:24serhiy.storchakasetversions: + Python 3.8, - Python 3.7
2018-11-08 06:54:29waldholsetnosy: + waldhol
messages: + msg329457
2018-04-18 13:24:11palavivsetmessages: + msg315450
2017-06-28 18:28:29palavivsetmessages: + msg297205
2017-02-28 20:49:01palavivsetmessages: + msg288725
2017-02-28 20:29:33serhiy.storchakasetmessages: + msg288724
2017-02-24 11:14:51palavivsetmessages: + msg288519
2017-02-24 10:48:18serhiy.storchakasetmessages: + msg288517
2017-02-24 10:40:29palavivsetmessages: + msg288515
2017-02-24 10:37:59palavivsetpull_requests: + pull_request244
2017-02-06 18:16:35palavivsetfiles: + blob4.patch

messages: + msg287162
2017-02-06 06:36:07serhiy.storchakasetmessages: + msg287081
2017-02-06 01:48:43berker.peksagsetnosy: + berker.peksag
2017-02-05 22:20:19palavivsetfiles: + blob3.patch

messages: + msg287065
2017-02-05 08:00:25serhiy.storchakasetstage: patch review
messages: + msg287040
versions: + Python 3.7, - Python 3.6
2017-02-04 11:31:27palavivsetmessages: + msg286957
2016-08-18 18:17:58palavivsetfiles: + blob2.patch

messages: + msg273052
2016-08-16 20:24:38serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg272883
2016-08-16 18:47:28palavivsetmessages: + msg272873
2016-02-03 21:36:10palavivsetfiles: + blob.patch
keywords: + patch
messages: + msg259520
2016-01-30 17:25:23palavivsetmessages: + msg259262
2016-01-26 22:50:28palavivsetnosy: + palaviv
2016-01-16 08:28:31SilentGhostsetnosy: + ghaering
components: + Extension Modules
2015-08-21 08:05:21jim_mintercreate