Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow incremental I/O to blobs in sqlite3 #69093

Closed
jimminter mannequin opened this issue Aug 21, 2015 · 32 comments
Closed

Allow incremental I/O to blobs in sqlite3 #69093

jimminter mannequin opened this issue Aug 21, 2015 · 32 comments
Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@jimminter
Copy link
Mannequin

jimminter mannequin commented Aug 21, 2015

BPO 24905
Nosy @berkerpeksag, @serhiy-storchaka, @palaviv, @eamanu, @erlend-aasland, @nightlark
PRs
  • bpo-24905: Support BLOB incremental I/O in sqlite module #271
  • gh-69093: Add incremental I/O to blobs support in sqlite3 #30356
  • gh-69093: Support basic incremental I/O to blobs in sqlite3 #30680
  • Files
  • blob.patch
  • blob2.patch
  • blob3.patch
  • blob4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-08-21.08:05:21.911>
    labels = ['extension-modules', 'type-feature', '3.8']
    title = 'Allow incremental I/O to blobs in sqlite3'
    updated_at = <Date 2022-01-20.12:42:10.486>
    user = 'https://bugs.python.org/jimminter'

    bugs.python.org fields:

    activity = <Date 2022-01-20.12:42:10.486>
    actor = 'erlendaasland'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2015-08-21.08:05:21.911>
    creator = 'jim_minter'
    dependencies = []
    files = ['41798', '44142', '46531', '46549']
    hgrepos = []
    issue_num = 24905
    keywords = ['patch']
    message_count = 28.0
    messages = ['248945', '259262', '259520', '272873', '272883', '273052', '286957', '287040', '287065', '287081', '287162', '288515', '288517', '288519', '288724', '288725', '297205', '315450', '329457', '372225', '399813', '409494', '409580', '409584', '409591', '409592', '410920', '411021']
    nosy_count = 9.0
    nosy_names = ['ghaering', 'berker.peksag', 'serhiy.storchaka', 'jim_minter', 'waldhol', 'palaviv', 'eamanu', 'erlendaasland', 'rmast']
    pr_nums = ['271', '30356', '30680']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue24905'
    versions = ['Python 3.8']

    @jimminter
    Copy link
    Mannequin Author

    jimminter mannequin commented Aug 21, 2015

    SQLite supports incremental I/O to blobs, i.e. the capability to stream reads and writes to blobs without having to load the entire blob into memory first. See https://www.sqlite.org/c3ref/blob_open.html for more details on the C API.

    It'd be nice if it were possible to do this in Python using sqlite3 (it is already possible with apsw).

    @jimminter jimminter mannequin added the type-feature A feature request or enhancement label Aug 21, 2015
    @SilentGhost SilentGhost mannequin added the extension-modules C modules in the Modules dir label Jan 16, 2016
    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Jan 30, 2016

    I opened a pull request for blob support in the pysqlite github repository:
    ghaering/pysqlite#93

    I will do the needed changes for python3 and will post a patch soon.

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Feb 3, 2016

    I did the needed changes for the pull request at pysqlite for porting to python3. I will continue updating the patch if there will be changes in pysqlite and vice versa.

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Aug 16, 2016

    Pinging as mentioned in the devguide.

    @serhiy-storchaka
    Copy link
    Member

    Added comments on Rietveld. The patch contains typos and violates PEP-7 and PEP-12. And there are questions about API.

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Aug 18, 2016

    Thanks for the review Serhiy. Attached is the updated patch after the changes.

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Feb 4, 2017

    Pinging again. I think this would be a great enhancement to the sqlite module.

    @serhiy-storchaka
    Copy link
    Member

    Sorry Aviv, I just forget about this issue.

    Added new comments on Rietveld. Many lines in sqlite3.rst still are too long.

    It would be worth to ask other developers about wanted interface.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Feb 5, 2017
    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Feb 5, 2017

    Thanks for the CR Serhiy. Attached is a new patch after the fixes from the CR.

    What other developers should I ask? The interface is file like and is the same as apsw.

    @serhiy-storchaka
    Copy link
    Member

    apsw has different order of arguments for blobopen(), all arguments are mandatory, and the parameter for read/write mode is called "writeable" rather than "readonly". This part of API needs to be discussed. Ask for discussion on mailing lists: Python-Ideas and maybe Python-List.

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Feb 6, 2017

    Uploading patch after fixes from berker CR.

    The blob_open API can can have the following options:

    1. The table, column and row must be mandatory parameters.
    2. The read/write permissions can have the following options:
      a. No default (mandatory parameter).
      b. default read-only
      c. default write-only
    3. The dbname can be without a default of "main" and then it will be a mandatory parameter.

    I don't think that there is enough differences between the possible API's to justify sending a message to the mailing lists.

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Feb 24, 2017

    I opened a PR in github. I tagged some other developers if that will not start a discussion on the API I will post in the python-ideas.

    @serhiy-storchaka
    Copy link
    Member

    Is there similar API in other databases?

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Feb 24, 2017

    I am not a DB expert but from a quick search I couldn't find a similar API. I did find a few API's in other programming languages to sqlite blob:

    1. https://jgallagher.github.io/rusqlite/rusqlite/blob/index.html
    2. https://godoc.org/github.com/mxk/go-sqlite/sqlite3
    3. http://www.ch-werner.de/javasqlite/SQLite/Blob.html

    It seems like most of them give the same API as apsw.

    @serhiy-storchaka
    Copy link
    Member

    Small discussion is started at pull request [1]. There are doubts about the usefulness of incremental I/O API. SQLite is rarely used for storing blobs of the size of hundreds of megabytes. For smaller blobs it may be enough to read or write all data at once. There are also questions about the support of len(), since other file-like objects don't support len().

    This discussion remembered me about mmap objects. mmap objects implement two protocols: file protocol and sequence protocol (including the support of len()). The BLOB object looks similar to the mmap object. sqlite3_blob_write() may only modify the contents of the BLOB; it is not possible to increase the size of a BLOB using this API. Maybe implement the sequence protocol in the BLOB object? Or implement only the sequence protocol and drop away the file protocol?

    [1] #271

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Feb 28, 2017

    Just to make sure when you say "sequence protocol" you thin about the doing blob[4:6] = "ab"?

    I actually think this is a nice feature. The Blob and the mmap object has a lot in common so I think that making the same API will be best. I think that adding the sequence protocol in addition to the file protocol is best.

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Jun 28, 2017

    Pinging. As I mentioned in the PR I need a little help with __contains__.

    @palaviv
    Copy link
    Mannequin

    palaviv mannequin commented Apr 18, 2018

    As I wrote in the PR:
    I think that the contains operation should not be supported for blobs. As blobs can be very large, looking for a subset of bytes inside them will be a very inefficient process in memory or in compute.

    I believe that this is a very good feature for the sqlite module. I know that there is no active core developer that is currently working on sqlite module but this patch is already waiting 2 years. Could I do anything to help this patch merged?

    @waldhol
    Copy link
    Mannequin

    waldhol mannequin commented Nov 8, 2018

    I am looking forward for this to be included.

    My main use case is on restricted IoT devices, where I need to handle BLOBs of 2MB size.
    As this is larger than my available RAM, I store need to store them in the filesystem today.

    This is ugly because it is not part of the atomic commit of the database.

    I would very much appreciate if I could get a file-like API for BLOBs in sqlite.

    @serhiy-storchaka serhiy-storchaka added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Nov 8, 2018
    @eamanu
    Copy link
    Mannequin

    eamanu mannequin commented Jun 24, 2020

    Hi everybody, I let a comment on github.

    I was asking about if are there some plans with this PR?
    Was open 3 years ago and there are not updating

    Cheers

    @nightlark
    Copy link
    Mannequin

    nightlark mannequin commented Aug 18, 2021

    It looks like this has a PR that just needs rebasing, then it will be ready for another review.

    @erlend-aasland
    Copy link
    Contributor

    FYI, I'm maintaining a rebased version on my fork:

    https://github.com/erlend-aasland/cpython/tree/sqlite-blob

    Here's the changes I've applied upon PR 271:

    • Use Argument Clinic
    • Convert to heap types
    • Adopt new CPython C APIs (Py_NewRef, Py_Is*, new slice API, etc.)
    • Lint code (PEP-7, use CPython idioms, normalise variable and helper namings)
    • Lint docs
    • Normalise error messages
    • Refactor very large functions, apply simplifications, and remove unneeded code
    • Expand test suite to increase code coverage

    I don't think it is ready for inclusion yet; the test suite needs to be restructured, or maybe rewritten from scratch. Pr. now most of the tests maintain a single connection with a prepared database. IMO, it's a fragile design and it is hard to read, debug, and verify the test code. I'll see if I can get to it in a couple of weeks.

    @erlend-aasland
    Copy link
    Contributor

    I've submitted my changes as a separate PR.

    I believe we should consider duplicating the apsw API, for users convenience. Pr. now, they are very similar, except for the "open blob" API: apsw uses blobopen, we currently use open_blob. I suggest changing it to blobopen.

    @erlend-aasland
    Copy link
    Contributor

    The diff is pretty heavy, here's the diffstat:
    17 files changed, 1362 insertions(+), 7 deletions(-)

    It will be hard to find reviewers for such a large PR. I suggest to remove mapping support and context manager support for now, and then add those features in separate PR's.

    @erlend-aasland
    Copy link
    Contributor

    Slimmed PR diff (excluding clinic), without context manager and mapping protocol:

    $  git diff main Modules/_sqlite/*.[ch] Lib Doc Misc PC* setup.py
     Doc/includes/sqlite3/blob.py                                      |  12 +
     Doc/includes/sqlite3/blob_with.py                                 |  12 +
     Doc/library/sqlite3.rst                                           |  73 ++++++
     Doc/whatsnew/3.11.rst                                             |   4 +
     Lib/test/test_sqlite3/test_dbapi.py                               | 165 +++++++++++++-
     Misc/NEWS.d/next/Library/2018-04-18-16-15-55.bpo-24905.jYqjYx.rst |   3 +
     Modules/_sqlite/blob.c                                            | 342 +++++++++++++++++++++++++++++
     Modules/_sqlite/blob.h                                            |  24 ++
     Modules/_sqlite/connection.c                                      |  83 ++++++-
     Modules/_sqlite/connection.h                                      |   5 +-
     Modules/_sqlite/module.c                                          |   6 +-
     Modules/_sqlite/module.h                                          |   1 +
     PCbuild/_sqlite3.vcxproj                                          |   2 +
     PCbuild/_sqlite3.vcxproj.filters                                  |   6 +
     setup.py                                                          |   1 +
     15 files changed, 733 insertions(+), 6 deletions(-)

    Looks promising.

    @JelleZijlstra
    Copy link
    Member

    @erlend-aasland can we close this issue with #30680 merged, or do you want to keep it for further API enhancements for blobs?

    JelleZijlstra added a commit to JelleZijlstra/cpython that referenced this issue Apr 15, 2022
    I noticed this was missing while writing typeshed stubs. It's
    useful to expose it for use in annotations.
    erlend-aasland pushed a commit to erlend-aasland/cpython that referenced this issue Apr 15, 2022
    - document that you cannot open a blob handle in a WITHOUT ROWID table
    - document the blobopen() positional arguments in the same order as they
      appear
    erlend-aasland pushed a commit to erlend-aasland/cpython that referenced this issue Apr 15, 2022
    @erlend-aasland
    Copy link
    Contributor

    @erlend-aasland can we close this issue with #30680 merged, or do you want to keep it for further API enhancements for blobs?

    Let's keep it open until we've got context manager and mapping support in place.

    JelleZijlstra added a commit that referenced this issue Apr 15, 2022
    I noticed this was missing while writing typeshed stubs. It's
    useful to expose it for use in annotations and for exploration.
    erlend-aasland pushed a commit to erlend-aasland/cpython that referenced this issue Apr 15, 2022
    erlend-aasland pushed a commit to erlend-aasland/cpython that referenced this issue Apr 15, 2022
    Unless sqlite3_blob_open() returns SQLITE_MISUSE, the error code and
    message are available on the connection object. This means we have to
    handle SQLITE_MISUSE error messages explicitly.
    JelleZijlstra pushed a commit that referenced this issue Apr 15, 2022
    )
    
    Unless sqlite3_blob_open() returns SQLITE_MISUSE, the error code and
    message are available on the connection object. This means we have to
    handle SQLITE_MISUSE error messages explicitly.
    erlend-aasland pushed a commit to erlend-aasland/cpython that referenced this issue Apr 16, 2022
    Authored-by: Aviv Palivoda <palaviv@gmail.com>
    Co-authored-by: Erlend E. Aasland <erlend.aasland@innova.no>
    JelleZijlstra pushed a commit that referenced this issue Apr 16, 2022
    - document that you cannot open a blob handle in a WITHOUT ROWID table
    - document the blobopen() positional arguments in the same order as they
      appear
    - relocate sqlite3.Blob section
    JelleZijlstra pushed a commit that referenced this issue Apr 22, 2022
    Authored-by: Aviv Palivoda <palaviv@gmail.com>
    Co-authored-by: Erlend E. Aasland <erlend.aasland@innova.no>
    @erlend-aasland
    Copy link
    Contributor

    Incremental I/O for blobs is now supported in the sqlite3 extension module in the upcoming Python 3.11 release. Thanks a lot to @palaviv, @JelleZijlstra, and everyone else who contributed by commenting and reviewing.

    IMO, the implementation can benefit from further tweaks, but let's do that in separate issues if needed.

    @erlend-aasland
    Copy link
    Contributor

    cc. @eamanu, @nightlark

    @erlend-aasland erlend-aasland removed the 3.8 only security fixes label Apr 22, 2022
    tiran added a commit to tiran/cpython that referenced this issue May 31, 2022
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 31, 2022
    (cherry picked from commit 8a5e3c2)
    
    Co-authored-by: Christian Heimes <christian@python.org>
    miss-islington added a commit that referenced this issue May 31, 2022
    (cherry picked from commit 8a5e3c2)
    
    Co-authored-by: Christian Heimes <christian@python.org>
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants