Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite3 module breaks transactions and potentially corrupts data #54949

Closed
scotturban mannequin opened this issue Dec 20, 2010 · 69 comments
Closed

sqlite3 module breaks transactions and potentially corrupts data #54949

scotturban mannequin opened this issue Dec 20, 2010 · 69 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@scotturban
Copy link
Mannequin

scotturban mannequin commented Dec 20, 2010

BPO 10740
Nosy @pitrou, @ned-deily, @bitdancer, @asvetlov, @florentx, @dholth, @Bluehorn, @RonnyPfannschmidt, @berkerpeksag, @JimJJewett, @zooba, @animalize, @jan-hudec, @carlwgeorge, @maggyero
Files
  • pysql-transactions.2.diff: proposed patch
  • test_sqlite_ddl.py: test script
  • sqlite_transaction_config_py27.diff: Patch against Python 2.7
  • sqlite_transaction_config_py3.diff: Patch for Python 3 default branch
  • sqlite_transaction_config_py27.diff: Patch against Python 2.7, take 2
  • sqlite_transaction_config_v2.diff: Updated patch: Added docs
  • transaction-api-proposal-issues-8145-9924-10740-16958.patch: API improvement proposal (as doc patch)
  • sqlite_sanity_check.rst
  • issue10740-aaugustin.patch
  • test_sqlite.py
  • issue10740_upstream.diff
  • issue10740_upstream_v2.diff
  • cant_vacuum.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-11.11:56:25.732>
    created_at = <Date 2010-12-20.18:02:11.675>
    labels = ['type-feature', 'library']
    title = 'sqlite3 module breaks transactions and potentially corrupts data'
    updated_at = <Date 2020-01-26.19:27:00.405>
    user = 'https://bugs.python.org/scotturban'

    bugs.python.org fields:

    activity = <Date 2020-01-26.19:27:00.405>
    actor = 'maggyero'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-11.11:56:25.732>
    closer = 'berker.peksag'
    components = ['Library (Lib)']
    creation = <Date 2010-12-20.18:02:11.675>
    creator = 'scott.urban'
    dependencies = []
    files = ['20118', '20119', '21417', '21418', '21419', '21479', '35498', '35780', '40252', '42251', '42316', '44437', '45949']
    hgrepos = []
    issue_num = 10740
    keywords = ['patch']
    message_count = 69.0
    messages = ['124392', '124393', '124398', '124407', '128815', '132285', '132287', '132414', '132470', '132612', '136362', '144197', '207789', '207796', '207800', '209900', '209901', '209904', '209917', '209924', '216715', '216732', '216734', '216738', '216740', '219757', '219758', '219759', '219761', '219806', '219818', '219884', '220478', '220491', '220500', '220501', '220519', '220533', '220555', '220566', '220602', '220615', '221568', '248816', '248830', '248833', '248839', '248915', '248917', '249111', '262241', '262254', '262268', '262280', '262281', '262573', '274810', '274816', '274935', '275760', '283528', '283529', '283533', '283919', '283921', '360665', '360703', '360729', '360733']
    nosy_count = 26.0
    nosy_names = ['ghaering', 'pitrou', 'ned.deily', 'Jeremy Banks', 'r.david.murray', 'zzzeek', 'asvetlov', 'flox', 'adamtj', 'dholth', 'torsten', 'monsanto', 'scott.urban', 'aymeric.augustin', 'tshepang', 'python-dev', 'Ronny.Pfannschmidt', 'berker.peksag', 'Mark.Bucciarelli', 'Jim.Jewett', 'steve.dower', 'malin', 'bulb', 'bkircher', 'carlwgeorge', 'maggyero']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue10740'
    versions = ['Python 3.6']

    @scotturban
    Copy link
    Mannequin Author

    scotturban mannequin commented Dec 20, 2010

    The python sqlite module automatically commits open transactions
    when it encounters a DDL statement. This is unnecessary; DDL is
    transactional in my testing (see attached).

    Attached patch addresses the issue. Patch is against 2.6.1, but
    looking at Trunk in svn, it seems like the patch is needed and
    would apply. One issue I could foresee is that this behavior might
    depend on the sqlite version in use (I'm on 3.6.10).

    Patch also allows pragma statement.

    @scotturban scotturban mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Dec 20, 2010
    @scotturban
    Copy link
    Mannequin Author

    scotturban mannequin commented Dec 20, 2010

    Here are some tests.

    @bitdancer
    Copy link
    Member

    See also bpo-8145. It would be nice if someone could sort all this out, but I'm not knowledgeable enough to do so.

    For this patch, it would be a significant change it behaviour. Therefore it would have to be a new feature controlled by a flag of some sort (which is part of the reason for the reference to bpo-8145).

    @bitdancer bitdancer added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Dec 20, 2010
    @scotturban
    Copy link
    Mannequin Author

    scotturban mannequin commented Dec 20, 2010

    I find the way that the sqlite3 module handles transactions pretty
    surprising in general, but I agree that someone who got used
    to DDL not rolling back could in theory find this patch surprising.

    We will apply this patch to our python build because having DDL
    covered by a transactions is convenient for certain tasks. If anyone
    can think of a problem with this simple patch, I'd like to hear it.

    Thanks
    Scott

    @dholth
    Copy link
    Mannequin

    dholth mannequin commented Feb 18, 2011

    I want transactional DDL too. I was tremendously surprised that I could not duplicate the way sqlite3 behaves on the command line from witin pysqlite.

    Instead of this patch, I would rather be able to instruct pysqlite to always begin a transaction for any kind of statement (I hear this is a requirement for DB-API compliance) and never perform an implicit commit for any reason. For example, someone on the google code project had a complaint that 'SAVEPOINT' (create a subtransaction) automatically commits because pysqlite doesn't know about it.

    I tried to trick pysqlite into doing what I wanted by prepending /* update */ to every CREATE TABLE statement but it didn't seem to quite work. Instead, I would prefer a pysqlite that does not strcmp(statement) at all.

    @Bluehorn
    Copy link
    Mannequin

    Bluehorn mannequin commented Mar 27, 2011

    The attached patch is my take on this issue. I ran into the problem that during schema upgrades dropping a table was not rolled back. In another instance, renaming a table was not rolled back. This greatly increases the risk of data loss for our application.

    Because I do not currently foresee which commands might need dropping out of a transaction and because of backwards compatibility woes, I added a new field to the sqlite3.Connection: operation_needs_transaction_callback

    This function is called to decide if a running transaction should be implicitly committed (I'd consider this dangerous), if a transaction has to be started if not running (should normally always hold) or if the transaction state should be left alone.

    For example, the "pragma foreign_keys = ..." is a no-op inside a transaction, therefore an implicit begin would be possibly harmful. In our application, we enable foreign keys when getting the connection out of the SQLAlchemy pool via a pool listener, which would be disabled if there is an implicit begin triggered.

    The patch also adds a bunch of unit tests to cover the new behaviour.

    @Bluehorn
    Copy link
    Mannequin

    Bluehorn mannequin commented Mar 27, 2011

    Same patch for Python 3. In fact, this also adds a missing Py_XDECREF.

    @dholth
    Copy link
    Mannequin

    dholth mannequin commented Mar 28, 2011

    Torsten basically you are suggesting that PRAGMA would never work at all with my 'do not strcmp() the sql at all, always begin a transaction' approach?

    @Bluehorn
    Copy link
    Mannequin

    Bluehorn mannequin commented Mar 29, 2011

    Torsten basically you are suggesting that PRAGMA would never work at all with my 'do not strcmp() the sql at all, always begin a transaction' approach?

    No. Most pragmas should still work and getting the current setting of a pragma should also work.

    I was only referring to http://www.sqlite.org/pragma.html#pragma_foreign_keys

    Quote:

    This pragma is a no-op within a transaction; foreign key constraint enforcement may only be enabled or disabled when there is no pending BEGIN or SAVEPOINT.

    I did not see such a restriction for other pragmas though I admit that I did not reread the list in full.

    @Bluehorn
    Copy link
    Mannequin

    Bluehorn mannequin commented Mar 30, 2011

    The attached patch is an updated version which adds a bit of documentation.

    @Bluehorn
    Copy link
    Mannequin

    Bluehorn mannequin commented May 20, 2011

    I updated the patch for upstream pysqlite2. Available at

    http://code.google.com/p/pysqlite/issues/detail?id=24

    Patch over there is for Python 2 (tested with our production Python 2.6).

    @MarkBucciarelli
    Copy link
    Mannequin

    MarkBucciarelli mannequin commented Sep 17, 2011

    Opened http://bugs.python.org/issue12997 in case there is a way to solve the foreign_key PRAGMA issue with a less disruptive fix.

    @adamtj
    Copy link
    Mannequin

    adamtj mannequin commented Jan 9, 2014

    The proposed patches don't fix the problem. They may well allow DDL in transactions, but that's not the real problem.

    A database library must *NEVER* implicitly commit or rollback. That's completely insane.

        >>> import this
        ...
        Explicit is better than implicit.

    If a statement can't be executed in a transaction, then trying to execute such a statement in a transaction is an error. Period. An exception *MUST* be raised. It is wrong to guess that the user really wanted to commit first.

        >>> import this
        ...
        Errors should never pass silently.
        ...
        In the face of ambiguity, refuse the temptation to guess.

    If such an exception is raised, you'll run into the problem exactly once before figuring out that you need to commit or rollback first. You can then change your code to safely and explicitly account for that limitation.

    This issue is not an enhancement, it's a bug. One might argue that fixing it will break existing code, but it won't. Any affected code is already broken. It's depending on a promised level of safety, and this bug is breaking that promise. It's better to fail noisily and consistently than to usually work, but sometimes silently corrupt data.

    As is, it's pretty much guaranteed that there is existing code that does DDL expecting to be in a transaction when it isn't. Even a well-tested schema upgrade can fail in production if the data is slightly different that in a testing database. Unique constraints can fail, etc. I frequently use the psycopg2 module and psql command, which get this issue right. They've saved me several times.

    The current behavior is buggy and should be changed to be correct. There should not merely be an option to enable the correct behavior. There should also be no option to enable the old buggy behavior. It's too dangerous.

    In general, it's a code-smell to inspect the SQL that's being executed before sending it to the database. The only reason I can think of to inspect SQL is to work around brokenness in the underlying database. Suppose, for example, that SQLite implicitly committed when it got DDL. (I don't know what it actually does. Just suppose.) In that case, it would be necessary to check for DDL and raise an exception before passing the statement to the database. If the database correctly errors in such a situation, then the python wrapper should just pass the DDL to the database and deal with the resulting error. That's way easier that trying to correctly detect what the statement type is. Parsing SQL correctly is hard.

    As evidence of that, note that the existing statement detection code is broken: it doesn't strip comments first! A simple SELECT statement preceeded by a comment will implicitly commit! But commenting is good.

        >>> import this
        ...
        Readability counts.

    So it's not even just an issue of DDL or PRAGMAs!

    Anything that causes the underlying database to implicitly commit must be avoided (yet the python module goes out of it's way to add *MORE* such buggy behavior!) Fixing brokenness in the underlying database is the only reason to inspect SQL as it passes through this module. If there are other ways to avoid such problems, they will likely be better than inspecting SQL.

    Anything which causes implicit rollbacks in the underlying database is a similar issue, but easier to handle without parsing SQL. It's not safe to implicitly rollback, even if a new transaction is begun. For example, you might be doing foreign key validation outside the database. If one INSERT were implicitly rolled back and a new transaction begun, a second INSERT may then succeed and be committed. Now you have a constraint violation, even though you correctly checked it before.

    If there is anything that triggers an implicit rollback in the underlying database connection, those rollbacks must be dectected all subsequent statements or actions on the python wrapper *should* raise exceptions until an explicit rollback is performed, except for commit() which *must* raise.

    For example:

        con.isolation_level = 'DEFERRED'
        try:
            # Nothing in here can implicitly commit or rollback safely.  
            # Doing so risks data corruption.
    
            cur.execute("INSERT INTO foo ...")
        try:
            con.somehow_trigger_implicit_rollback() # must raise
        except:
            # Throw away the exception.
            pass
    
        try:
            # This *should* raise too, but isn't strictly necessary,
            # as long as commit() is implemented correctly.
            cur.execute("INSERT INTO bar ...")
        except:
            pass
        
        # The underlying database rolled back our transaction.
        # A successful commit here would not match reality.
        # This commit() must raise.
        con.commit()
    except:
        # This rollback() can succeed, because it puts the 
        # "con" object back into correspondance with the underlying
        # database connection state.
        con.rollback()
        logger.error('something bad happened')
        raise
    

    If nothing in the database does implicit COMMITs or ROLLBACKs, then it's probably sufficient to leave all the error handling to the database and only raise exceptions when it indicates problems.

    Here is an example demonstrating that a SELECT can trigger an implicit commit.

        Python 2.7.3 (default, Sep 26 2013, 20:03:06)
        [GCC 4.6.3] on linux2
        Type "help", "copyright", "credits" or "license" for more information.
        >>>
        >>> # Create a connection.
        ... import sqlite3
        >>> con = sqlite3.connect(":memory:", isolation_level='DEFERRED')
        >>> cur = con.cursor()
        >>>
        >>> # Create some state.
        ... cur.execute("CREATE TABLE foo (i int);")
        <sqlite3.Cursor object at 0x7fc1b7d51500>
        >>> cur.execute("INSERT INTO foo VALUES (0);")
        <sqlite3.Cursor object at 0x7fc1b7d51500>
        >>> con.commit()
        >>>
        >>> # Change the state, do a SELECT, then rollback.
        ... cur.execute("UPDATE foo SET i=1")
        <sqlite3.Cursor object at 0x7fc1b7d51500>
        >>> cur.execute("SELECT 1;")
        <sqlite3.Cursor object at 0x7fc1b7d51500>
        >>> con.rollback()
        >>>
        >>> # Verify that the rollback worked.
        ... cur.execute("SELECT i FROM foo")
        <sqlite3.Cursor object at 0x7fc1b7d51500>
        >>> i = cur.fetchone()[0]
        >>> print 'i is', i
        i is 0
        >>> assert i == 0
        >>>
        >>> # Change some state, do a different SELECT, then attempt to rollback.
        ... cur.execute("UPDATE foo SET i=2")
        <sqlite3.Cursor object at 0x7fc1b7d51500>
        >>> # This will incorrectly commit:
        ... cur.execute("""
        ...     /*
        ...      * Comments are good for readability, but because of this bug, they can
        ...      * cause incorrect implicit commits.
        ...      */
        ...     SELECT 1;
        ... """)
        <sqlite3.Cursor object at 0x7fc1b7d51500>
        >>> con.rollback()
        >>>
        >>>
        >>> # The rollback succeeded and rolling back a different transaction than
        >>> # the one we expected.
        ... cur.execute("SELECT i FROM foo")
        <sqlite3.Cursor object at 0x7fc1b7d51500>
        >>> i = cur.fetchone()[0]
        >>> print 'i is', i
        i is 2
        >>> assert i == 0
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        AssertionError

    @adamtj adamtj mannequin added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Jan 9, 2014
    @adamtj adamtj mannequin changed the title sqlite3 module should allow DDL statements in transactions sqlite3 module breaks transactions and potentially corrupts data Jan 9, 2014
    @zzzeek
    Copy link
    Mannequin

    zzzeek mannequin commented Jan 9, 2014

    well Adam, you might also be surprised to see pysqlite not doing very well on the *other* side of the equation either; that is, when it *begins* the transaction. Issue http://bugs.python.org/issue9924 refers to this and like this one, hasn't seen any activity since 2011. You might be interested in the rationale on that one, which is that python apps may wish to have some degree of concurrency against a sqlite file for an application that is doing all SELECTs.

    I am certainly in favor of a pure pep-249 approach that emits BEGIN on the first execute() call period, and never implicitly rolls back or commits.

    However, I disagree this should be enabled by default nor that there should not be any option for the old behavior. I also think the "delayed BEGIN" feature should still be available to counteract SQLite's particular difficulty with concurrency.

    If there is code that relies upon a bug in order to function, which would then fail to function in that way if the bug is fixed, then by definition fixing that bug is a backwards-incompatible change. Python std lib can't afford to roll out a change that blatantly backwards-incompatible. The issue regarding comments not being parsed unfortunately should also be a separate issue that needs to be fixed, as nasty as it is that pysqlite tries to parse SQL.

    I disagree that most users are expecting SQLite's DDL to be transactional; transactional DDL is a luxury feature to many, including the vast numbers of developers that use MySQL, not to mention Oracle, neither of which have any transactional DDL capabilities.

    @bitdancer
    Copy link
    Member

    On Thu, 09 Jan 2014 20:29:15 +0000, Adam Tomjack <report@bugs.python.org> wrote:

    This issue is not an enhancement, it's a bug. One might argue that
    fixing it will break existing code, but it won't. Any affected code
    is already broken. It's depending on a promised level of safety, and
    this bug is breaking that promise. It's better to fail noisily and
    consistently than to usually work, but sometimes silently corrupt
    data.

    That's not how our backward compatibility policy works. If a program
    runs correctly, a change in a maintenance release should not cause
    that program to fail, *even if the change is a bug fix*.

    Now, that said, silent data corruption can change that calculus, but
    in this case there are many many working programs out there that
    don't experience data corruption, so breaking them in a maintenance
    release is just not acceptable.

    It is possible that there are specific things that can be fixed without
    breaking backward compatibility, but they will have to be argued individually.

    In general, it's a code-smell to inspect the SQL that's being executed
    before sending it to the database. The only reason I can think of to

    There are reasons why the sqlite module was designed the way it was,
    having to do with an intersection between how sqlite works and the DB
    API 2 specification. I'm not saying the design (and implementation of
    that design) has no bugs, but there *are* reasons why the choices made
    were made, including the concurrency issue mentioned by Mike (which if I
    understand correctly was even more problematic in earlier versions of
    sqlite.)

    inspect SQL is to work around brokenness in the underlying database.
    Suppose, for example, that SQLite implicitly committed when it got
    DDL. (I don't know what it actually does. Just suppose.) In that

    By default, SQLite starts a transaction when any command that changes
    the database is executed (the docs say, "basically, anything other than
    a SELECT), and "automatically started transactions are committed when
    the last query finishes". I'm not sure exactly what that means.
    Note that there is no way in sqlite to turn auto-transaction behavior
    off.

    case, it would be necessary to check for DDL and raise an exception
    before passing the statement to the database. If the database

    Like I said, this would need to be a new feature, due to our backward
    compatibility policy.

    correctly errors in such a situation, then the python wrapper should
    just pass the DDL to the database and deal with the resulting error.
    That's way easier that trying to correctly detect what the statement
    type is. Parsing SQL correctly is hard.

    If you want full control of transactions while using the sqlite module,
    you can set the isolation_level to None (sqlite's default autocommit
    mode) and then carefully control when you issue what kinds of statements,
    including BEGIN statements. Any exceptions produced by the database
    will in that situation be propagated to the application. That loses
    the (admittedly somewhat limited) DB-interoperability benefits of using
    the DB API 2, though.

    As evidence of that, note that the existing statement detection code
    is broken: it doesn't strip comments first! A simple SELECT statement
    preceeded by a comment will implicitly commit! But commenting is
    good.

    Yes, the fact that statement detection (and what statements are
    detected) is buggy is the essence of this issue.

    But note that in Python code it would be much more natural to
    put the comment in the python code, not the sql string.

    Anything that causes the underlying database to implicitly commit must
    be avoided (yet the python module goes out of it's way to add *MORE*
    such buggy behavior!) Fixing brokenness in the underlying database is
    the only reason to inspect SQL as it passes through this module. If

    Yep.

    there are other ways to avoid such problems, they will likely be
    better than inspecting SQL.

    Do you have a better way to suggest?

    Anything which causes implicit rollbacks in the underlying database is
    a similar issue, but easier to handle without parsing SQL. It's not

    There are no implicit rollbacks at issue anywhere here, that I can see.
    Do you have an example where the module does an implicit rollback?

    Here is an example demonstrating that a SELECT can trigger an implicit
    commit.

    Yes, the comment issue should be fixed. That's a different bug, though,
    and should be addressed in a separate issue. Unfortunately, since
    some people seem to be *depending* on the fact that putting in a comment
    disables statement detection, that, too will need to be a new feature.

    I'm sorry about that, but Python has a strong commitment to backward
    compatibility.

    @bitdancer bitdancer added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Jan 9, 2014
    @RonnyPfannschmidt
    Copy link
    Mannequin

    RonnyPfannschmidt mannequin commented Feb 1, 2014

    could we please get the option to opt-out of that behaviour, as a extra connection option maybe

    with the normal python sqlite bindings its impossible
    to have database migrations work safely
    which IMHO makes this a potential data-loss issue

    @bitdancer
    Copy link
    Member

    Opt out of what behavior?

    @RonnyPfannschmidt
    Copy link
    Mannequin

    RonnyPfannschmidt mannequin commented Feb 1, 2014

    the sqlite binding deciding how to handle transactions

    @bitdancer
    Copy link
    Member

    As noted above you get that by setting isolation_level to None. That feature has always been available. (With isolation_level set to None, the sqlite wrapper module itself never issues any BEGIN statements.)

    @RonnyPfannschmidt
    Copy link
    Mannequin

    RonnyPfannschmidt mannequin commented Feb 1, 2014

    http://hg.python.org/releasing/3.4/file/447c758cdc26/Modules/_sqlite/cursor.c#l789

    also i dont see the isolation level being taking into account in other parts of the code

    @monsanto
    Copy link
    Mannequin

    monsanto mannequin commented Apr 17, 2014

    This issue has been open for 4 years, last update was 2 months ago.

    Lack of transactional DDL is a big deal for Python programs that use SQLite heavily.

    We have a patch for Python 3 that applies cleanly and as far as I can tell works fine. I've been using it in testing for months, and I'm about to deploy it in production.

    What's the next step here? What do we need to do to get something merged?

    @ghaering ghaering mannequin self-assigned this Jan 11, 2015
    @aymericaugustin
    Copy link
    Mannequin

    aymericaugustin mannequin commented Aug 19, 2015

    This bug appears to be fixed upstream: ghaering/pysqlite@f254c53

    @bitdancer
    Copy link
    Member

    It's not a backward compatible change, so we'll need a migration strategy if we want to apply this (and I'd certainly like to).

    @ghaering
    Copy link
    Mannequin

    ghaering mannequin commented Aug 19, 2015

    Please note that after the mentioned commit, I restored backwards compatibility with commit ghaering/pysqlite@796b3af

    Now the only difference is that the implicit commits *before* DDL statements are gone. I consider these a bug anyway.

    @bitdancer
    Copy link
    Member

    Excellent.

    @jan-hudec
    Copy link
    Mannequin

    jan-hudec mannequin commented Aug 20, 2015

    While I agree that the current behaviour is a bug (this bug), and one that renders the package unusable for me (I used apsw or different language instead), I unfortunately have to disagree with Gerhard that the change is not a problem. It can be.

    The implicit commit before DDL statements that is gone is not a problem, but the implicit commit *after* them that is *also* gone is. Because if somebody only does DDL statements, and upgrade script might do just that, they will now need a commit.

    Scripts that use dbapi2 in a portable way will have the commit, because other databases that support DDL in transactions require it already. But scripts that are only used with current sqlite and possibly mysql (behaviour of which the current version mimics) may not and will break.

    @jan-hudec
    Copy link
    Mannequin

    jan-hudec mannequin commented Aug 20, 2015

    Oh, I see I misunderstood Gerhard's last commit.

    So now the problem should be only if there is a DML statement followed by DDL statement and no commit afterwards. Well, that is indeed probably stupid.

    @aymericaugustin
    Copy link
    Mannequin

    aymericaugustin mannequin commented Aug 25, 2015

    Since a better solution seems to be around the corner, I'm withdrawing my proposal.

    I'm attaching the current state of my patch. It isn't functional. Mostly it proves that my API proposal is very inconvenient to implement in C. That's why I kept it around for over a year without completing it.

    @rhunter
    Copy link
    Mannequin

    rhunter mannequin commented Mar 23, 2016

    Implicit transactions can also cause mysterious "database is locked" bugs, even when a large timeout is set on the connection. Many, many people are affected by this bug (search the web for "python sqlite database is locked").

    The attached code demonstrates this bug and possible (unintuitive) fixes.

    If there won't be a fix soon, then at least the documentation should note this quirky behavior.

    @ghaering
    Copy link
    Mannequin

    ghaering mannequin commented Mar 23, 2016

    Isn't this issue at least partly about the statement parsing code in the sqlite module?

    I've fixed this upstream a while ago in ghaering/pysqlite@94eae50

    Could somebody perhaps bring this to the Python repository.

    Here's a documentation update that's also required then:
    ghaering/pysqlite@cae87ee

    @zzzeek
    Copy link
    Mannequin

    zzzeek mannequin commented Mar 23, 2016

    @rian - implicit transactions are part of the DBAPI spec. Looking at the original description, the purpose of this bug is to not *end* the transaction when DDL is received. So there's no solution for "database is locked" here, other than pysqlite's usual behavior of not emitting BEGIN until DML is encountered (which works well).

    @rhunter
    Copy link
    Mannequin

    rhunter mannequin commented Mar 23, 2016

    @mike Yes you're right, This bug report is different, my mistake.

    DB-API may require implicit transactions but pysqlite should open a transaction before *any* non-DDL statement, including "SELECT", not just DML statements. Currently one must issue a dummy DML statement just to open a transaction that otherwise would start with a SELECT statement.

    I see nothing in DB-API (https://www.python.org/dev/peps/pep-0249/) that says transactions should implicitly open before DML statements and not SELECT statements.

    @zzzeek
    Copy link
    Mannequin

    zzzeek mannequin commented Mar 23, 2016

    @rian - your issue for this is http://bugs.python.org/issue9924.

    The implicit BEGIN in all cases will probably never be the default but we do need an option for this to be the case, in order to support SERIALIZABLE isolation.

    @berkerpeksag
    Copy link
    Member

    Here is a patch. It contains the following commits:

    Changes:

    • Fixed a refcount error
    • Fixed a compiler warning
    • Added a whatsnew entry

    Note: sqlite3_stmt_readonly() has been added in SQLite 3.7.4 (released on 2010-12-07) so this patch will make older SQLite versions unusable with the sqlite3 module.

    @berkerpeksag
    Copy link
    Member

    Here is an updated patch. I'd like to get this in before 3.6 beta 1.

    @aymericaugustin
    Copy link
    Mannequin

    aymericaugustin mannequin commented Sep 7, 2016

    The latest patch removes the current statement parsing and unexpected implicit commits. It looks good to me.

    Unfortunately it also introduces a new kind of statement parsing that detects DDL statements and doesn't open a transaction in that case, while it should. See ghaering/pysqlite#105 for details.

    As a consequence, this change will allow Django to remove one of the two special cases for transaction in SQLite:
    https://github.com/django/django/blob/master/django/db/transaction.py#L159-L166

    but not the other:
    https://github.com/django/django/blob/271581df606b307d89c141e8b1a50ace763bea81/django/db/backends/base/base.py#L375-L404

    It's still a step forward so I support merging it.

    @zooba
    Copy link
    Member

    zooba commented Sep 8, 2016

    I'm in favor of merging Berker's patch, once the string comparison is made a little more robust. Right now there's a risk of matching a prefix rather than a whole word, and potentially overrunning the buffer (unless I've forgotten how stricmp works).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 11, 2016

    New changeset 284676cf2ac8 by Berker Peksag in branch 'default':
    Issue bpo-10740: sqlite3 no longer implicitly commit an open transaction before DDL statements
    https://hg.python.org/cpython/rev/284676cf2ac8

    @animalize
    Copy link
    Mannequin

    animalize mannequin commented Dec 18, 2016

    I'm using Python 3.6.0 RC2, I don't know if it's a bug.

    When I try to run VACUUM command, an exception raised:

    conn.execute('begin')  # <- remove this line get the same result
    conn.execute('VACUUM')

    sqlite3.OperationalError: cannot VACUUM from within a transaction

    ~~~~~~~~~~~~~~~~~~~
    On Python 3.5, everything is fine.
    Attached file cant_vacuum.py is the full example code.

    @bitdancer
    Copy link
    Member

    It is an unexpected change in behavior and is therefore probably a bug. It will work if you set isolation_level=None. Because of that I don't think it is a release critical bug, though one could wish we'd discovered it earlier. Please open a new issue for this.

    @animalize
    Copy link
    Mannequin

    animalize mannequin commented Dec 18, 2016

    bpo-29003 created

    @carlwgeorge
    Copy link
    Mannequin

    carlwgeorge mannequin commented Dec 24, 2016

    While attempting to build a Python 3.6 RPM for RHEL/CentOS 6, I noticed the following warning.

    *** WARNING: renaming "_sqlite3" since importing it failed: build/lib.linux-i686-3.6-pydebug/_sqlite3.cpython-36dm-i386-linux-gnu.so: undefined symbol: sqlite3_stmt_readonly

    The resolution of this issue introduced usage of the sqlite3_stmt_readonly interface. That interface wasn't added to sqlite until 3.7.4 (http://www.sqlite.org/releaselog/3_7_4.html). My RPM build failed because RHEL/CentOS 6 only has sqlite 3.6.20. I understand that Python can't support old libraries forever, but can this minimum sqlite version be noted somewhere in the documentation?

    @bitdancer
    Copy link
    Member

    Please open a new issue for this request.

    @maggyero
    Copy link
    Mannequin

    maggyero mannequin commented Jan 25, 2020

    With Berker Peksag's patch merged in 2016, in the default non auto-commit mode, sqlite3 implicitly issues a BEGIN statement before any non SELECT statement not already in a transaction, EXCEPT DDL statements, for backwards compatibility reasons:

    + /* For backwards compatibility reasons, do not start a transaction if a
    + DDL statement is encountered. If anybody wants transactional DDL,
    + they can issue a BEGIN statement manually. */
    + if (self->connection->begin_statement && !sqlite3_stmt_readonly(self->statement->st) && !self->statement->is_ddl) {
    + if (sqlite3_get_autocommit(self->connection->db)) {
    + result = _pysqlite_connection_begin(self->connection);
    + if (!result) {
    + goto error;
    + }
    + Py_DECREF(result);
    }
    }

    Is there any plan to cover DDL statements as well in a future Python version and deprecate the old behaviour? That would avoid having to insert BEGIN statements manually for getting transactional DDL statements, like in psycopg2 for PostgreSQL databases.

    @bitdancer
    Copy link
    Member

    Please open a new issue for this question.

    @maggyero
    Copy link
    Mannequin

    maggyero mannequin commented Jan 26, 2020

    @Aymeric Augustin

    While you're there, it would be cool to provide "connection.autocommit = True" as an API to enable autocommit, because "connection.isolation_level = None" isn't a good API at all -- it's very obscure and has nothing to do with isolation level whatsoever.

    +1. We could use this new autocommit property to enable the new full transactional mode (that is to say with transactional DDL):

    connection.autocommit = True  # enable the autocommit mode
    connection.autocommit = False  # disable the autocommit mode (enable the full transactional mode)
    connection.autocommit = None  # fallback to connection.isolation_level
    

    To transition from the old partial transactional mode (without transactional DDL) by default to the new full transactional mode (with transactional DDL) by default, we could use the following migration strategy:

    1. During the deprecation period:
    • Add the new autocommit property with the value None by default, so that the old partial transactional mode is still the default.
    • Add a deprecation warning for the value None of the autocommit property, in favor of the other values True and False. It will prompt users who enabled the autocommit mode with connection.isolation_level = None to use connection.autocommit = True instead, and users who disabled the autocommit mode (that is to say users who enabled the old partial transactional mode) with connection.isolation_level = DEFERRED/IMMEDIATE/EXCLUSIVE to use connection.autocommit = False instead AND add to their code the potentially missing connection.commit() calls required by the new full transactional mode for committing DDL statements.
    1. After the deprecation period:
    • Set the value of the autocommit property to False by default, so that the new full transactional mode becomes the new default.
    • Remove the value None of the autocommit property and its deprecation warning.
    • Remove the value None of the isolation_level property, so that the old partial transactional mode disappears.

    @maggyero
    Copy link
    Mannequin

    maggyero mannequin commented Jan 26, 2020

    @r. David Murray

    Please open a new issue for this request.

    Done here: https://bugs.python.org/issue39457

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants