classification
Title: sqlite3 module breaks transactions and potentially corrupts data
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Jeremy Banks, Jim.Jewett, Mark.Bucciarelli, Ronny.Pfannschmidt, adamtj, asvetlov, aymeric.augustin, bkircher, bulb, dholth, flox, ghaering, monsanto, pitrou, r.david.murray, scott.urban, torsten, tshepang, zzzeek
Priority: normal Keywords: patch

Created on 2010-12-20 18:02 by scott.urban, last changed 2014-06-25 18:33 by torsten.

Files
File name Uploaded Description Edit
pysql-transactions.2.diff scott.urban, 2010-12-20 18:02 proposed patch
test_sqlite_ddl.py scott.urban, 2010-12-20 18:03 test script
sqlite_transaction_config_py27.diff torsten, 2011-03-27 01:34 Patch against Python 2.7
sqlite_transaction_config_py3.diff torsten, 2011-03-27 01:54 Patch for Python 3 default branch review
sqlite_transaction_config_py27.diff torsten, 2011-03-27 01:56 Patch against Python 2.7, take 2
sqlite_transaction_config_v2.diff torsten, 2011-03-30 20:32 Updated patch: Added docs review
transaction-api-proposal-issues-8145-9924-10740-16958.patch aymeric.augustin, 2014-06-06 14:10 API improvement proposal (as doc patch) review
sqlite_sanity_check.rst torsten, 2014-06-25 18:33
Messages (43)
msg124392 - (view) Author: Scott Urban (scott.urban) Date: 2010-12-20 18:02
The python sqlite module automatically commits open transactions
when it encounters a DDL statement.  This is unnecessary; DDL is
transactional in my testing (see attached).

Attached patch addresses the issue. Patch is against 2.6.1, but
looking at Trunk in svn, it seems like the patch is needed and
would apply. One issue I could foresee is that this behavior might
depend on the sqlite version in use (I'm on 3.6.10).

Patch also allows pragma statement.
msg124393 - (view) Author: Scott Urban (scott.urban) Date: 2010-12-20 18:03
Here are some tests.
msg124398 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-20 18:34
See also Issue 8145.  It would be nice if someone could sort all this out, but I'm not knowledgeable enough to do so.

For this patch, it would be a significant change it behaviour.  Therefore it would have to be a new feature controlled by a flag of some sort (which is part of the reason for the reference to issue 8145).
msg124407 - (view) Author: Scott Urban (scott.urban) Date: 2010-12-20 22:08
I find the way that the sqlite3 module handles transactions pretty
surprising in general, but I agree that someone who got used
to DDL not rolling back could in theory find this patch surprising.

We will apply this patch to our python build because having DDL
covered by a transactions is convenient for certain tasks. If anyone
can think of a problem with this simple patch, I'd like to hear it.

Thanks
Scott
msg128815 - (view) Author: Daniel Holth (dholth) (Python committer) Date: 2011-02-18 21:17
I want transactional DDL too. I was tremendously surprised that I could not duplicate the way sqlite3 behaves on the command line from witin pysqlite.

Instead of this patch, I would rather be able to instruct pysqlite to always begin a transaction for any kind of statement (I hear this is a requirement for DB-API compliance) and never perform an implicit commit for any reason. For example, someone on the google code project had a complaint that 'SAVEPOINT' (create a subtransaction) automatically commits because pysqlite doesn't know about it.

I tried to trick pysqlite into doing what I wanted by prepending /* update */ to every CREATE TABLE statement but it didn't seem to quite work. Instead, I would prefer a pysqlite that does not strcmp(statement) at all.
msg132285 - (view) Author: Torsten Landschoff (torsten) Date: 2011-03-27 01:34
The attached patch is my take on this issue. I ran into the problem that during schema upgrades dropping a table was not rolled back. In another instance, renaming a table was not rolled back. This greatly increases the risk of data loss for our application.

Because I do not currently foresee which commands might need dropping out of a transaction and because of backwards compatibility woes, I added a new field to the sqlite3.Connection: operation_needs_transaction_callback

This function is called to decide if a running transaction should be implicitly committed (I'd consider this dangerous), if a transaction has to be started if not running (should normally always hold) or if the transaction state should be left alone.

For example, the "pragma foreign_keys = ..." is a no-op inside a transaction, therefore an implicit begin would be possibly harmful. In our application, we enable foreign keys when getting the connection out of the SQLAlchemy pool via a pool listener, which would be disabled if there is an implicit begin triggered.

The patch also adds a bunch of unit tests to cover the new behaviour.
msg132287 - (view) Author: Torsten Landschoff (torsten) Date: 2011-03-27 01:54
Same patch for Python 3. In fact, this also adds a missing Py_XDECREF.
msg132414 - (view) Author: Daniel Holth (dholth) (Python committer) Date: 2011-03-28 19:49
Torsten basically you are suggesting that PRAGMA would never work at all with my 'do not strcmp() the sql at all, always begin a transaction' approach?
msg132470 - (view) Author: Torsten Landschoff (torsten) Date: 2011-03-29 07:39
> Torsten basically you are suggesting that PRAGMA would never work at all with my 'do not strcmp() the sql at all, always begin a transaction' approach?

No. Most pragmas should still work and getting the current setting of a pragma should also work.

I was only referring to http://www.sqlite.org/pragma.html#pragma_foreign_keys

Quote:

> This pragma is a no-op within a transaction; foreign key constraint enforcement may only be enabled or disabled when there is no pending BEGIN or SAVEPOINT.

I did not see such a restriction for other pragmas though I admit that I did not reread the list in full.
msg132612 - (view) Author: Torsten Landschoff (torsten) Date: 2011-03-30 20:32
The attached patch is an updated version which adds a bit of documentation.
msg136362 - (view) Author: Torsten Landschoff (torsten) Date: 2011-05-20 10:28
I updated the patch for upstream pysqlite2. Available at

http://code.google.com/p/pysqlite/issues/detail?id=24

Patch over there is for Python 2 (tested with our production Python 2.6).
msg144197 - (view) Author: Mark Bucciarelli (Mark.Bucciarelli) Date: 2011-09-17 15:39
Opened http://bugs.python.org/issue12997 in case there is a way to solve the foreign_key PRAGMA issue with a less disruptive fix.
msg207789 - (view) Author: Adam Tomjack (adamtj) Date: 2014-01-09 20:29
The proposed patches don't fix the problem.  They may well allow DDL in transactions, but that's not the real problem.

A database library must *NEVER* implicitly commit or rollback.  That's completely insane.

    >>> import this
    ...
    Explicit is better than implicit.

If a statement can't be executed in a transaction, then trying to execute such a statement in a transaction is an error.  Period.  An exception *MUST* be raised.  It is wrong to guess that the user really wanted to commit first.

    >>> import this
    ...
    Errors should never pass silently.
    ...
    In the face of ambiguity, refuse the temptation to guess.

If such an exception is raised, you'll run into the problem exactly once before figuring out that you need to commit or rollback first.  You can then change your code to safely and explicitly account for that limitation.

This issue is not an enhancement, it's a bug.  One might argue that fixing it will break existing code, but it won't.  Any affected code is already broken.  It's depending on a promised level of safety, and this bug is breaking that promise.  It's better to fail noisily and consistently than to usually work, but sometimes silently corrupt data.

As is, it's pretty much guaranteed that there is existing code that does DDL expecting to be in a transaction when it isn't.  Even a well-tested schema upgrade can fail in production if the data is slightly different that in a testing database.  Unique constraints can fail, etc.  I frequently use the psycopg2 module and psql command, which get this issue right.  They've saved me several times.

The current behavior is buggy and should be changed to be correct.  There should not merely be an option to enable the correct behavior.  There should also be no option to enable the old buggy behavior.  It's too dangerous.

In general, it's a code-smell to inspect the SQL that's being executed before sending it to the database.  The only reason I can think of to inspect SQL is to work around brokenness in the underlying database.  Suppose, for example, that SQLite implicitly committed when it got DDL.  (I don't know what it actually does.  Just suppose.)  In that case, it would be necessary to check for DDL and raise an exception before passing the statement to the database.  If the database correctly errors in such a situation, then the python wrapper should just pass the DDL to the database and deal with the resulting error.  That's way easier that trying to correctly detect what the statement type is.  Parsing SQL correctly is hard.

As evidence of that, note that the existing statement detection code is broken: it doesn't strip comments first!  A simple SELECT statement preceeded by a comment will implicitly commit!  But commenting is good.

    >>> import this
    ...
    Readability counts.

So it's not even just an issue of DDL or PRAGMAs!

Anything that causes the underlying database to implicitly commit must be avoided (yet the python module goes out of it's way to add *MORE* such buggy behavior!)  Fixing brokenness in the underlying database is the only reason to inspect SQL as it passes through this module.  If there are other ways to avoid such problems, they will likely be better than inspecting SQL.

Anything which causes implicit rollbacks in the underlying database is a similar issue, but easier to handle without parsing SQL.  It's not safe to implicitly rollback, even if a new transaction is begun.  For example, you might be doing foreign key validation outside the database.  If one INSERT were implicitly rolled back and a new transaction begun, a second INSERT may then succeed and be committed.  Now you have a constraint violation, even though you correctly checked it before.

If there is anything that triggers an implicit rollback in the underlying database connection, those rollbacks must be dectected all subsequent statements or actions on the python wrapper *should* raise exceptions until an explicit rollback is performed, except for commit() which *must* raise.

For example:

    con.isolation_level = 'DEFERRED'
    try:
        # Nothing in here can implicitly commit or rollback safely.  
        # Doing so risks data corruption.

        cur.execute("INSERT INTO foo ...")

        try:
            con.somehow_trigger_implicit_rollback() # must raise
        except:
            # Throw away the exception.
            pass

        try:
            # This *should* raise too, but isn't strictly necessary,
            # as long as commit() is implemented correctly.
            cur.execute("INSERT INTO bar ...")
        except:
            pass
        
        # The underlying database rolled back our transaction.
        # A successful commit here would not match reality.
        # This commit() must raise.
        con.commit()
    except:
        # This rollback() can succeed, because it puts the 
        # "con" object back into correspondance with the underlying
        # database connection state.
        con.rollback()
        logger.error('something bad happened')
        raise

If nothing in the database does implicit COMMITs or ROLLBACKs, then it's probably sufficient to leave all the error handling to the database and only raise exceptions when it indicates problems.

Here is an example demonstrating that a SELECT can trigger an implicit commit.

    Python 2.7.3 (default, Sep 26 2013, 20:03:06)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> # Create a connection.
    ... import sqlite3
    >>> con = sqlite3.connect(":memory:", isolation_level='DEFERRED')
    >>> cur = con.cursor()
    >>>
    >>> # Create some state.
    ... cur.execute("CREATE TABLE foo (i int);")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> cur.execute("INSERT INTO foo VALUES (0);")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> con.commit()
    >>>
    >>> # Change the state, do a SELECT, then rollback.
    ... cur.execute("UPDATE foo SET i=1")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> cur.execute("SELECT 1;")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> con.rollback()
    >>>
    >>> # Verify that the rollback worked.
    ... cur.execute("SELECT i FROM foo")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> i = cur.fetchone()[0]
    >>> print 'i is', i
    i is 0
    >>> assert i == 0
    >>>
    >>> # Change some state, do a different SELECT, then attempt to rollback.
    ... cur.execute("UPDATE foo SET i=2")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> # This will incorrectly commit:
    ... cur.execute("""
    ...     /*
    ...      * Comments are good for readability, but because of this bug, they can
    ...      * cause incorrect implicit commits.
    ...      */
    ...     SELECT 1;
    ... """)
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> con.rollback()
    >>>
    >>>
    >>> # The rollback succeeded and rolling back a different transaction than
    >>> # the one we expected.
    ... cur.execute("SELECT i FROM foo")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> i = cur.fetchone()[0]
    >>> print 'i is', i
    i is 2
    >>> assert i == 0
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AssertionError
msg207796 - (view) Author: mike bayer (zzzeek) Date: 2014-01-09 21:20
well Adam, you might also be surprised to see pysqlite not doing very well on the *other* side of the equation either; that is, when it *begins* the transaction.  Issue http://bugs.python.org/issue9924 refers to this and like this one, hasn't seen any activity since 2011.  You might be interested in the rationale on that one, which is that python apps may wish to have some degree of concurrency against a sqlite file for an application that is doing all SELECTs.  

I am certainly in favor of a pure pep-249 approach that emits BEGIN on the first execute() call period, and never implicitly rolls back or commits.  

However, I disagree this should be enabled by default nor that there should not be any option for the old behavior.  I also think the "delayed BEGIN" feature should still be available to counteract SQLite's particular difficulty with concurrency.

If there is code that relies upon a bug in order to function, which would then fail to function in that way if the bug is fixed, then by definition fixing that bug is a backwards-incompatible change.   Python std lib can't afford to roll out a change that blatantly backwards-incompatible.   The issue regarding comments not being parsed unfortunately should also be a separate issue that needs to be fixed, as nasty as it is that pysqlite tries to parse SQL.

I disagree that most users are expecting SQLite's DDL to be transactional; transactional DDL is a luxury feature to many, including the vast numbers of developers that use MySQL, not to mention Oracle, neither of which have any transactional DDL capabilities.
msg207800 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-01-09 22:05
On Thu, 09 Jan 2014 20:29:15 +0000, Adam Tomjack <report@bugs.python.org> wrote:
> This issue is not an enhancement, it's a bug.  One might argue that
> fixing it will break existing code, but it won't.  Any affected code
> is already broken.  It's depending on a promised level of safety, and
> this bug is breaking that promise.  It's better to fail noisily and
> consistently than to usually work, but sometimes silently corrupt
> data.

That's not how our backward compatibility policy works.  If a program
runs correctly, a change in a maintenance release should not cause
that program to fail, *even if the change is a bug fix*.

Now, that said, silent data corruption can change that calculus, but
in this case there are many many working programs out there that
don't experience data corruption, so breaking them in a maintenance
release is just not acceptable.

It is possible that there are specific things that can be fixed without
breaking backward compatibility, but they will have to be argued individually.

> In general, it's a code-smell to inspect the SQL that's being executed
> before sending it to the database.  The only reason I can think of to

There are reasons why the sqlite module was designed the way it was,
having to do with an intersection between how sqlite works and the DB
API 2 specification.  I'm not saying the design (and implementation of
that design) has no bugs, but there *are* reasons why the choices made
were made, including the concurrency issue mentioned by Mike (which if I
understand correctly was even more problematic in earlier versions of
sqlite.)

> inspect SQL is to work around brokenness in the underlying database.
> Suppose, for example, that SQLite implicitly committed when it got
> DDL.  (I don't know what it actually does.  Just suppose.)  In that

By default, SQLite starts a transaction when any command that changes
the database is executed (the docs say, "basically, anything other than
a SELECT), and "automatically started transactions are committed when
the last query finishes".  I'm not sure exactly what that means.
Note that there is no way in sqlite to turn auto-transaction behavior 
off.

> case, it would be necessary to check for DDL and raise an exception
> before passing the statement to the database.  If the database

Like I said, this would need to be a new feature, due to our backward
compatibility policy.

> correctly errors in such a situation, then the python wrapper should
> just pass the DDL to the database and deal with the resulting error.
> That's way easier that trying to correctly detect what the statement
> type is.  Parsing SQL correctly is hard.

If you want full control of transactions while using the sqlite module,
you can set the isolation_level to None (sqlite's default autocommit
mode) and then carefully control when you issue what kinds of statements,
including BEGIN statements.  Any exceptions produced by the database
will in that situation be propagated to the application.  That loses
the (admittedly somewhat limited) DB-interoperability benefits of using
the DB API 2, though.

> As evidence of that, note that the existing statement detection code
> is broken: it doesn't strip comments first!  A simple SELECT statement
> preceeded by a comment will implicitly commit!  But commenting is
> good.

Yes, the fact that statement detection (and what statements are
detected) is buggy is the essence of this issue.

But note that in Python code it would be much more natural to
put the comment in the python code, not the sql string.

> Anything that causes the underlying database to implicitly commit must
> be avoided (yet the python module goes out of it's way to add *MORE*
> such buggy behavior!)  Fixing brokenness in the underlying database is
> the only reason to inspect SQL as it passes through this module.  If

Yep.

> there are other ways to avoid such problems, they will likely be
> better than inspecting SQL.

Do you have a better way to suggest?

> Anything which causes implicit rollbacks in the underlying database is
> a similar issue, but easier to handle without parsing SQL.  It's not

There are no implicit rollbacks at issue anywhere here, that I can see.
Do you have an example where the module does an implicit rollback?

> Here is an example demonstrating that a SELECT can trigger an implicit
> commit.

Yes, the comment issue should be fixed.  That's a different bug, though, 
and should be addressed in a separate issue.  Unfortunately, since
some people seem to be *depending* on the fact that putting in a comment
disables statement detection, that, too will need to be a new feature.

I'm sorry about that, but Python has a strong commitment to backward
compatibility.
msg209900 - (view) Author: Ronny Pfannschmidt (Ronny.Pfannschmidt) Date: 2014-02-01 17:00
could we please get the option to opt-out of that behaviour, as a extra connection option maybe

with the normal python sqlite bindings its impossible
to have database migrations work safely
which IMHO makes this a potential data-loss issue
msg209901 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-02-01 17:25
Opt out of what behavior?
msg209904 - (view) Author: Ronny Pfannschmidt (Ronny.Pfannschmidt) Date: 2014-02-01 18:20
the sqlite binding deciding how to handle transactions
msg209917 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-02-01 21:11
As noted above you get that by setting isolation_level to None.  That feature has always been available.  (With isolation_level set to None, the sqlite wrapper module itself never issues any BEGIN statements.)
msg209924 - (view) Author: Ronny Pfannschmidt (Ronny.Pfannschmidt) Date: 2014-02-01 22:47
http://hg.python.org/releasing/3.4/file/447c758cdc26/Modules/_sqlite/cursor.c#l789

also i dont see the isolation level being taking into account in other parts of the code
msg216715 - (view) Author: Chris Monsanto (monsanto) Date: 2014-04-17 17:29
This issue has been open for 4 years, last update was 2 months ago.

Lack of transactional DDL is a big deal for Python programs that use SQLite heavily.

We have a patch for Python 3 that applies cleanly and as far as I can tell works fine. I've been using it in testing for months, and I'm about to deploy it in production.

What's the next step here? What do we need to do to get something merged?
msg216732 - (view) Author: Aymeric Augustin (aymeric.augustin) * Date: 2014-04-17 19:15
Hey -- maintainer of Django's transaction support here.

This ticket was brought to my attention again today. As I know a few things about this issue and I see Python core devs asking for input, I'll give my $0.02.

The core of this issue is that, **by default**, the sqlite3 module performs a simplistic parsing of SQL queries and decides to commit before anything it doesn't understand, for example "SAVEPOINT foo".

I don't think it's a good idea to change sqlite's default operation mode to autocommit. PEP 249's prescription for transaction management aren't practical (and Django enforces the opposite behavior by default) but that's beyond the point. It's way too much backwards incompatible.

However, I don't think it's a good idea either to automatically send a COMMIT when I want to make a SAVEPOINT. In fact, if you want to use transactions with sqlite, connection.isolation_level = None is almost warranted -- and then you do everything manually.

For a slightly longer explanation, see https://www.youtube.com/watch?v=09tM18_st4I#t=1751

I've struggled mightily with all this when rewriting Django's transaction management. See:

- https://github.com/django/django/blob/3becac84/django/db/backends/sqlite3/base.py#L107
https://github.com/django/django/blob/3becac84/django/db/backends/sqlite3/base.py#L398-L403
- https://github.com/django/django/blob/3becac84/django/db/transaction.py#L185-L195

I have implemented the workarounds I need and I we won't be able to remove them from Django anytime soon. As a consequence, I have little interest in getting this fixed.

Still, it would be sane to stop committing before savepoints by default -- that kinda ruins the concept of savepoints :-) It would be even saner to stop parsing SQL. Does http://hg.python.org/cpython/file/85fd955c6fc8/Modules/_sqlite/cursor.c#l32 look like an adequate parser for http://sqlite.org/lang.html?

Unfortunately, I don't have backwards-compatible proposal to fix this. Trying to account for a bit more syntax will help in the short term but not fix the underlying issue.

PS: I find it unfair to brand SQLite as a neutered database engine. Its transaction model is vastly superior to, say, MySQL. Too bad sqlite3 ruins it.
msg216734 - (view) Author: Chris Monsanto (monsanto) Date: 2014-04-17 20:01
> Unfortunately, I don't have backwards-compatible proposal to fix this. Trying to account for a bit more syntax will help in the short term but not fix the underlying issue.

aaugustin -- the patch by torsen made 3 years ago is backwards compatible. It adds a hook that you can use to disable autocommits, but you have to opt-in. I think that is good enough for now. We can worry about how to transition people away from the broken transaction model in the future.

Let's treat this as any other backwards compatibility problem in Python. We have a flag, that when enabled, removes the shitty behavior. In a future release, you get a warning for relying on the shitty behavior. In a release after that, we kill the old behavior. Then we deprecate the flag. In other words, it literally doesn't matter what the flag is. torsen's is fine. We just need some way to enable transactional DDL.
msg216738 - (view) Author: Aymeric Augustin (aymeric.augustin) * Date: 2014-04-17 20:43
That patch solves the problem, at the cost of introducing an unwieldy API, "operation_needs_transaction_callback".

I'm very skeptical of the other API, "in_transaction". Other database backends usually provide an "autocommit" attribute.

"autocommit" is the opposite of "in_transaction" for all practical purposes. There's only two situations where they may be equal:

- before the first query
- after an explicit commit

Then you aren't in a transaction and you aren't in autocommit. But in these cases, in practice, the question you want to ask is "is the next query going to create a transaction?" (and if not, I may want to create one.)

So the semantic of "connection.autocommit" is much more useful than the semantic of "connection.in_transaction".

While you're there, it would be cool to provide "connection.autocommit = True" as an API to enable autocommit, because "connection.isolation_level = None" isn't a good API at all -- it's very obscure and has nothing to do with isolation level whatsoever.
msg216740 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-17 20:58
What is the callback thing for? The only value that's ever passed in the tests is `lambda operation: True`.
msg219757 - (view) Author: Jan Hudec (bulb) Date: 2014-06-04 14:12
Mike, David,

The bug is that sqlite module issues implicit COMMIT. SQLite never needs this, many programs need to NOT have it and they can't opt out as isolation_level affects implicit begins only.

Most programs will do some changes to data after changing the schema, so they will need to commit() at the end anyway and dropping the implicit commit before DDL statements won't break them. But there may be some rare cases that do just a create or delete table/view/index here or there that could be affected, so I am not against explicit request to stop generating implicit commits. Just put a big warning in the documentation that any new code should always do this.

I don't understand why the implicit commit was initially added. It is serious problem for schema migrations and I don't see anything it would help. The only thing I can think of is to behave like MySQL which does not support DDL in transactions. But anybody who ever tried to do a non-trivial schema update in MySQL has probably cursed it to hell many times.

On the other hand there is absolutely nothing broken on the implicit BEGIN (which is required by dbapi2 specification) nor on the isolation_level property which controls it. Those shouldn't be touched; there is no reason to.

However note, that skipping the implicit BEGIN before SELECT only helps concurrency a little. If BEGIN (DEFERRED) TRANSACTION is followed by SELECT, the database is locked in shared mode and the same happens in the implicit transaction when the SELECT starts without explicit begin. Other readers are still allowed. The only difference is that with explicit BEGIN the database remains locked, while without it it is unlocked when the SELECT finishes (read to end or cursor closed).

Oh, I briefly looked at the patch and I suspect it's doing too much. It would IMHO be best to really just make the implicit COMMIT conditional on backward compatibility flag.
msg219758 - (view) Author: Jan Hudec (bulb) Date: 2014-06-04 14:25
This is somewhat borderline between bug and enhancement. The behaviour is described in the documentation and does not violate dbapi2 specification, but at the same time it is a serious misfeature and gratuitous restriction of perfectly good functionality of the underlying library.

So I added all the versions back as I think this deserves fixing for 2.7, but left it as enhancement as it's not a bug per se, just a misfeature.
msg219759 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-04 14:40
And our policy is that enhancements can only go in the next release.  

We cannot change the default behavior in maintenance releases for backward compatibility reasons, and we cannot provide for a requested change in behavior without introducing an API change, which we will not do in a maintenance release.  Therefore whatever is done it can only be done in 3.5.
msg219761 - (view) Author: Aymeric Augustin (aymeric.augustin) * Date: 2014-06-04 15:43
> On the other hand there is absolutely nothing broken on the implicit BEGIN (which is required by dbapi2 specification) nor on the isolation_level property which controls it. Those shouldn't be touched; there is no reason to.

Nothing broken... unless one considers there should be a relation between the name of a parameter and the feature it controls ;-) `isolation_level` should really be called `transaction_mode`. It's a specific extension of the BEGIN TRANSACTION statement in SQLite and it's unrelated to the standard concept of transaction isolation levels.

SQLite almost always operates at the SERIALIZABLE isolation level. (For exceptions to this rule, search for PRAGMA read_uncommitted; in the documentation.) This is a consequence of its file-lock based implementation of transactional atomicity.
msg219806 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-06-05 12:17
For the record, I think it would be easier to get a patch accepted for this if it didn't add a rather mysterious callback-based API. Which kind of approach would work, though, I don't have any idea about :-)
msg219818 - (view) Author: Jan Hudec (bulb) Date: 2014-06-05 15:25
Ok, David, I see.

Anybody who wants to use sqlite seriously in existing releases can use apsw. It is not dbapi2 compliant, but it is complete and behaves like the underlying database.

I agree with Antoine and already mentioned I didn't like the current patch.

I think all that is needed is another property, say `autocommit_ddl`. For compatibility reasons it would default to `True` and when set to `False` the module would then begin transaction before any command, probably except `pragma` where it is sometimes problem and should not matter in other cases and `select` still subject to `isolation_level` and possibly fix of the related http://bugs.python.org/issue9924.

The second patch seems closer, but it still does not bypass the logic as I'd suggest.
msg219884 - (view) Author: Aymeric Augustin (aymeric.augustin) * Date: 2014-06-06 14:10
I'm attaching a documentation patch describing improvements of the transaction management APIs that would address this issue as well as three others.

It's preserving the current transaction handling by default for backwards compatibility. It's introducing two new, more simple and less surprising transaction modes.

Could someone have a look and tell me if such improvements seem reasonable? If so, I'll try to write a patch targeting Python 3.5.
msg220478 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2014-06-13 18:42
(1)  The patch is just to docs, not code, so I'm not entirely sure that I understood it properly.

(2)  The new semantics are a huge mess to explain.  This might be because the old semantics were bad, but the result is the same.  I think it would be better to create a different API object.  (If I read correctly, you would leave function connect unchanged but also def dbapi_connect.)  Then keep this new one as simple as it should be -- which will probably mean leaving out some of the parameters that you deprecate.

(3)  The existing module documentation does claim to be DB-API 2.0 compliant; you should be explicit in the documentation (including docstring) about which behavior differences affect each of (database correctness/API standards conformance/ease of use).
msg220491 - (view) Author: Aymeric Augustin (aymeric.augustin) * Date: 2014-06-13 20:07
Thanks for your feedback!

Indeed this is a documentation patch describing what I could implement if a core dev gave the slightest hint that it stands a small chance of getting included. There's no point in writing code that Python core doesn't want.

I designed this API to maximize the chances it would be accepted, based on what I know of Python's development. That's probably why it was merely ignored rather than rejected ;-) More seriously, since I'm interested in improving what the stdlib ships, I have to take backwards compatibility into account. If I just wanted to implement a different API, I'd do it outside of CPython.

Yes, I could add PEP 249 compliance considerations to the docs. I'll do it if my general approach is accepted.
msg220500 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2014-06-13 21:00
Removing the existing behavior will almost certainly not be accepted, because of backwards compatibility.

Adding new functionality is generally acceptable.

Doing that through a new keyword that defaults to the old behavior is fairly common, and generally better than an apparently redundant function.  But in this particular case, that makes the new signature and documentation such a mess that I believe a separate function would be the lesser of evils.  (And that only if it is clear what the problems with the current situation are.  e.g., do the problems only trigger when using the database from too many threads, or is a random network delay enough to trigger problems?)
msg220501 - (view) Author: Aymeric Augustin (aymeric.augustin) * Date: 2014-06-13 21:16
The problem is super simple:

connection = ...
cursor = connection.cursor()
cursor.execute("SAVEPOINT foo")
cursor.execute("ROLLBACK TO SAVEPOINT foo")  # will report that the savepoint doesn't exist.

Please don't make it look more complicated than it is.
msg220519 - (view) Author: Torsten Landschoff (torsten) Date: 2014-06-14 01:46
Hi all,

let me warn you first: I have put a lot a effort in writing the patch back then. I moved on since as I don't even know when the improvement will make any difference for me. As so often, trying to contribute to an open source project turned out to be a waste of time, for the following reasons:

1. the patch first has to made it past review before being committed

2. it will be released with some distant version (looking from 2011)

3. I couldn't just replace Python with a new version for our builds (even if it was released within a month)

In the end I had to work around the problem in client code. If anybody cares, the solution was to special case SQLite as a database backend and do transactions manually. As we now have to support Oracle as well, we don't have the luxury of transactions with DDL anyhow so basically we drop out of DBAPI2 transaction management and everything becomes easy again.


I should have replied earlier to the questions. Maybe by answering I stand a chance that I will ever use sqlite3 in Python 3 again.

Quoting Aymeric Augustin (aymeric.augustin):

> That patch solves the problem, at the cost of introducing an unwieldy API, "operation_needs_transaction_callback".

You are right, I had forgotten that each client would need to know about it. In fact, the value used in the tests etc. should probably be the default.

The idea was to not take away from what's there already: The sqlite3 module already has a feature to inspect a command and begin or commit automatically. Just stripping that away *removes* a feature that has been available for a long time. I'd rather give the
client more control instead of less and let him fine tune this behaviour.

Also, there was good reason for this special casing of SQLite: Adhering to DBAPI2 would really kill concurrent access to SQLite: After reading something from the database, any other thread is blocked from writing to the database.
In our case (we are using SQLAlchemy), the effect is that any attribute access to an ORM mapped object must be "secured" by rolling back after reading (in case you were only reading). Let me try to illustrate in an example:

Previous code:

    modification_time = component.modification_time

If the modification time is not loaded, this will query it from the database in the bowls of SQLAlchemy. Now the database is locked, if it weren't for the "smart" transaction management of the sqlite3 module.

Quoting the new patch to the documentation:

> Standard mode: In this mode, the :mod:`sqlite3` module opens a transaction implicitly before any statement, unless a transaction is already active. This adheres more closely to PEP 249 semantics, possibly at the cost of performance under concurrent workloads.

The cost of performance for our workload here is called deadlock. To fix this in user code, the example code above needs fixing like this:

    modification_time = component.modification_time
    # Unlock SQLite if this came from the database (and we are using SQLite)
    session = get_orm_session(component)
    if session and session.bind.dialect.name == 'sqlite':
        session.rollback()

But this is obviously absolutely wrong if any changes have been made before reading. So the attribute access becomes something like:

    session = get_orm_session(component)
    do_rollback = session and not (session.new or session.updated or session.deleted)

    modification_time = component.modification_time

    # Unlock SQLite if this came from the database (and we are using SQLite)
    if do_rollback and session.bind.dialect.name == 'sqlite':
        session.rollback()

When done right this is not a big deal though - we roll back our database connection in the idle event of our GUI main loop to make sure background processing is not hanging indefinately. But not all code may be so lucky.
When starting with Python I always thought that code like this is harmles:

    >>> conn = sqlite3.connect("test.db")
    >>> data = list(conn.execute("select * from mytable"))

Currently it is, but only because sqlite3 module parses the select as DQL and does not lock the database.


Long story short: The unwieldly operation_needs_transaction_callback gives you full control about this feature of the module that is a misfeature nowadays because it relies on flawed assumptions.


Quoting Aymeric Augustin (aymeric.augustin) again:

> I'm very skeptical of the other API, "in_transaction". Other database backends usually provide an "autocommit" attribute.

Interesting. I've never seen such an attribute. It is there without me ever noticing:

http://cx-oracle.readthedocs.org/en/latest/connection.html?highlight=autocommit#Connection.autocommit
http://initd.org/psycopg/docs/connection.html?highlight=autocommit#connection.autocommit

> "autocommit" is the opposite of "in_transaction" for all practical purposes. There's only two situations where they may be equal:
>
> - before the first query
> - after an explicit commit

I fail to explain the difference. What in_transaction would help me to do is to debug this "I will start an transaction for you no matter what" DBAPI2 crap.
Auto commit is usually a feature of the database - when you connect, you end up in auto commit mode with psql, mysql. sqlplus (Oracle) is different in that it commits on exit (uargh!).

Now DBAPI2 wants no auto commit mode (quote: "Note that if the database supports an auto-commit feature, this must be initially off. An interface method may be provided to turn it back on."), so basically it sends a BEGIN on connect for most databases.
So for real databases (PostgreSQL, Oracle) the assertion connection.autocommit == not connection.in_transaction will always hold.

It is also possible to enforce this in SQLite at the expense of concurrent access as outlined above. Then in_transaction will give you nothing.

In my case I was trying to make any sense of what SQLite was doing wrt. transaction boundaries and for that knowing the *actual* transaction state is very helpful. It basically answers the question "is this connection locking my database"?

So basically, connection.autocommit is an option ("don't do any transaction management for me"), while connection.in_transaction is a query ("has the damn thing started a transaction for me?").

> While you're there, it would be cool to provide "connection.autocommit = True" as an API to enable autocommit, because "connection.isolation_level = None" isn't a good API at all -- it's very obscure and has nothing to do with isolation level whatsoever.

I fully agree with that.

Quoting Jan Hudec:

> The bug is that sqlite module issues implicit COMMIT. SQLite never needs this, many programs need to NOT have it and they can't opt out as isolation_level affects implicit begins only.

Good point that you can't just disable implicit commits. Perhaps that would be a better patch - just remove all implicit committing. Start transactions if a query does not start with "select".

> I don't understand why the implicit commit was initially added.

Me neither. Seems like Oracle does stuff like that automatically: If it is DDL, it will commit for you. Maybe the author expected that coming from some Oracle experience?

> The only difference is that with explicit BEGIN the database remains locked, while without it it is unlocked when the SELECT finishes (read to end or cursor closed).

Exactly. We are doing this: http://www.sqlite.org/appfileformat.html and our GUI is updating from the database. Believe me, it is great fun when the GUI has a left over read lock open, especially, if all writes are done in worker threads.


Quoting Aymeric Augustin (aymeric.augustin):

> Indeed this is a documentation patch describing what I could implement if a core dev gave the slightest hint that it stands a small chance of getting included. There's no point in writing code that Python core doesn't want.

I got that now. :-(
msg220533 - (view) Author: Aymeric Augustin (aymeric.augustin) * Date: 2014-06-14 08:05
> The idea was to not take away from what's there already: The sqlite3 module already has a feature to inspect a command and begin or commit automatically. Just stripping that away *removes* a feature that has been available for a long time. I'd rather give the
client more control instead of less and let him fine tune this behaviour.

For the sake of clarity, I haven't proposed to remove anything. I'm focusing on preserving the same behavior by default (with its advantages and drawbacks) and providing more control for users who need a different behavior, for instance people who use SQLite as an application file format or as a web application storage backend.


> When starting with Python I always thought that code like this is harmles:

    >>> conn = sqlite3.connect("test.db")
    >>> data = list(conn.execute("select * from mytable"))

> Currently it is, but only because sqlite3 module parses the select as DQL and does not lock the database.

It will absolutely remain harmless with my proposal, if you don't change your code.

However, for that use case I would like to encourage the use of autocommit mode. That's really the semantics you want here.


In fact, you've written several sentences along the lines "currently we have $ADVANTAGE". It's easy to (mis)interpret that as implying that I'm trying to remove these advantages. But that's simply not true.


To sum up, if I understand correctly, in_transaction gives you the ability to fight the behavior mandated by the DB API in client code, because you understand it well.

My take is to avoid the problem entirely, and not inflict it to new users, by providing an option to start in autocommit mode and then create transactions only when you want them.

The DB API doesn't forbid this option. It just prevents it from being the default (even though it's what the average developer expects).

It solves the problem of using SQLite as an application file format. Use autocommit as long as you're just reading; start a transaction before writing; commit when you're done writing.

Of course, that only applies to new software. Existing software can happily keep using the current behavior, which will be preserved at least for the lifetime of Python 3.
msg220555 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-14 14:23
> My take is to avoid the problem entirely, and not inflict it to new users, by providing an option to start in autocommit mode and then create transactions only when you want them.

If this statement is accurate, the what you are proposing is just a different (presumably clearer) spelling for 'isolation_level = None'?

I also don't understand why we don't just fix the screwy behavior with regards to savepoint.  It's hard to see how fixing that could be a backward compatibility problem (in a feature release), since you can't use savepoints without isolation_level=None as it stands now.
msg220566 - (view) Author: Aymeric Augustin (aymeric.augustin) * Date: 2014-06-14 15:18
> If this statement is accurate, the what you are proposing is just a different (presumably clearer) spelling for 'isolation_level = None'?

This statement is accurate but it doesn't cover the whole scope of what I'm attempting to fix.

I'm also trying to address the serialization semantics. SQLite always operates at the serializable isolation level. (I'm talking about the SQL concept of transaction isolation levels, not the unrelated isolation_level parameter in sqlite3.) Currently, since sqlite3 doesn't send a BEGIN before SELECT queries, the following scenario is possible:

- Process A reads data -- and the developer who carefully read PEP 249 expects to start a transaction
- Process B writes data
- Process A writes a modified version of what it read and commits

In this scenario A overwrites B, breaking the serialization guarantees expected in that case. A's initial read should have started a transaction (essentially acquiring a shared lock) and B should have been prevented from writing. 

There are two problems here:

- A's code looks like it's safe, but it isn't, because sqlite3 does some magic at odds with PEP 249.
- If you're aware of this and try to enforce the proper guarantees, sqlite3 still gets in the way.

> I also don't understand why we don't just fix the screwy behavior with regards to savepoint.  It's hard to see how fixing that could be a backward compatibility problem (in a feature release), since you can't use savepoints without isolation_level=None as it stands now.

Of course, that would be a good step. But the problem doesn't stop there. What about other SQLite commands? What about "PRAGMA"? I suspect you'll have a hard time defining and maintaining a list of which commands should or shouldn't begin or commit a transaction. That's why I didn't choose this path.

Furthermore, sqlite3 would still enforce a weird mode where it sacrifices serializable isolation under the assumption that you're having a transactional and concurrent workload but you don't care about transactional isolation. There are other use cases that would be better served:

- by favoring serializability over concurrency (eg. the "application file format" case) -- that's what my "standards" mode does
- by favoring concurrency over transactions (eg. the "web app" and the "I don't care just save this data" cases) -- that's what "autocommit" is for
msg220602 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2014-06-15 01:29
On Jun 14, 2014 4:05 AM, "Aymeric Augustin"

> preserving the same behavior by default

That is a requirement, because of backwards compatibility.

> providing more control for users who need a different behavior, for
instance people who use SQLite as an application file format or as a web
application storage backend.

Great ... but make sure it is also simpler.  It is already *possible* to do
what you want with 3rd party code.  So if doing it "right" would still
require an arcane recipe,  that isn't a gain.

At a minimum,  the new docs should explain why the current default is not
sufficient for an application file or a Web backend, and what to do
instead.  New code is only justified if it makes the "do this instead"
simpler and more obvious.

My personal opinion is still that adding more keywords won't do this,
because the old ones will still be there to confuse ... thus my suggestion
of a new function.

> > When starting with Python I always thought that code like this is
harmles:
>
>     >>> conn = sqlite3.connect("test.db")
>     >>> data = list(conn.execute("select * from mytable"))
>
> > Currently it is, but only because sqlite3 module parses the select as
DQL and does not lock the database.
>
> It will absolutely remain harmless with my proposal, if you don't change
your code.
>
> However, for that use case I would like to encourage the use of
autocommit mode. That's really the semantics you want here.

I should NOT need to mess with anything related to "commit" if I am just
selecting.  If that means another new function for
lockless/read-only/uncommitted-read connections, then so be it.

> My take is to avoid the problem entirely, and not inflict it to new
users, by providing an option to start in autocommit mode and then create
transactions only when you want them.

But again, if it requires choosing a commit mode for my selects,  then it
fails to be simple.
msg220615 - (view) Author: Aymeric Augustin (aymeric.augustin) * Date: 2014-06-15 07:30
Jim, I believe this API decision doesn't affect the patch in a major way.

I'll write the rest of the patch and the committer who reviews it will decide.
msg221568 - (view) Author: Torsten Landschoff (torsten) Date: 2014-06-25 18:33
Just a heads up that I am still interested in this issue. I started to write up my expectations to the sqlite module wrt. exception handling.

It's not finished yet but I attached what I got so far. I hope everybody agrees that those doctests should all pass.
History
Date User Action Args
2014-06-25 18:33:12torstensetfiles: + sqlite_sanity_check.rst

messages: + msg221568
2014-06-16 20:27:09bkirchersetnosy: + bkircher
2014-06-15 07:30:02aymeric.augustinsetmessages: + msg220615
2014-06-15 01:29:59Jim.Jewettsetmessages: + msg220602
2014-06-14 15:18:35aymeric.augustinsetmessages: + msg220566
2014-06-14 14:23:20r.david.murraysetmessages: + msg220555
2014-06-14 08:05:33aymeric.augustinsetmessages: + msg220533
2014-06-14 01:46:53torstensetmessages: + msg220519
2014-06-13 21:16:52aymeric.augustinsetmessages: + msg220501
2014-06-13 21:00:15Jim.Jewettsetmessages: + msg220500
2014-06-13 20:07:18aymeric.augustinsetmessages: + msg220491
2014-06-13 18:42:18Jim.Jewettsetnosy: + Jim.Jewett
messages: + msg220478
2014-06-06 14:10:55aymeric.augustinsetfiles: + transaction-api-proposal-issues-8145-9924-10740-16958.patch

messages: + msg219884
2014-06-05 15:25:01bulbsetmessages: + msg219818
2014-06-05 12:17:12pitrousetmessages: + msg219806
stage: needs patch
2014-06-04 15:43:29aymeric.augustinsetmessages: + msg219761
2014-06-04 14:40:32r.david.murraysetmessages: + msg219759
versions: - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4
2014-06-04 14:25:05bulbsetmessages: + msg219758
2014-06-04 14:14:04bulbsetversions: + Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4
2014-06-04 14:12:34bulbsetnosy: + bulb
messages: + msg219757
2014-04-17 23:04:26floxsetnosy: + flox
2014-04-17 20:58:53pitrousetnosy: + pitrou
messages: + msg216740
2014-04-17 20:43:46aymeric.augustinsetmessages: + msg216738
2014-04-17 20:01:12monsantosetmessages: + msg216734
2014-04-17 19:15:50aymeric.augustinsetnosy: + aymeric.augustin
messages: + msg216732
2014-04-17 17:29:45monsantosetnosy: + monsanto
messages: + msg216715
2014-02-01 22:47:07Ronny.Pfannschmidtsetmessages: + msg209924
2014-02-01 21:11:03r.david.murraysetmessages: + msg209917
2014-02-01 18:20:30Ronny.Pfannschmidtsetmessages: + msg209904
2014-02-01 17:25:13r.david.murraysetmessages: + msg209901
2014-02-01 17:00:36Ronny.Pfannschmidtsetnosy: + Ronny.Pfannschmidt
messages: + msg209900
2014-01-09 22:05:44r.david.murraysettype: behavior -> enhancement
messages: + msg207800
versions: + Python 3.5, - Python 3.1, Python 2.7, Python 3.2, Python 3.3
2014-01-09 21:20:28zzzeeksetmessages: + msg207796
2014-01-09 20:29:13adamtjsetversions: + Python 3.1, Python 2.7, Python 3.2
nosy: + adamtj
title: sqlite3 module should allow DDL statements in transactions -> sqlite3 module breaks transactions and potentially corrupts data
messages: + msg207789


type: enhancement -> behavior
2013-04-04 19:37:21zzzeeksetnosy: + zzzeek
2013-01-26 17:30:57tshepangsetnosy: + tshepang
2012-10-18 18:56:16Jeremy Bankssetnosy: + Jeremy Banks
2011-09-17 15:39:45Mark.Bucciarellisetnosy: + Mark.Bucciarelli
messages: + msg144197
2011-05-20 10:28:22torstensetmessages: + msg136362
2011-03-30 20:32:44torstensetfiles: + sqlite_transaction_config_v2.diff

messages: + msg132612
2011-03-29 07:39:29torstensetmessages: + msg132470
2011-03-28 19:49:14dholthsetmessages: + msg132414
2011-03-27 01:56:07torstensetfiles: + sqlite_transaction_config_py27.diff
2011-03-27 01:54:12torstensetfiles: + sqlite_transaction_config_py3.diff

messages: + msg132287
2011-03-27 01:34:16torstensetfiles: + sqlite_transaction_config_py27.diff
nosy: + torsten
messages: + msg132285

2011-02-18 21:17:20dholthsetnosy: + dholth
messages: + msg128815
2010-12-21 00:10:59asvetlovsetnosy: + asvetlov
2010-12-20 22:08:46scott.urbansetmessages: + msg124407
2010-12-20 18:34:15r.david.murraysetversions: + Python 3.3, - Python 2.6
nosy: + ghaering, r.david.murray

messages: + msg124398

type: behavior -> enhancement
2010-12-20 18:03:40scott.urbansetfiles: + test_sqlite_ddl.py

messages: + msg124393
2010-12-20 18:02:11scott.urbancreate