Author adamtj
Recipients Jeremy Banks, Mark.Bucciarelli, adamtj, asvetlov, dholth, ghaering, r.david.murray, scott.urban, torsten, tshepang, zzzeek
Date 2014-01-09.20:29:11
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1389299353.59.0.434979224272.issue10740@psf.upfronthosting.co.za>
In-reply-to
Content
The proposed patches don't fix the problem.  They may well allow DDL in transactions, but that's not the real problem.

A database library must *NEVER* implicitly commit or rollback.  That's completely insane.

    >>> import this
    ...
    Explicit is better than implicit.

If a statement can't be executed in a transaction, then trying to execute such a statement in a transaction is an error.  Period.  An exception *MUST* be raised.  It is wrong to guess that the user really wanted to commit first.

    >>> import this
    ...
    Errors should never pass silently.
    ...
    In the face of ambiguity, refuse the temptation to guess.

If such an exception is raised, you'll run into the problem exactly once before figuring out that you need to commit or rollback first.  You can then change your code to safely and explicitly account for that limitation.

This issue is not an enhancement, it's a bug.  One might argue that fixing it will break existing code, but it won't.  Any affected code is already broken.  It's depending on a promised level of safety, and this bug is breaking that promise.  It's better to fail noisily and consistently than to usually work, but sometimes silently corrupt data.

As is, it's pretty much guaranteed that there is existing code that does DDL expecting to be in a transaction when it isn't.  Even a well-tested schema upgrade can fail in production if the data is slightly different that in a testing database.  Unique constraints can fail, etc.  I frequently use the psycopg2 module and psql command, which get this issue right.  They've saved me several times.

The current behavior is buggy and should be changed to be correct.  There should not merely be an option to enable the correct behavior.  There should also be no option to enable the old buggy behavior.  It's too dangerous.

In general, it's a code-smell to inspect the SQL that's being executed before sending it to the database.  The only reason I can think of to inspect SQL is to work around brokenness in the underlying database.  Suppose, for example, that SQLite implicitly committed when it got DDL.  (I don't know what it actually does.  Just suppose.)  In that case, it would be necessary to check for DDL and raise an exception before passing the statement to the database.  If the database correctly errors in such a situation, then the python wrapper should just pass the DDL to the database and deal with the resulting error.  That's way easier that trying to correctly detect what the statement type is.  Parsing SQL correctly is hard.

As evidence of that, note that the existing statement detection code is broken: it doesn't strip comments first!  A simple SELECT statement preceeded by a comment will implicitly commit!  But commenting is good.

    >>> import this
    ...
    Readability counts.

So it's not even just an issue of DDL or PRAGMAs!

Anything that causes the underlying database to implicitly commit must be avoided (yet the python module goes out of it's way to add *MORE* such buggy behavior!)  Fixing brokenness in the underlying database is the only reason to inspect SQL as it passes through this module.  If there are other ways to avoid such problems, they will likely be better than inspecting SQL.

Anything which causes implicit rollbacks in the underlying database is a similar issue, but easier to handle without parsing SQL.  It's not safe to implicitly rollback, even if a new transaction is begun.  For example, you might be doing foreign key validation outside the database.  If one INSERT were implicitly rolled back and a new transaction begun, a second INSERT may then succeed and be committed.  Now you have a constraint violation, even though you correctly checked it before.

If there is anything that triggers an implicit rollback in the underlying database connection, those rollbacks must be dectected all subsequent statements or actions on the python wrapper *should* raise exceptions until an explicit rollback is performed, except for commit() which *must* raise.

For example:

    con.isolation_level = 'DEFERRED'
    try:
        # Nothing in here can implicitly commit or rollback safely.  
        # Doing so risks data corruption.

        cur.execute("INSERT INTO foo ...")

        try:
            con.somehow_trigger_implicit_rollback() # must raise
        except:
            # Throw away the exception.
            pass

        try:
            # This *should* raise too, but isn't strictly necessary,
            # as long as commit() is implemented correctly.
            cur.execute("INSERT INTO bar ...")
        except:
            pass
        
        # The underlying database rolled back our transaction.
        # A successful commit here would not match reality.
        # This commit() must raise.
        con.commit()
    except:
        # This rollback() can succeed, because it puts the 
        # "con" object back into correspondance with the underlying
        # database connection state.
        con.rollback()
        logger.error('something bad happened')
        raise

If nothing in the database does implicit COMMITs or ROLLBACKs, then it's probably sufficient to leave all the error handling to the database and only raise exceptions when it indicates problems.

Here is an example demonstrating that a SELECT can trigger an implicit commit.

    Python 2.7.3 (default, Sep 26 2013, 20:03:06)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> # Create a connection.
    ... import sqlite3
    >>> con = sqlite3.connect(":memory:", isolation_level='DEFERRED')
    >>> cur = con.cursor()
    >>>
    >>> # Create some state.
    ... cur.execute("CREATE TABLE foo (i int);")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> cur.execute("INSERT INTO foo VALUES (0);")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> con.commit()
    >>>
    >>> # Change the state, do a SELECT, then rollback.
    ... cur.execute("UPDATE foo SET i=1")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> cur.execute("SELECT 1;")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> con.rollback()
    >>>
    >>> # Verify that the rollback worked.
    ... cur.execute("SELECT i FROM foo")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> i = cur.fetchone()[0]
    >>> print 'i is', i
    i is 0
    >>> assert i == 0
    >>>
    >>> # Change some state, do a different SELECT, then attempt to rollback.
    ... cur.execute("UPDATE foo SET i=2")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> # This will incorrectly commit:
    ... cur.execute("""
    ...     /*
    ...      * Comments are good for readability, but because of this bug, they can
    ...      * cause incorrect implicit commits.
    ...      */
    ...     SELECT 1;
    ... """)
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> con.rollback()
    >>>
    >>>
    >>> # The rollback succeeded and rolling back a different transaction than
    >>> # the one we expected.
    ... cur.execute("SELECT i FROM foo")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> i = cur.fetchone()[0]
    >>> print 'i is', i
    i is 2
    >>> assert i == 0
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AssertionError
History
Date User Action Args
2014-01-09 20:29:14adamtjsetrecipients: + adamtj, ghaering, Jeremy Banks, r.david.murray, zzzeek, asvetlov, dholth, torsten, scott.urban, tshepang, Mark.Bucciarelli
2014-01-09 20:29:13adamtjsetmessageid: <1389299353.59.0.434979224272.issue10740@psf.upfronthosting.co.za>
2014-01-09 20:29:13adamtjlinkissue10740 messages
2014-01-09 20:29:11adamtjcreate