Author torsten
Recipients Jeremy Banks, Jim.Jewett, Mark.Bucciarelli, Ronny.Pfannschmidt, adamtj, asvetlov, aymeric.augustin, bulb, dholth, flox, ghaering, monsanto, pitrou, r.david.murray, scott.urban, torsten, tshepang, zzzeek
Date 2014-06-14.01:46:48
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1402710413.21.0.170754530103.issue10740@psf.upfronthosting.co.za>
In-reply-to
Content
Hi all,

let me warn you first: I have put a lot a effort in writing the patch back then. I moved on since as I don't even know when the improvement will make any difference for me. As so often, trying to contribute to an open source project turned out to be a waste of time, for the following reasons:

1. the patch first has to made it past review before being committed

2. it will be released with some distant version (looking from 2011)

3. I couldn't just replace Python with a new version for our builds (even if it was released within a month)

In the end I had to work around the problem in client code. If anybody cares, the solution was to special case SQLite as a database backend and do transactions manually. As we now have to support Oracle as well, we don't have the luxury of transactions with DDL anyhow so basically we drop out of DBAPI2 transaction management and everything becomes easy again.


I should have replied earlier to the questions. Maybe by answering I stand a chance that I will ever use sqlite3 in Python 3 again.

Quoting Aymeric Augustin (aymeric.augustin):

> That patch solves the problem, at the cost of introducing an unwieldy API, "operation_needs_transaction_callback".

You are right, I had forgotten that each client would need to know about it. In fact, the value used in the tests etc. should probably be the default.

The idea was to not take away from what's there already: The sqlite3 module already has a feature to inspect a command and begin or commit automatically. Just stripping that away *removes* a feature that has been available for a long time. I'd rather give the
client more control instead of less and let him fine tune this behaviour.

Also, there was good reason for this special casing of SQLite: Adhering to DBAPI2 would really kill concurrent access to SQLite: After reading something from the database, any other thread is blocked from writing to the database.
In our case (we are using SQLAlchemy), the effect is that any attribute access to an ORM mapped object must be "secured" by rolling back after reading (in case you were only reading). Let me try to illustrate in an example:

Previous code:

    modification_time = component.modification_time

If the modification time is not loaded, this will query it from the database in the bowls of SQLAlchemy. Now the database is locked, if it weren't for the "smart" transaction management of the sqlite3 module.

Quoting the new patch to the documentation:

> Standard mode: In this mode, the :mod:`sqlite3` module opens a transaction implicitly before any statement, unless a transaction is already active. This adheres more closely to PEP 249 semantics, possibly at the cost of performance under concurrent workloads.

The cost of performance for our workload here is called deadlock. To fix this in user code, the example code above needs fixing like this:

    modification_time = component.modification_time
    # Unlock SQLite if this came from the database (and we are using SQLite)
    session = get_orm_session(component)
    if session and session.bind.dialect.name == 'sqlite':
        session.rollback()

But this is obviously absolutely wrong if any changes have been made before reading. So the attribute access becomes something like:

    session = get_orm_session(component)
    do_rollback = session and not (session.new or session.updated or session.deleted)

    modification_time = component.modification_time

    # Unlock SQLite if this came from the database (and we are using SQLite)
    if do_rollback and session.bind.dialect.name == 'sqlite':
        session.rollback()

When done right this is not a big deal though - we roll back our database connection in the idle event of our GUI main loop to make sure background processing is not hanging indefinately. But not all code may be so lucky.
When starting with Python I always thought that code like this is harmles:

    >>> conn = sqlite3.connect("test.db")
    >>> data = list(conn.execute("select * from mytable"))

Currently it is, but only because sqlite3 module parses the select as DQL and does not lock the database.


Long story short: The unwieldly operation_needs_transaction_callback gives you full control about this feature of the module that is a misfeature nowadays because it relies on flawed assumptions.


Quoting Aymeric Augustin (aymeric.augustin) again:

> I'm very skeptical of the other API, "in_transaction". Other database backends usually provide an "autocommit" attribute.

Interesting. I've never seen such an attribute. It is there without me ever noticing:

http://cx-oracle.readthedocs.org/en/latest/connection.html?highlight=autocommit#Connection.autocommit
http://initd.org/psycopg/docs/connection.html?highlight=autocommit#connection.autocommit

> "autocommit" is the opposite of "in_transaction" for all practical purposes. There's only two situations where they may be equal:
>
> - before the first query
> - after an explicit commit

I fail to explain the difference. What in_transaction would help me to do is to debug this "I will start an transaction for you no matter what" DBAPI2 crap.
Auto commit is usually a feature of the database - when you connect, you end up in auto commit mode with psql, mysql. sqlplus (Oracle) is different in that it commits on exit (uargh!).

Now DBAPI2 wants no auto commit mode (quote: "Note that if the database supports an auto-commit feature, this must be initially off. An interface method may be provided to turn it back on."), so basically it sends a BEGIN on connect for most databases.
So for real databases (PostgreSQL, Oracle) the assertion connection.autocommit == not connection.in_transaction will always hold.

It is also possible to enforce this in SQLite at the expense of concurrent access as outlined above. Then in_transaction will give you nothing.

In my case I was trying to make any sense of what SQLite was doing wrt. transaction boundaries and for that knowing the *actual* transaction state is very helpful. It basically answers the question "is this connection locking my database"?

So basically, connection.autocommit is an option ("don't do any transaction management for me"), while connection.in_transaction is a query ("has the damn thing started a transaction for me?").

> While you're there, it would be cool to provide "connection.autocommit = True" as an API to enable autocommit, because "connection.isolation_level = None" isn't a good API at all -- it's very obscure and has nothing to do with isolation level whatsoever.

I fully agree with that.

Quoting Jan Hudec:

> The bug is that sqlite module issues implicit COMMIT. SQLite never needs this, many programs need to NOT have it and they can't opt out as isolation_level affects implicit begins only.

Good point that you can't just disable implicit commits. Perhaps that would be a better patch - just remove all implicit committing. Start transactions if a query does not start with "select".

> I don't understand why the implicit commit was initially added.

Me neither. Seems like Oracle does stuff like that automatically: If it is DDL, it will commit for you. Maybe the author expected that coming from some Oracle experience?

> The only difference is that with explicit BEGIN the database remains locked, while without it it is unlocked when the SELECT finishes (read to end or cursor closed).

Exactly. We are doing this: http://www.sqlite.org/appfileformat.html and our GUI is updating from the database. Believe me, it is great fun when the GUI has a left over read lock open, especially, if all writes are done in worker threads.


Quoting Aymeric Augustin (aymeric.augustin):

> Indeed this is a documentation patch describing what I could implement if a core dev gave the slightest hint that it stands a small chance of getting included. There's no point in writing code that Python core doesn't want.

I got that now. :-(
History
Date User Action Args
2014-06-14 01:46:54torstensetrecipients: + torsten, ghaering, pitrou, Jeremy Banks, r.david.murray, zzzeek, asvetlov, flox, adamtj, dholth, monsanto, scott.urban, aymeric.augustin, tshepang, Ronny.Pfannschmidt, Mark.Bucciarelli, Jim.Jewett, bulb
2014-06-14 01:46:53torstensetmessageid: <1402710413.21.0.170754530103.issue10740@psf.upfronthosting.co.za>
2014-06-14 01:46:53torstenlinkissue10740 messages
2014-06-14 01:46:48torstencreate