Created on 2010-12-20 18:02 by scott.urban, last changed 2014-04-17 23:04 by flox.
|pysql-transactions.2.diff||scott.urban, 2010-12-20 18:02||proposed patch|
|test_sqlite_ddl.py||scott.urban, 2010-12-20 18:03||test script|
|sqlite_transaction_config_py27.diff||torsten, 2011-03-27 01:34||Patch against Python 2.7|
|sqlite_transaction_config_py3.diff||torsten, 2011-03-27 01:54||Patch for Python 3 default branch||review|
|sqlite_transaction_config_py27.diff||torsten, 2011-03-27 01:56||Patch against Python 2.7, take 2|
|sqlite_transaction_config_v2.diff||torsten, 2011-03-30 20:32||Updated patch: Added docs||review|
|msg124392 - (view)||Author: Scott Urban (scott.urban)||Date: 2010-12-20 18:02|
The python sqlite module automatically commits open transactions when it encounters a DDL statement. This is unnecessary; DDL is transactional in my testing (see attached). Attached patch addresses the issue. Patch is against 2.6.1, but looking at Trunk in svn, it seems like the patch is needed and would apply. One issue I could foresee is that this behavior might depend on the sqlite version in use (I'm on 3.6.10). Patch also allows pragma statement.
|msg124393 - (view)||Author: Scott Urban (scott.urban)||Date: 2010-12-20 18:03|
Here are some tests.
|msg124398 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2010-12-20 18:34|
See also Issue 8145. It would be nice if someone could sort all this out, but I'm not knowledgeable enough to do so. For this patch, it would be a significant change it behaviour. Therefore it would have to be a new feature controlled by a flag of some sort (which is part of the reason for the reference to issue 8145).
|msg124407 - (view)||Author: Scott Urban (scott.urban)||Date: 2010-12-20 22:08|
I find the way that the sqlite3 module handles transactions pretty surprising in general, but I agree that someone who got used to DDL not rolling back could in theory find this patch surprising. We will apply this patch to our python build because having DDL covered by a transactions is convenient for certain tasks. If anyone can think of a problem with this simple patch, I'd like to hear it. Thanks Scott
|msg128815 - (view)||Author: Daniel Holth (dholth)||Date: 2011-02-18 21:17|
I want transactional DDL too. I was tremendously surprised that I could not duplicate the way sqlite3 behaves on the command line from witin pysqlite. Instead of this patch, I would rather be able to instruct pysqlite to always begin a transaction for any kind of statement (I hear this is a requirement for DB-API compliance) and never perform an implicit commit for any reason. For example, someone on the google code project had a complaint that 'SAVEPOINT' (create a subtransaction) automatically commits because pysqlite doesn't know about it. I tried to trick pysqlite into doing what I wanted by prepending /* update */ to every CREATE TABLE statement but it didn't seem to quite work. Instead, I would prefer a pysqlite that does not strcmp(statement) at all.
|msg132285 - (view)||Author: Torsten Landschoff (torsten)||Date: 2011-03-27 01:34|
The attached patch is my take on this issue. I ran into the problem that during schema upgrades dropping a table was not rolled back. In another instance, renaming a table was not rolled back. This greatly increases the risk of data loss for our application. Because I do not currently foresee which commands might need dropping out of a transaction and because of backwards compatibility woes, I added a new field to the sqlite3.Connection: operation_needs_transaction_callback This function is called to decide if a running transaction should be implicitly committed (I'd consider this dangerous), if a transaction has to be started if not running (should normally always hold) or if the transaction state should be left alone. For example, the "pragma foreign_keys = ..." is a no-op inside a transaction, therefore an implicit begin would be possibly harmful. In our application, we enable foreign keys when getting the connection out of the SQLAlchemy pool via a pool listener, which would be disabled if there is an implicit begin triggered. The patch also adds a bunch of unit tests to cover the new behaviour.
|msg132287 - (view)||Author: Torsten Landschoff (torsten)||Date: 2011-03-27 01:54|
Same patch for Python 3. In fact, this also adds a missing Py_XDECREF.
|msg132414 - (view)||Author: Daniel Holth (dholth)||Date: 2011-03-28 19:49|
Torsten basically you are suggesting that PRAGMA would never work at all with my 'do not strcmp() the sql at all, always begin a transaction' approach?
|msg132470 - (view)||Author: Torsten Landschoff (torsten)||Date: 2011-03-29 07:39|
> Torsten basically you are suggesting that PRAGMA would never work at all with my 'do not strcmp() the sql at all, always begin a transaction' approach? No. Most pragmas should still work and getting the current setting of a pragma should also work. I was only referring to http://www.sqlite.org/pragma.html#pragma_foreign_keys Quote: > This pragma is a no-op within a transaction; foreign key constraint enforcement may only be enabled or disabled when there is no pending BEGIN or SAVEPOINT. I did not see such a restriction for other pragmas though I admit that I did not reread the list in full.
|msg132612 - (view)||Author: Torsten Landschoff (torsten)||Date: 2011-03-30 20:32|
The attached patch is an updated version which adds a bit of documentation.
|msg136362 - (view)||Author: Torsten Landschoff (torsten)||Date: 2011-05-20 10:28|
I updated the patch for upstream pysqlite2. Available at http://code.google.com/p/pysqlite/issues/detail?id=24 Patch over there is for Python 2 (tested with our production Python 2.6).
|msg144197 - (view)||Author: Mark Bucciarelli (Mark.Bucciarelli)||Date: 2011-09-17 15:39|
Opened http://bugs.python.org/issue12997 in case there is a way to solve the foreign_key PRAGMA issue with a less disruptive fix.
|msg207789 - (view)||Author: Adam Tomjack (adamtj)||Date: 2014-01-09 20:29|
The proposed patches don't fix the problem. They may well allow DDL in transactions, but that's not the real problem. A database library must *NEVER* implicitly commit or rollback. That's completely insane. >>> import this ... Explicit is better than implicit. If a statement can't be executed in a transaction, then trying to execute such a statement in a transaction is an error. Period. An exception *MUST* be raised. It is wrong to guess that the user really wanted to commit first. >>> import this ... Errors should never pass silently. ... In the face of ambiguity, refuse the temptation to guess. If such an exception is raised, you'll run into the problem exactly once before figuring out that you need to commit or rollback first. You can then change your code to safely and explicitly account for that limitation. This issue is not an enhancement, it's a bug. One might argue that fixing it will break existing code, but it won't. Any affected code is already broken. It's depending on a promised level of safety, and this bug is breaking that promise. It's better to fail noisily and consistently than to usually work, but sometimes silently corrupt data. As is, it's pretty much guaranteed that there is existing code that does DDL expecting to be in a transaction when it isn't. Even a well-tested schema upgrade can fail in production if the data is slightly different that in a testing database. Unique constraints can fail, etc. I frequently use the psycopg2 module and psql command, which get this issue right. They've saved me several times. The current behavior is buggy and should be changed to be correct. There should not merely be an option to enable the correct behavior. There should also be no option to enable the old buggy behavior. It's too dangerous. In general, it's a code-smell to inspect the SQL that's being executed before sending it to the database. The only reason I can think of to inspect SQL is to work around brokenness in the underlying database. Suppose, for example, that SQLite implicitly committed when it got DDL. (I don't know what it actually does. Just suppose.) In that case, it would be necessary to check for DDL and raise an exception before passing the statement to the database. If the database correctly errors in such a situation, then the python wrapper should just pass the DDL to the database and deal with the resulting error. That's way easier that trying to correctly detect what the statement type is. Parsing SQL correctly is hard. As evidence of that, note that the existing statement detection code is broken: it doesn't strip comments first! A simple SELECT statement preceeded by a comment will implicitly commit! But commenting is good. >>> import this ... Readability counts. So it's not even just an issue of DDL or PRAGMAs! Anything that causes the underlying database to implicitly commit must be avoided (yet the python module goes out of it's way to add *MORE* such buggy behavior!) Fixing brokenness in the underlying database is the only reason to inspect SQL as it passes through this module. If there are other ways to avoid such problems, they will likely be better than inspecting SQL. Anything which causes implicit rollbacks in the underlying database is a similar issue, but easier to handle without parsing SQL. It's not safe to implicitly rollback, even if a new transaction is begun. For example, you might be doing foreign key validation outside the database. If one INSERT were implicitly rolled back and a new transaction begun, a second INSERT may then succeed and be committed. Now you have a constraint violation, even though you correctly checked it before. If there is anything that triggers an implicit rollback in the underlying database connection, those rollbacks must be dectected all subsequent statements or actions on the python wrapper *should* raise exceptions until an explicit rollback is performed, except for commit() which *must* raise. For example: con.isolation_level = 'DEFERRED' try: # Nothing in here can implicitly commit or rollback safely. # Doing so risks data corruption. cur.execute("INSERT INTO foo ...") try: con.somehow_trigger_implicit_rollback() # must raise except: # Throw away the exception. pass try: # This *should* raise too, but isn't strictly necessary, # as long as commit() is implemented correctly. cur.execute("INSERT INTO bar ...") except: pass # The underlying database rolled back our transaction. # A successful commit here would not match reality. # This commit() must raise. con.commit() except: # This rollback() can succeed, because it puts the # "con" object back into correspondance with the underlying # database connection state. con.rollback() logger.error('something bad happened') raise If nothing in the database does implicit COMMITs or ROLLBACKs, then it's probably sufficient to leave all the error handling to the database and only raise exceptions when it indicates problems. Here is an example demonstrating that a SELECT can trigger an implicit commit. Python 2.7.3 (default, Sep 26 2013, 20:03:06) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> >>> # Create a connection. ... import sqlite3 >>> con = sqlite3.connect(":memory:", isolation_level='DEFERRED') >>> cur = con.cursor() >>> >>> # Create some state. ... cur.execute("CREATE TABLE foo (i int);") <sqlite3.Cursor object at 0x7fc1b7d51500> >>> cur.execute("INSERT INTO foo VALUES (0);") <sqlite3.Cursor object at 0x7fc1b7d51500> >>> con.commit() >>> >>> # Change the state, do a SELECT, then rollback. ... cur.execute("UPDATE foo SET i=1") <sqlite3.Cursor object at 0x7fc1b7d51500> >>> cur.execute("SELECT 1;") <sqlite3.Cursor object at 0x7fc1b7d51500> >>> con.rollback() >>> >>> # Verify that the rollback worked. ... cur.execute("SELECT i FROM foo") <sqlite3.Cursor object at 0x7fc1b7d51500> >>> i = cur.fetchone() >>> print 'i is', i i is 0 >>> assert i == 0 >>> >>> # Change some state, do a different SELECT, then attempt to rollback. ... cur.execute("UPDATE foo SET i=2") <sqlite3.Cursor object at 0x7fc1b7d51500> >>> # This will incorrectly commit: ... cur.execute(""" ... /* ... * Comments are good for readability, but because of this bug, they can ... * cause incorrect implicit commits. ... */ ... SELECT 1; ... """) <sqlite3.Cursor object at 0x7fc1b7d51500> >>> con.rollback() >>> >>> >>> # The rollback succeeded and rolling back a different transaction than >>> # the one we expected. ... cur.execute("SELECT i FROM foo") <sqlite3.Cursor object at 0x7fc1b7d51500> >>> i = cur.fetchone() >>> print 'i is', i i is 2 >>> assert i == 0 Traceback (most recent call last): File "<stdin>", line 1, in <module> AssertionError
|msg207796 - (view)||Author: mike bayer (zzzeek)||Date: 2014-01-09 21:20|
well Adam, you might also be surprised to see pysqlite not doing very well on the *other* side of the equation either; that is, when it *begins* the transaction. Issue http://bugs.python.org/issue9924 refers to this and like this one, hasn't seen any activity since 2011. You might be interested in the rationale on that one, which is that python apps may wish to have some degree of concurrency against a sqlite file for an application that is doing all SELECTs. I am certainly in favor of a pure pep-249 approach that emits BEGIN on the first execute() call period, and never implicitly rolls back or commits. However, I disagree this should be enabled by default nor that there should not be any option for the old behavior. I also think the "delayed BEGIN" feature should still be available to counteract SQLite's particular difficulty with concurrency. If there is code that relies upon a bug in order to function, which would then fail to function in that way if the bug is fixed, then by definition fixing that bug is a backwards-incompatible change. Python std lib can't afford to roll out a change that blatantly backwards-incompatible. The issue regarding comments not being parsed unfortunately should also be a separate issue that needs to be fixed, as nasty as it is that pysqlite tries to parse SQL. I disagree that most users are expecting SQLite's DDL to be transactional; transactional DDL is a luxury feature to many, including the vast numbers of developers that use MySQL, not to mention Oracle, neither of which have any transactional DDL capabilities.
|msg207800 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2014-01-09 22:05|
On Thu, 09 Jan 2014 20:29:15 +0000, Adam Tomjack <email@example.com> wrote: > This issue is not an enhancement, it's a bug. One might argue that > fixing it will break existing code, but it won't. Any affected code > is already broken. It's depending on a promised level of safety, and > this bug is breaking that promise. It's better to fail noisily and > consistently than to usually work, but sometimes silently corrupt > data. That's not how our backward compatibility policy works. If a program runs correctly, a change in a maintenance release should not cause that program to fail, *even if the change is a bug fix*. Now, that said, silent data corruption can change that calculus, but in this case there are many many working programs out there that don't experience data corruption, so breaking them in a maintenance release is just not acceptable. It is possible that there are specific things that can be fixed without breaking backward compatibility, but they will have to be argued individually. > In general, it's a code-smell to inspect the SQL that's being executed > before sending it to the database. The only reason I can think of to There are reasons why the sqlite module was designed the way it was, having to do with an intersection between how sqlite works and the DB API 2 specification. I'm not saying the design (and implementation of that design) has no bugs, but there *are* reasons why the choices made were made, including the concurrency issue mentioned by Mike (which if I understand correctly was even more problematic in earlier versions of sqlite.) > inspect SQL is to work around brokenness in the underlying database. > Suppose, for example, that SQLite implicitly committed when it got > DDL. (I don't know what it actually does. Just suppose.) In that By default, SQLite starts a transaction when any command that changes the database is executed (the docs say, "basically, anything other than a SELECT), and "automatically started transactions are committed when the last query finishes". I'm not sure exactly what that means. Note that there is no way in sqlite to turn auto-transaction behavior off. > case, it would be necessary to check for DDL and raise an exception > before passing the statement to the database. If the database Like I said, this would need to be a new feature, due to our backward compatibility policy. > correctly errors in such a situation, then the python wrapper should > just pass the DDL to the database and deal with the resulting error. > That's way easier that trying to correctly detect what the statement > type is. Parsing SQL correctly is hard. If you want full control of transactions while using the sqlite module, you can set the isolation_level to None (sqlite's default autocommit mode) and then carefully control when you issue what kinds of statements, including BEGIN statements. Any exceptions produced by the database will in that situation be propagated to the application. That loses the (admittedly somewhat limited) DB-interoperability benefits of using the DB API 2, though. > As evidence of that, note that the existing statement detection code > is broken: it doesn't strip comments first! A simple SELECT statement > preceeded by a comment will implicitly commit! But commenting is > good. Yes, the fact that statement detection (and what statements are detected) is buggy is the essence of this issue. But note that in Python code it would be much more natural to put the comment in the python code, not the sql string. > Anything that causes the underlying database to implicitly commit must > be avoided (yet the python module goes out of it's way to add *MORE* > such buggy behavior!) Fixing brokenness in the underlying database is > the only reason to inspect SQL as it passes through this module. If Yep. > there are other ways to avoid such problems, they will likely be > better than inspecting SQL. Do you have a better way to suggest? > Anything which causes implicit rollbacks in the underlying database is > a similar issue, but easier to handle without parsing SQL. It's not There are no implicit rollbacks at issue anywhere here, that I can see. Do you have an example where the module does an implicit rollback? > Here is an example demonstrating that a SELECT can trigger an implicit > commit. Yes, the comment issue should be fixed. That's a different bug, though, and should be addressed in a separate issue. Unfortunately, since some people seem to be *depending* on the fact that putting in a comment disables statement detection, that, too will need to be a new feature. I'm sorry about that, but Python has a strong commitment to backward compatibility.
|msg209900 - (view)||Author: Ronny Pfannschmidt (Ronny.Pfannschmidt)||Date: 2014-02-01 17:00|
could we please get the option to opt-out of that behaviour, as a extra connection option maybe with the normal python sqlite bindings its impossible to have database migrations work safely which IMHO makes this a potential data-loss issue
|msg209901 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2014-02-01 17:25|
Opt out of what behavior?
|msg209904 - (view)||Author: Ronny Pfannschmidt (Ronny.Pfannschmidt)||Date: 2014-02-01 18:20|
the sqlite binding deciding how to handle transactions
|msg209917 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2014-02-01 21:11|
As noted above you get that by setting isolation_level to None. That feature has always been available. (With isolation_level set to None, the sqlite wrapper module itself never issues any BEGIN statements.)
|msg209924 - (view)||Author: Ronny Pfannschmidt (Ronny.Pfannschmidt)||Date: 2014-02-01 22:47|
http://hg.python.org/releasing/3.4/file/447c758cdc26/Modules/_sqlite/cursor.c#l789 also i dont see the isolation level being taking into account in other parts of the code
|msg216715 - (view)||Author: Chris Monsanto (monsanto)||Date: 2014-04-17 17:29|
This issue has been open for 4 years, last update was 2 months ago. Lack of transactional DDL is a big deal for Python programs that use SQLite heavily. We have a patch for Python 3 that applies cleanly and as far as I can tell works fine. I've been using it in testing for months, and I'm about to deploy it in production. What's the next step here? What do we need to do to get something merged?
|msg216732 - (view)||Author: (aaugustin) *||Date: 2014-04-17 19:15|
Hey -- maintainer of Django's transaction support here. This ticket was brought to my attention again today. As I know a few things about this issue and I see Python core devs asking for input, I'll give my $0.02. The core of this issue is that, **by default**, the sqlite3 module performs a simplistic parsing of SQL queries and decides to commit before anything it doesn't understand, for example "SAVEPOINT foo". I don't think it's a good idea to change sqlite's default operation mode to autocommit. PEP 249's prescription for transaction management aren't practical (and Django enforces the opposite behavior by default) but that's beyond the point. It's way too much backwards incompatible. However, I don't think it's a good idea either to automatically send a COMMIT when I want to make a SAVEPOINT. In fact, if you want to use transactions with sqlite, connection.isolation_level = None is almost warranted -- and then you do everything manually. For a slightly longer explanation, see https://www.youtube.com/watch?v=09tM18_st4I#t=1751 I've struggled mightily with all this when rewriting Django's transaction management. See: - https://github.com/django/django/blob/3becac84/django/db/backends/sqlite3/base.py#L107 https://github.com/django/django/blob/3becac84/django/db/backends/sqlite3/base.py#L398-L403 - https://github.com/django/django/blob/3becac84/django/db/transaction.py#L185-L195 I have implemented the workarounds I need and I we won't be able to remove them from Django anytime soon. As a consequence, I have little interest in getting this fixed. Still, it would be sane to stop committing before savepoints by default -- that kinda ruins the concept of savepoints :-) It would be even saner to stop parsing SQL. Does http://hg.python.org/cpython/file/85fd955c6fc8/Modules/_sqlite/cursor.c#l32 look like an adequate parser for http://sqlite.org/lang.html? Unfortunately, I don't have backwards-compatible proposal to fix this. Trying to account for a bit more syntax will help in the short term but not fix the underlying issue. PS: I find it unfair to brand SQLite as a neutered database engine. Its transaction model is vastly superior to, say, MySQL. Too bad sqlite3 ruins it.
|msg216734 - (view)||Author: Chris Monsanto (monsanto)||Date: 2014-04-17 20:01|
> Unfortunately, I don't have backwards-compatible proposal to fix this. Trying to account for a bit more syntax will help in the short term but not fix the underlying issue. aaugustin -- the patch by torsen made 3 years ago is backwards compatible. It adds a hook that you can use to disable autocommits, but you have to opt-in. I think that is good enough for now. We can worry about how to transition people away from the broken transaction model in the future. Let's treat this as any other backwards compatibility problem in Python. We have a flag, that when enabled, removes the shitty behavior. In a future release, you get a warning for relying on the shitty behavior. In a release after that, we kill the old behavior. Then we deprecate the flag. In other words, it literally doesn't matter what the flag is. torsen's is fine. We just need some way to enable transactional DDL.
|msg216738 - (view)||Author: (aaugustin) *||Date: 2014-04-17 20:43|
That patch solves the problem, at the cost of introducing an unwieldy API, "operation_needs_transaction_callback". I'm very skeptical of the other API, "in_transaction". Other database backends usually provide an "autocommit" attribute. "autocommit" is the opposite of "in_transaction" for all practical purposes. There's only two situations where they may be equal: - before the first query - after an explicit commit Then you aren't in a transaction and you aren't in autocommit. But in these cases, in practice, the question you want to ask is "is the next query going to create a transaction?" (and if not, I may want to create one.) So the semantic of "connection.autocommit" is much more useful than the semantic of "connection.in_transaction". While you're there, it would be cool to provide "connection.autocommit = True" as an API to enable autocommit, because "connection.isolation_level = None" isn't a good API at all -- it's very obscure and has nothing to do with isolation level whatsoever.
|msg216740 - (view)||Author: Antoine Pitrou (pitrou) *||Date: 2014-04-17 20:58|
What is the callback thing for? The only value that's ever passed in the tests is `lambda operation: True`.
messages: + msg216740
|2014-04-17 20:43:46||aaugustin||set||messages: + msg216738|
|2014-04-17 20:01:12||monsanto||set||messages: + msg216734|
messages: + msg216732
messages: + msg216715
|2014-02-01 22:47:07||Ronny.Pfannschmidt||set||messages: + msg209924|
|2014-02-01 21:11:03||r.david.murray||set||messages: + msg209917|
|2014-02-01 18:20:30||Ronny.Pfannschmidt||set||messages: + msg209904|
|2014-02-01 17:25:13||r.david.murray||set||messages: + msg209901|
messages: + msg209900
|2014-01-09 22:05:44||r.david.murray||set||type: behavior -> enhancement|
messages: + msg207800
versions: + Python 3.5, - Python 3.1, Python 2.7, Python 3.2, Python 3.3
|2014-01-09 21:20:28||zzzeek||set||messages: + msg207796|
+ Python 3.1, Python 2.7, Python 3.2|
nosy: + adamtj
title: sqlite3 module should allow DDL statements in transactions -> sqlite3 module breaks transactions and potentially corrupts data
messages: + msg207789
type: enhancement -> behavior
|2012-10-18 18:56:16||Jeremy Banks||set||nosy:
+ Jeremy Banks|
messages: + msg144197
|2011-05-20 10:28:22||torsten||set||messages: + msg136362|
messages: + msg132612
|2011-03-29 07:39:29||torsten||set||messages: + msg132470|
|2011-03-28 19:49:14||dholth||set||messages: + msg132414|
|2011-03-27 01:56:07||torsten||set||files: + sqlite_transaction_config_py27.diff|
messages: + msg132287
nosy: + torsten
messages: + msg132285
messages: + msg128815
|2010-12-20 22:08:46||scott.urban||set||messages: + msg124407|
+ Python 3.3, - Python 2.6|
nosy: + ghaering, r.david.murray
messages: + msg124398
type: behavior -> enhancement
messages: + msg124393