classification
Title: __setitem__()'s problem of dbm.dumb object pointed out by comments
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 3.3
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: r.david.murray, ysj.ray
Priority: normal Keywords: patch

Created on 2011-02-28 09:53 by ysj.ray, last changed 2011-03-12 02:03 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
builtin_abort.diff ysj.ray, 2011-03-02 04:28 review
Messages (4)
msg129688 - (view) Author: ysj.ray (ysj.ray) Date: 2011-02-28 09:53
By reading the Lib/dbm/dumb.py source, there seems to be an distinct problem which is pointed out in comments: the __setitem__() should call self._commit() in order to keep .dat file and .dir file consist.

Here is a piece of comment found at the end of __setitem__():

# Note that _index may be out of synch with the directory
# file now:  _setval() and _addval() don't update the directory
# file.  This also means that the on-disk directory and data
# files are in a mutually inconsistent state, and they'll
# remain that way until _commit() is called.  Note that this
# is a disaster (for the database) if the program crashes
# (so that _commit() never gets called).


And another piece found in __delitem__():

# XXX It's unclear why we do a _commit() here (the code always
# XXX has, so I'm not changing it).  __setitem__ doesn't try to
# XXX keep the directory file in synch.  Why should we?  Or
# XXX why shouldn't __setitem__?


One probable reason I guess is that the self._commit() method which writes the keys information to .dir file is regarded as a slow process, and the __setitem__() is called frequently at most cases while __deltiem__ is not, so calling self._commit() at each __setitem__ could make the program very slow.

But based on the fact that "The dbm.dumb module is not written for speed and is not nearly as heavily used as the other database modules."(found on dbm's library document), maybe correctness is more important than speed.

So I think it should be fixed:

Index: Lib/dbm/dumb.py
===================================================================
--- Lib/dbm/dumb.py	(revision 88674)
+++ Lib/dbm/dumb.py	(working copy)
@@ -196,6 +196,7 @@
             # remain that way until _commit() is called.  Note that this
             # is a disaster (for the database) if the program crashes
             # (so that _commit() never gets called).
+            self._commit()
 
     def __delitem__(self, key):
         if isinstance(key, str):

And the remaining comments can be deleted.
msg129696 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-28 13:11
Can you construct a test case that demonstrates the corruption?
msg129856 - (view) Author: ysj.ray (ysj.ray) Date: 2011-03-02 04:28
Here is a test case.

First here is a patch which implements a simple builtin function "abort()" that calls exit(0) directly, it simulates the cases that Py_FatalError occurred or segment fault.

Then run the following:

import dbm.dumb as dumb

db = dumb.open('test_db', 'c')
db.clear()
db['a'] = 'a'
db.sync()
db['a'] = 'aa'
abort()


Now the database 'test_db' is corrupt because .dat file and .dir file are out of sync:

db = dumb.open('test_db', 'c')
print(db['a'])
db.close()

prints:
b'a'

But the value of key 'a' in .dat file are: 'aa':
cat test_db.dat
aa
msg130650 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-12 02:03
Well, I don't think we are going to add an abort method in order to write a unit test, though it isn't completely out of the question.

However, I've now looked at the code and the other comments in the file, and it is clear that great care is taken to allow the database to be out of sync in order to avoid calling _commit all the time.  So I don't think this is a desired change.

Thanks for working out the test, though, since it makes it clear that it is only a catastrophic failure of the interpreter that will trigger this bug (or, of course, a kill -KILL), and I think we are just accepting that danger.

If you disagree, it should be brought up on python-dev.
History
Date User Action Args
2011-03-12 02:03:29r.david.murraysetstatus: open -> closed
nosy: r.david.murray, ysj.ray
messages: + msg130650

resolution: wont fix
stage: test needed -> resolved
2011-03-02 04:28:52ysj.raysetfiles: + builtin_abort.diff

messages: + msg129856
keywords: + patch
nosy: r.david.murray, ysj.ray
2011-02-28 13:11:44r.david.murraysetnosy: + r.david.murray

messages: + msg129696
stage: test needed
2011-02-28 09:53:05ysj.raycreate